Tajo JDBC Driver
Apache Tajo™ provides a JDBC driver that enables Java applications to access Apache Tajo in an RDBMS-like manner. This section explains how to get the JDBC driver and walks through an example client.
How to get JDBC driver
Tajo provides the necessary JAR files, packaged by Maven. To get the JAR files, run the following commands.
$ cd tajo-x.y.z-incubating
$ mvn clean package -DskipTests -Pdist -Dtar
$ ls -l tajo-dist/target/tajo-x.y.z-incubating/share/jdbc-dist
Setting the CLASSPATH
To use the JDBC driver, add the JAR files in tajo-dist/target/tajo-x.y.z-incubating/share/jdbc-dist to your CLASSPATH. You should also add the Hadoop classpath to your CLASSPATH. The resulting CLASSPATH is set as follows:
CLASSPATH=path/to/tajo-jdbc/*:${TAJO_HOME}/conf:$(hadoop classpath)
Note
You can get the value of $(hadoop classpath) by executing the command bin/hadoop classpath in your Hadoop cluster.
Note
You may want to use a minimal set of JAR files. If so, please refer to Minimal JAR file list.
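Before connecting, it can help to confirm that the CLASSPATH actually exposes the driver class. The following is a minimal sketch of such a check; ClasspathCheck and isOnClasspath are illustrative names introduced here and are not part of Tajo.

```java
// Sketch: verify that a class (for example, the Tajo JDBC driver) is visible
// on the CLASSPATH. ClasspathCheck and isOnClasspath are illustrative names.
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.apache.tajo.jdbc.TajoDriver";
        if (isOnClasspath(driver)) {
            System.out.println("Found " + driver);
        } else {
            System.out.println("Missing " + driver + "; check the CLASSPATH setting");
        }
    }
}
```

Running it with the same CLASSPATH as your application shows whether the driver JAR was picked up.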
An Example JDBC Client
The JDBC driver class name is org.apache.tajo.jdbc.TajoDriver. You can load the driver with Class.forName("org.apache.tajo.jdbc.TajoDriver").newInstance(). The connection URL should be jdbc:tajo://
Note
Currently, Tajo does not support the concept of database and namespace. All tables are contained in the default database, so you do not need to specify a database name.
The following shows an example of a JDBC client.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TajoJDBCClient {
  // ....

  public static void main(String[] args) throws Exception {
    // Load the Tajo JDBC driver.
    Class.forName("org.apache.tajo.jdbc.TajoDriver").newInstance();
    Connection conn = DriverManager.getConnection("jdbc:tajo://127.0.0.1:26002");
    Statement stmt = null;
    ResultSet rs = null;
    try {
      stmt = conn.createStatement();
      rs = stmt.executeQuery("select * from table1");
      // Print the first and third columns of each row.
      while (rs.next()) {
        System.out.println(rs.getString(1) + "," + rs.getString(3));
      }
    } finally {
      // Release resources in reverse order of acquisition.
      if (rs != null) rs.close();
      if (stmt != null) stmt.close();
      if (conn != null) conn.close();
    }
  }
}
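On Java 7 or later, the same cleanup can be written with try-with-resources. The following is a sketch under that assumption, reusing the URL and table name from the example above; TajoQueryRunner and printFirstColumn are illustrative names, not part of Tajo.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the example client using try-with-resources (assumes Java 7+).
// TajoQueryRunner and printFirstColumn are illustrative names.
class TajoQueryRunner {
    /** Runs the query, prints the first column of each row, returns the row count. */
    static int printFirstColumn(String url, String sql) throws SQLException {
        int rows = 0;
        // Resources are closed automatically in reverse order: rs, stmt, conn.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Load the driver the same way as in the example above.
        Class.forName("org.apache.tajo.jdbc.TajoDriver").newInstance();
        printFirstColumn("jdbc:tajo://127.0.0.1:26002", "select * from table1");
    }
}
```

If the driver is not on the CLASSPATH, DriverManager.getConnection fails with an SQLException, because no registered driver accepts the jdbc:tajo:// URL.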
Appendix
Minimal JAR file list
The following is the minimal list of necessary JAR files. We have tested the JDBC driver with these JAR files for usual SQL queries, but they are not guaranteed to cover all operations, so you may need additional JAR files. In addition to the following JAR files, please do not forget to include ${HADOOP_HOME}/etc/hadoop and ${TAJO_HOME}/conf in your CLASSPATH.
- hadoop-annotations-2.2.0.jar
- hadoop-auth-2.2.0.jar
- hadoop-common-2.2.0.jar
- hadoop-hdfs-2.2.0.jar
- joda-time-2.3.jar
- tajo-catalog-common-0.8.0-SNAPSHOT.jar
- tajo-client-0.8.0-SNAPSHOT.jar
- tajo-common-0.8.0-SNAPSHOT.jar
- tajo-jdbc-0.8.0-SNAPSHOT.jar
- tajo-rpc-0.8.0-SNAPSHOT.jar
- tajo-storage-0.8.0-SNAPSHOT.jar
- log4j-1.2.17.jar
- commons-logging-1.1.1.jar
- guava-11.0.2.jar
- protobuf-java-2.5.0.jar
- netty-3.6.6.Final.jar
- commons-lang-2.5.jar
- commons-configuration-1.6.jar
- slf4j-api-1.7.5.jar
- slf4j-log4j12-1.7.5.jar
- commons-cli-1.2.jar
- commons-io-2.1.jar
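When assembling a minimal set by hand, a quick check for missing files can save a failed run. The following is a rough sketch that reports which of the expected JAR files are absent from a directory; JarListCheck and missingJars are illustrative names, and the four file names in main are only a sample taken from the list above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: report which required JAR files are missing from a directory.
// JarListCheck and missingJars are illustrative names, not part of Tajo.
class JarListCheck {
    /** Returns the names in 'required' that are not regular files inside 'dir'. */
    static List<String> missingJars(File dir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!new File(dir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        // Sample of the list above; extend it with the remaining entries as needed.
        List<String> required = Arrays.asList(
            "tajo-jdbc-0.8.0-SNAPSHOT.jar",
            "tajo-client-0.8.0-SNAPSHOT.jar",
            "hadoop-common-2.2.0.jar",
            "protobuf-java-2.5.0.jar");
        System.out.println("Missing: " + missingJars(dir, required));
    }
}
```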
FAQ
java.nio.channels.UnresolvedAddressException
When retrieving the final result, the Tajo JDBC driver tries to access HDFS data nodes, so network access between the JDBC client and the HDFS data nodes must be available. In many cases, an HDFS cluster is built in a private network that uses private hostnames, so those hostnames must also be resolvable from the JDBC client side.
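To see whether the client machine can resolve the data node hostnames, a small check like the following may help. It is only a sketch: DataNodeResolveCheck and unresolvedHosts are illustrative names, the hostnames in main are hypothetical placeholders, and port 50010 is an assumed data node port that you should adjust to your cluster's configuration.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: find hostnames that the JDBC client machine cannot resolve.
// DataNodeResolveCheck and unresolvedHosts are illustrative names.
class DataNodeResolveCheck {
    /** Returns the hosts whose names could not be resolved to an address. */
    static List<String> unresolvedHosts(List<String> hosts, int port) {
        List<String> unresolved = new ArrayList<>();
        for (String host : hosts) {
            // The constructor attempts name resolution; isUnresolved() reports failure.
            if (new InetSocketAddress(host, port).isUnresolved()) {
                unresolved.add(host);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        // Hypothetical data node hostnames; replace with the names used by your cluster.
        List<String> hosts = Arrays.asList(
            "datanode1.example.internal",
            "datanode2.example.internal");
        // 50010 is an assumed data node port; adjust to your configuration.
        System.out.println("Unresolved: " + unresolvedHosts(hosts, 50010));
    }
}
```

Any hostname reported as unresolved needs an entry in the client's name resolution, for example in its hosts file or DNS.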