Title: Apache Tajo™ 0.8.0 Documentation

Introduction to Partitioning

Table partitioning provides two benefits: easier table management and pruning of data by partition key. Apache Tajo currently supports only Apache Hive-compatible column partitioning.

Partitioning Methods

Tajo lists the following partitioning methods. Only column partitioning is currently available; the others are marked TODO:

Column Partitioning
Range Partitioning (TODO)
Hash Partitioning (TODO)

Column Partitioning

Column partitioning in Tajo is designed to be compatible with the partitioning of Apache Hive™.

How to Create a Column Partitioned Table

You can create a partitioned table by using the PARTITION BY clause. For a column partitioned table, you should use the PARTITION BY COLUMN clause with partition keys.

For example, assume a table named orders with the following schema:

id          INT,
item_name   TEXT,
price       FLOAT

Also, assume that you want to use order_date TEXT and ship_date TEXT as the partition keys. The partition keys are declared only in the PARTITION BY COLUMN clause, not in the column list, so you should create the table as follows:

CREATE TABLE orders (
  id INT,
  item_name TEXT,
  price FLOAT
) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT);
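
As a minimal sketch of how such a table might be populated, the statement below loads rows from a hypothetical staging table named staging_orders (not part of the original example), which is assumed to carry order_date and ship_date alongside the regular columns. It assumes Tajo's INSERT OVERWRITE INTO ... SELECT form and that the partition key values are supplied as the trailing columns of the SELECT list, in the same order as in the PARTITION BY COLUMN clause.

```sql
-- staging_orders is a hypothetical, unpartitioned source table.
INSERT OVERWRITE INTO orders
SELECT
  id,
  item_name,
  price,
  order_date,  -- first partition key
  ship_date    -- second partition key
FROM staging_orders;
```

With Hive-compatible column partitioning, each distinct (order_date, ship_date) pair would be expected to land in its own directory under the table path, for example order_date=2014-03-01/ship_date=2014-03-04/.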

Partition Pruning on Column Partitioned Tables

The following predicates in the WHERE clause can be used during the query planning phase to prune column partitions that do not qualify, so that those partitions are never processed.

=
<>
>
<
>=
<=
LIKE predicates with a leading wild-card character
IN list predicates
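
The queries below sketch what such predicates might look like against the orders table defined earlier. The date values are hypothetical, and the comments describe the expected pruning behavior rather than verified planner output.

```sql
-- Equality on a partition key: only partitions with this order_date qualify.
SELECT id, item_name, price
FROM orders
WHERE order_date = '2014-03-01';

-- Comparison on one key combined with an IN list on the other.
SELECT id, item_name, price
FROM orders
WHERE order_date < '2014-04-01'
  AND ship_date IN ('2014-03-05', '2014-03-06');
```

Because both partition keys are TEXT, the comparisons are lexicographic. Dates written as YYYY-MM-DD sort in calendar order, which is why that format is assumed here.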

Compatibility Issues with Apache Hive™

If Hive partitioned tables are created as external tables in Tajo, Tajo can process them directly. There are no known compatibility issues yet.
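
As a rough sketch, registering an existing Hive partitioned table might look like the statement below. The table name hive_orders, the HDFS path, the csv file type, and the delimiter setting are all assumptions made for illustration; they must match how the Hive table was actually written.

```sql
-- Hypothetical path and file format; adjust both to match the existing Hive table.
CREATE EXTERNAL TABLE hive_orders (
  id INT,
  item_name TEXT,
  price FLOAT
) USING csv WITH ('csvfile.delimiter'='|')
PARTITION BY COLUMN (order_date TEXT, ship_date TEXT)
LOCATION 'hdfs://namenode:8020/user/hive/warehouse/orders';
```

The partition keys listed here should match the key names used in the Hive table's directory layout, since the partition values are read from those directory names.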
