Stop Thinking, Just Do!

Sungsoo Kim's Blog

Apache Tajo Table Partitioning


12 May 2014


Article Source

Introduction to Partitioning

Table partitioning provides two benefits: easy table management and data pruning by partition keys. Currently, Apache Tajo only provides Apache Hive-compatible column partitioning.

Partitioning Methods

Tajo provides the following partitioning methods:

  • Column Partitioning
  • Range Partitioning (TODO)
  • Hash Partitioning (TODO)

Column Partitioning

Column partitioning is designed to be compatible with the partitioning scheme of Apache Hive™.

How to Create a Column Partitioned Table

You can create a partitioned table by using the PARTITION BY clause. For a column partitioned table, you should use the PARTITION BY COLUMN clause with partition keys.

For example, assume there is a table orders composed of the following schema.

id          INT,
item_name   TEXT,
price       FLOAT

Also, assume that you want to use order_date TEXT and ship_date TEXT as the partition keys. Then, you should create a table as follows:

CREATE TABLE orders (
  id INT,
  item_name TEXT,
  price FLOAT
) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT);
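As a rough sketch of how such a table might then be populated, the statement below copies rows from a hypothetical unpartitioned staging table named orders_staging (not part of the original example) that already contains both date columns. It assumes that the partition key columns are supplied as the last columns of the SELECT list, in the order they were declared; check this against the Tajo version you are running.

```sql
-- orders_staging is a hypothetical source table used only for illustration.
-- The partition keys (order_date, ship_date) are assumed to come last,
-- after the regular columns of the orders table.
INSERT OVERWRITE INTO orders
SELECT
  id,
  item_name,
  price,
  order_date,
  ship_date
FROM orders_staging;
```

Because the layout is Hive-compatible, each distinct pair of key values is expected to end up in its own directory of the form order_date=.../ship_date=... under the table path.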

Partition Pruning on Column Partitioned Tables

When the WHERE clause applies any of the following predicates to the partition keys, Tajo can prune the partitions that do not qualify during the query planning phase, so they are never scanned.

  • =
  • <>
  • >
  • <
  • >=
  • <=
  • LIKE predicates with a leading wild-card character
  • IN list predicates
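To make the list above concrete, here is a small sketch against the orders table defined earlier. The date literals are made-up values chosen for illustration.

```sql
-- Both predicates are on partition keys, so only the matching
-- order_date / ship_date partitions should need to be read.
SELECT id, item_name, price
FROM orders
WHERE order_date = '2014-05-12'
  AND ship_date IN ('2014-05-13', '2014-05-14');

-- A range predicate on a partition key is also a pruning candidate.
SELECT id, item_name, price
FROM orders
WHERE order_date >= '2014-05-01';

-- price is not a partition key, so this filter alone cannot prune
-- partitions; every partition has to be scanned.
SELECT id, item_name
FROM orders
WHERE price > 100.0;
```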

Compatibility Issues with Apache Hive™

If Hive partitioned tables are created as external tables in Tajo, Tajo can process them directly. There are no known compatibility issues so far.
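A minimal sketch of such an external table follows, assuming a Hive table stored as delimited text. The table name, the HDFS path, the storage type and the delimiter option are all illustrative assumptions, and the exact clause order and option names may differ between Tajo versions.

```sql
-- hive_orders and the LOCATION path are hypothetical.
-- '\001' is the default field delimiter of Hive text tables; the storage
-- type and option name here are assumptions to verify for your version.
CREATE EXTERNAL TABLE hive_orders (
  id INT,
  item_name TEXT,
  price FLOAT
) USING CSV WITH ('csvfile.delimiter'='\001')
PARTITION BY COLUMN (order_date TEXT, ship_date TEXT)
LOCATION 'hdfs://namenode:8020/user/hive/warehouse/orders';
```

The partition keys declared here should match the names and order of the directory levels Hive wrote (order_date=.../ship_date=...).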

