Tachyon Overview
Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, so frequently read datasets do not have to be loaded from disk. This enables different jobs/queries and frameworks to access cached files at memory speed.
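The caching idea described above can be illustrated with a small, purely conceptual sketch. It is not Tachyon code and makes no claim about how Tachyon manages memory: the class name WorkingSetCache, the least-recently-used eviction policy, and the loader function standing in for a disk read are all assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Conceptual read-through cache (hypothetical, not part of Tachyon). */
final class WorkingSetCache {
    private final int capacity;                          // max number of files kept in memory
    private final Function<String, byte[]> slowLoader;   // stands in for a disk read
    private final LinkedHashMap<String, byte[]> memory;
    private int slowLoads = 0;                           // how often the slow path was taken

    WorkingSetCache(int capacity, Function<String, byte[]> slowLoader) {
        this.capacity = capacity;
        this.slowLoader = slowLoader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.memory = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used file once capacity is exceeded
                return size() > WorkingSetCache.this.capacity;
            }
        };
    }

    /** Returns the file contents, loading from the slow store only on a miss. */
    byte[] read(String path) {
        byte[] data = memory.get(path);
        if (data == null) {
            data = slowLoader.apply(path);
            slowLoads++;
            memory.put(path, data);
        }
        return data;
    }

    int slowLoads() {
        return slowLoads;
    }

    public static void main(String[] args) {
        // The loader here just fabricates bytes from the path; a real one would hit disk.
        WorkingSetCache cache = new WorkingSetCache(2, p -> p.getBytes());
        cache.read("/data/a"); // miss: goes to the slow loader
        cache.read("/data/a"); // hit: served from memory
        cache.read("/data/b"); // miss
        System.out.println("slow loads: " + cache.slowLoads()); // prints "slow loads: 2"
    }
}
```

Repeated reads of the same path are served from memory, which is the effect the paragraph above describes for frequently read datasets.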
Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without any code change. The project is open source (Apache License 2.0) and is deployed at multiple companies. It has more than 80 contributors from over 30 institutions, including Yahoo, Intel, Red Hat, and Tachyon Nexus. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the Fedora distribution.
GitHub Repository | Releases and Downloads | User Documentation | Developer Documentation | Meetup Group | JIRA | User Mailing List
Current Features
- Java-like File API: Tachyon’s native API is similar to that of the java.io.File class, providing InputStream and OutputStream interfaces and efficient support for memory-mapped I/O. We recommend using this API to get the best performance from Tachyon.
- Compatibility: Tachyon implements the Hadoop FileSystem interface. Therefore, Hadoop MapReduce and Spark can run with Tachyon without modification. However, close integration is required to fully take advantage of Tachyon, and we are working towards that. End-to-end latency speedup depends on the workload and the framework, since various frameworks have different execution overhead.
- Pluggable underlayer file system: To provide fault-tolerance, Tachyon checkpoints in-memory data to the underlayer file system. It has a generic interface to make plugging different underlayer file systems easy. We currently support HDFS, S3, GlusterFS, and single-node local file systems, and support for many other file systems is coming.
- Native support for raw tables: Table data with hundreds of columns is common in data warehouses. Tachyon provides native support for multi-columned data, with the option to put only hot columns in memory to save space.
- Web UI: Users can browse the file system easily through the web UI. In debug mode, administrators can view detailed information about each file, including locations, checkpoint path, etc.
- Command line interaction: Users can use ./bin/tachyon tfs to interact with Tachyon, e.g. to copy data in and out of the file system.
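To give a feel for the style of API the "Java-like File API" item refers to, here is a minimal sketch that uses only the standard java.io and java.nio classes, not the Tachyon client library. It shows the three access patterns the item names: writing through an OutputStream, reading through an InputStream, and reading through a memory-mapped buffer. The class name StreamAndMmapDemo and its helper methods are hypothetical; the Tachyon Java API (Javadoc) is the place to find the actual Tachyon class and method names.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.StandardOpenOption;

/** Standard-library illustration only; none of this is Tachyon's own API. */
final class StreamAndMmapDemo {

    /** Writes text to a file through an OutputStream. */
    static void write(File file, String text) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads the whole file through an InputStream, copying in chunks. */
    static String readWithStream(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n);
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Reads the whole file through a read-only memory mapping. */
    static String readWithMmap(File file) throws IOException {
        try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("stream-mmap-demo", ".txt");
        file.deleteOnExit();
        write(file, "hello");
        System.out.println(readWithStream(file)); // prints "hello"
        System.out.println(readWithMmap(file));   // prints "hello"
    }
}
```

The stream variant copies data through an intermediate buffer, while the mapped variant lets the program read the file's bytes directly from mapped memory, which is why memory-mapped access is often preferred for large, frequently read files.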
User Documentation
Deployment Guide:
- Single Node
- Cluster
- Master Fault Tolerant Cluster
- Tachyon Deploy Module (Virtualbox and AWS EC2)
- Amazon AWS Through mesos/spark-ec2
Configuration:
- Configure Underlayer Storage System: Learn how to configure underlayer storage system or to create a new one.
- Configuration Settings: How to configure Tachyon.
Frameworks on Tachyon:
- Running Apache Spark on Tachyon: Get Apache Spark running on Tachyon
- Running Shark on Tachyon: Get Shark running on Tachyon
- Running Apache Hadoop MapReduce on Tachyon: Get Apache Hadoop MapReduce running on Tachyon
- Running Apache Flink on Tachyon: Get Apache Flink running on Tachyon
Others:
- Command-Line Interface: Interact with Tachyon through the command line.
- Syncing the Underlayer Storage System: Make Tachyon understand an existing underlayer storage system.
- FAQ
- Tachyon Java API (Javadoc)
- Tiered Storage (Beta)
Tachyon Presentations:
- Strata and Hadoop World 2014 (October, 2014) pdf pptx
- Spark Summit 2014 (July, 2014) pdf
- Strata and Hadoop World 2013 (October, 2013) pdf
Developer Documentation
Building Tachyon Master Branch
External resources
Tachyon Mini Courses:
Hot Rod Hadoop With Tachyon on Fedora 21
Support or Contact
You are welcome to join our mailing list to discuss questions and make suggestions. We use JIRA to track development and issues. If you are interested in trying out Tachyon in your cluster, please contact Haoyuan.
Acknowledgement
Tachyon is an open source project started in the UC Berkeley AMP Lab. This research is supported in part by NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple, Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC, Ericsson, Facebook, Guavus, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Splunk, Virdata, VMware, and Yahoo!.
We would also like to thank our project contributors.
Related Projects
Berkeley Data Analytics Stack (BDAS) from AMPLab at Berkeley