Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Apache Spark Documentation


7 July 2014

Article Source

Spark Documentation

Setup instructions, programming guides, and other documentation are available for each version of Spark below:

The documentation linked to above covers getting started with Spark, as well as the built-in components MLlib, Spark Streaming, and GraphX.

In addition, this page lists other resources for learning Spark.


See the Apache Spark YouTube Channel for videos from Spark events. There are separate playlists for videos on different topics. Besides browsing through playlists, you can also find direct links to videos below.

Screencast Tutorial Videos

Spark Summit Videos

Meetup Talk Videos

In addition to the videos listed below, you can also view all slides from Bay Area meetups here.

Hands-On Exercises

  • Hands-on exercises are available online from Spark Summit 2013. These exercises let you launch a small EC2 cluster, load a dataset, and query it with Spark, Shark, Spark Streaming, and MLlib.

Training Materials

  • The second day of Spark Summit 2013 was a training session; its slides and videos are linked inline in the training day agenda. The session also included exercises that you can walk through yourself, which guide you through launching a Spark cluster on EC2 and using various Spark components to analyze real data.
  • The UC Berkeley AMPLab regularly hosts training camps on Spark and related projects. Slides, videos and EC2-based exercises from each of these are available online:
    • AMP Camp 4 (Strata Santa Clara, Feb 2014) — focus on BlinkDB, MLlib, GraphX, Tachyon
    • AMP Camp 3 (Berkeley, CA, Aug 2013)
    • AMP Camp 2 (Strata Santa Clara, Feb 2013)
    • AMP Camp 1 (Berkeley, CA, Aug 2012)

External Tutorials, Blog Posts, and Talks




  • The Spark wiki contains information for developers, such as architecture documents and how to contribute to Spark.

Research Papers

Spark was initially developed as a UC Berkeley research project, and much of the design is documented in papers. The research page lists some of the original motivation and direction. The following papers have been published about Spark and related projects.

Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation.
