Spark Cluster Configuration
Description
A very simple Spark cluster configuration with Vagrant, using the standalone cluster mode. There are two virtual machines, described below; a minimal PySpark connection sketch follows the list.
- Spark master
- IP : 192.168.18.31
- Web UI:
http://localhost:8080/
- It has a worker service too.
- Spark slave
- IP : 192.168.18.32
- It has a worker service.
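For orientation, here is a minimal sketch of a PySpark driver that connects to this standalone master. The master URL matches the IP and port used throughout this README; the script itself is only an illustration and is not part of the project.

```python
from pyspark import SparkConf, SparkContext

# Point the driver at the standalone master described above.
conf = SparkConf().setAppName("connection_check").setMaster("spark://192.168.18.31:7077")
sc = SparkContext(conf=conf)

# Distribute a small list across the workers and pull the results back.
print(sc.parallelize(range(10)).map(lambda x: x * 2).collect())

sc.stop()
```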
How to run
- Run
vagrant up
- Connect to the master VM with
vagrant ssh spark_master
- Open http://localhost:8080/ in a browser.
- Run the command below.
$> spark-submit --master spark://192.168.18.31:7077 /home/vagrant/spark/examples/src/main/python/pi.py
- Watch the Web UI to see what's going on; a simplified sketch of what pi.py computes follows this list.
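The bundled pi.py estimates Pi with a Monte Carlo simulation. A simplified sketch of the same idea (not the exact script that ships with Spark) looks roughly like this:

```python
from operator import add
from random import random

from pyspark import SparkContext

# The master URL is supplied by spark-submit (--master spark://192.168.18.31:7077).
sc = SparkContext(appName="PiSketch")
n = 100000

def inside(_):
    # Sample a point in the unit square and test whether it falls inside the unit circle.
    x, y = random() * 2 - 1, random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

count = sc.parallelize(range(n)).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
sc.stop()
```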
Test for Spark cluster
Run Python code in the cluster.
$> spark-submit --master spark://192.168.18.31:7077 /master/rdd_test.py
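The contents of rdd_test.py are not reproduced in this README; a basic RDD smoke test in the same spirit might look like the hypothetical sketch below.

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd_smoke_test")

rdd = sc.parallelize(range(1, 101))            # the numbers 1..100
even_square_sum = (rdd.map(lambda x: x * x)    # square each element on the workers
                      .filter(lambda x: x % 2 == 0)
                      .sum())                  # sum of the even squares

print("Sum of even squares from 1 to 100: %d" % even_square_sum)
sc.stop()
```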
Run Python code with NumPy or SciPy in the cluster.
The Spark master copies the libraries to the clients automatically, so you don't need to worry about deploying Python libraries to the client nodes. Just install the Python libraries on the Spark master. A hypothetical NumPy sketch follows the commands below.
#Numpy example
$> spark-submit --master spark://192.168.18.31:7077 /master/numpy_example.py
#Scipy example
$> spark-submit --master spark://192.168.18.31:7077 /master/scipy_example.py
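The exact contents of numpy_example.py and scipy_example.py are not shown here. As an illustration only (the logic below is an assumption, not the project's actual code), using NumPy inside Spark tasks could look like this:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="numpy_sketch")

# Build a few small random vectors on the driver and compute their norms inside Spark tasks.
# NumPy must be importable wherever the lambda actually runs.
vectors = [np.random.rand(4) for _ in range(8)]
norms = sc.parallelize(vectors).map(lambda v: float(np.linalg.norm(v))).collect()

print(norms)
sc.stop()
```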
Run Python code with an external executable in the cluster.
Spark can run an external executable in the cluster. To do so, we should ship the executable to the workers with addFile(). See external_exe_example.py in the project, and the hypothetical sketch after the commands below.
#Compile the C code
$> gcc sum_of_number.c -o sum_of_number # It sums the numbers given as arguments.
#Run script
$> spark-submit --master spark://192.168.18.31:7077 /master/external_exe_example.py
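For reference, here is a hypothetical sketch in the spirit of external_exe_example.py: it ships the compiled binary to the workers with SparkContext.addFile() and calls it from each task. The /master/sum_of_number path and the partitioning are assumptions.

```python
import os
import stat
import subprocess

from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="external_exe_sketch")

# Distribute the compiled binary (built from sum_of_number.c) to every executor.
sc.addFile("/master/sum_of_number")

def run_external(numbers):
    # SparkFiles.get() resolves the local path of the shipped file on the worker.
    exe = SparkFiles.get("sum_of_number")
    # Make sure the shipped copy is executable before calling it.
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IEXEC)
    args = [exe] + [str(n) for n in numbers]
    return subprocess.check_output(args).strip()

# Each of the 4 partitions passes its numbers to the external program as arguments.
results = sc.parallelize(range(1, 21), 4).glom().map(run_external).collect()
print(results)
sc.stop()
```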
More Info
- I configured the servers on a public network just for learning. **NEVER CONFIGURE A SERVER LIKE THIS FOR A REAL SERVICE.**
- I used a pre-built Spark distribution to save time. It may be missing several packages, so build Spark from source if you need the full feature set.