Skip to content

Files

Latest commit

e9a652f · Feb 9, 2015

History

History
This branch is 153 commits behind sryza/aas:master.

simplesparkproject

Simple Spark Maven Project

A demonstration of a simple Spark project that uses Maven for building. The app simply counts the number of lines in a text file.

To build a jar:

mvn package

To build a jar against CDH:

mvn package -Pcdh

To run on a cluster with Spark installed:

spark-submit --class com.cloudera.datascience.MyApp --master local \
  target/simplesparkproject-0.0.1-SNAPSHOT.jar <input file>

To run a REPL that can reference the objects and classes defined in this project:

spark-shell --jars target/simplesparkproject-0.0.1-SNAPSHOT.jar --master local

The --master local argument means that the application will run in a single local process. If the cluster is running a Spark standalone cluster manager, you can replace it with --master spark://<master host>:<master port>. If the cluster is running YARN, you can replace it with --master yarn.

To pass configuration options on the command line, use the --conf option, e.g. --conf spark.serializer=org.apache.spark.serializer.KryoSerializer.