This project is greatly inspired by Airbnb's AirStream and by S2Job.
You can define a Spark job without writing a single line of Java/Scala code. All you need to do is write a job description file in JSON format.
Example
hive_source_example.json
```json
{
  "name": "sampleJob",
  "source": [
    {
      "name": "hiveSource",
      "inputs": [],
      "type": "hive",
      "options": {
        "sql": "select * from hive_user"
      }
    }
  ],
  "process": [
    {
      "name": "transform",
      "inputs": ["hiveSource"],
      "type": "sql",
      "options": {
        "sql": "select * from hiveSource WHERE id = 1"
      }
    }
  ],
  "sink": [
    {
      "name": "hdfs_sink",
      "inputs": ["transform"],
      "type": "hdfs",
      "options": {
        "path": "/data/test",
        "format": "parquet"
      }
    }
  ]
}
```
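For orientation, the pipeline this file describes is roughly what the following hand-written Spark job would do. This is a minimal sketch, assuming a Hive table named `hive_user` as in the example; the actual wiring inside star-spark may differ:

```scala
import org.apache.spark.sql.SparkSession

// Rough hand-written equivalent of hive_source_example.json.
// The step names ("hiveSource", "transform") mirror the JSON above.
object HiveSourceExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sampleJob")
      .enableHiveSupport()
      .getOrCreate()

    // source: read from Hive and expose the result under the step name
    spark.sql("select * from hive_user").createOrReplaceTempView("hiveSource")

    // process: run the transform SQL against the previous step's view
    val transformed = spark.sql("select * from hiveSource WHERE id = 1")

    // sink: write the result to HDFS as Parquet
    transformed.write.format("parquet").save("/data/test")

    spark.stop()
  }
}
```

Judging by the example, each step's `name` is what later steps reference in their `inputs` and SQL, which suggests every step's output is exposed as a temp-view-like table.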
Submit the job with spark-submit:

```bash
spark-submit --class com.easternallstars.star.spark.job.JobLauncher \
  --master yarn --deploy-mode client \
  star-spark.jar hive_source_example.json
```
Have a cup of coffee and wait until the job finishes.
You can also upload the job description file and the jar to HDFS, and let Oozie schedule the Spark job.
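For example, an Oozie workflow with a spark action could launch the same job from HDFS. This is a sketch only: the HDFS paths under `/apps/star-spark/` are hypothetical placeholders, not paths defined by this project:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sample-job-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>sampleJob</name>
            <class>com.easternallstars.star.spark.job.JobLauncher</class>
            <!-- hypothetical HDFS locations for the jar and job description -->
            <jar>${nameNode}/apps/star-spark/star-spark.jar</jar>
            <arg>${nameNode}/apps/star-spark/hive_source_example.json</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A coordinator can then trigger this workflow on whatever schedule you need.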