Spark-Training

This repository stores my training coursework on the Databricks platform. It includes two parts:

  • DataFrame Lab:
    • Practiced reading and writing data with DataFrameReader and DataFrameWriter.
    • Used the DataFrame API to perform transformations and actions for data analysis.
  • Structured Streaming:
    • Practiced reading and writing streams from files and from the Kafka messaging system using DataStreamReader and DataStreamWriter.
    • Used the DataFrame API to perform ETL jobs.
    • Built a real-time Twitter data pipeline that reports information such as the most-tweeted hashtags in the last 5 minutes and where they came from.
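
To illustrate the idea behind the Twitter pipeline's "top hashtags in the last 5 minutes" query, here is a minimal plain-Python sketch of a time-windowed hashtag count. It is not the Structured Streaming code from the course work: the function name `top_hashtags`, the `(timestamp, location, text)` tuple layout, and the sample tweets are all assumptions made for illustration. A real pipeline would express the same logic with Spark's windowed aggregations over a stream.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, now, window=timedelta(minutes=5), n=3):
    """Return up to n (hashtag, count, most_common_location) tuples.

    tweets is an iterable of (timestamp, location, text) tuples; this
    layout is an assumption for the sketch, not the Twitter API schema.
    Only tweets with cutoff < timestamp <= now are counted.
    """
    cutoff = now - window
    counts = Counter()   # hashtag -> number of occurrences in the window
    places = {}          # hashtag -> Counter of locations
    for ts, location, text in tweets:
        if not (cutoff < ts <= now):
            continue  # outside the 5-minute window
        for tag in HASHTAG.findall(text.lower()):
            counts[tag] += 1
            places.setdefault(tag, Counter())[location] += 1
    return [
        (tag, count, places[tag].most_common(1)[0][0])
        for tag, count in counts.most_common(n)
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    now = datetime(2020, 1, 1, 12, 0, 0)
    sample = [
        (now - timedelta(minutes=1), "NYC", "Learning #Spark today"),
        (now - timedelta(minutes=2), "NYC", "#spark and #kafka"),
        (now - timedelta(minutes=3), "London", "#Spark streaming"),
        (now - timedelta(minutes=10), "Paris", "#old news"),
    ]
    for tag, count, place in top_hashtags(sample, now):
        print(tag, count, place)
```

The sketch recomputes the window from scratch on each call, which keeps it short; a streaming engine would instead maintain the window state incrementally as new tweets arrive.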