OnlySekai/Hadoop-Spark-Airflow


Project 3

Run pipeline

  1. Install dependencies

    pip3 install -r requirement
  2. Start the cluster

    docker-compose up -d
  3. Install Airflow

    export AIRFLOW_HOME=/working/dir/airflow/dags
    ./install-airflow.sh
  4. Run the pipeline (a minimal example DAG is sketched after these steps)

    airflow standalone
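
Once Airflow is running, the pipeline is defined by the DAG files under AIRFLOW_HOME. Below is a minimal, hypothetical DAG sketch showing how the crawl and Spark steps could be chained; the dag_id, task names, and paths are illustrative and not taken from this repository.

    # hypothetical_pipeline.py -- place under the Airflow dags directory
    # All names (dag_id, task_ids, paths) are assumptions, not from the repo.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="hadoop_spark_pipeline",   # assumed name
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,           # trigger manually from the Airflow UI
        catchup=False,
    ) as dag:
        crawl = BashOperator(
            task_id="crawl_data",
            bash_command="cd /working/dir/crawler && python3 crawler.py",  # adjust path
        )
        spark_job = BashOperator(
            task_id="run_spark_app",
            bash_command="bash /working/dir/client/run.sh",  # adjust path
        )

        # crawl first, then submit the Spark application
        crawl >> spark_job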

Crawl data

  1. Set NAME_NODE_ID in /crawler/constants.py to the NameNode container ID (a sketch of how this ID might be used follows these steps).

  2. Run the crawler

    cd crawler
    python3 crawler.py
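
The crawler needs the NameNode container ID because the crawled files end up in HDFS inside the Docker cluster. The sketch below shows one plausible way crawler.py could use NAME_NODE_ID to copy a local file into the container and then into HDFS; the actual logic in the repository may differ, and all paths are illustrative.

    # Sketch only: how NAME_NODE_ID (from /crawler/constants.py) might be used.
    # The real crawler.py may do this differently; paths are illustrative.
    import subprocess

    NAME_NODE_ID = "<namenode-container-id>"  # value configured in constants.py

    def put_to_hdfs(local_path: str, hdfs_path: str) -> None:
        """Copy a local file into the NameNode container, then into HDFS."""
        subprocess.run(
            ["docker", "cp", local_path, f"{NAME_NODE_ID}:/tmp/crawled"],
            check=True,
        )
        subprocess.run(
            ["docker", "exec", NAME_NODE_ID,
             "hdfs", "dfs", "-put", "-f", "/tmp/crawled", hdfs_path],
            check=True,
        )

    if __name__ == "__main__":
        put_to_hdfs("data.json", "/user/root/data.json")  # example invocation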

Run the Spark application

  1. Set containerId in /client/run.sh to the Spark container ID.

  2. Set fileName in /client/run.sh to the application you want to run (a sketch of the submit step follows these steps).

  3. Run the script

    /client/run.sh
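
For reference, the submit step typically boils down to exec-ing spark-submit inside the Spark container. The Python sketch below mirrors what /client/run.sh is assumed to do with containerId and fileName; it is not the script itself, and the container paths and example application name are guesses.

    # Sketch of the submit step that /client/run.sh is assumed to perform.
    # containerId / fileName mirror the variables mentioned above; details may differ.
    import subprocess

    container_id = "<spark-container-id>"   # containerId in /client/run.sh
    file_name = "word_count.py"             # fileName: application to run (example)

    # copy the application into the Spark container, then submit it there
    subprocess.run(
        ["docker", "cp", file_name, f"{container_id}:/opt/{file_name}"],
        check=True,
    )
    subprocess.run(
        ["docker", "exec", container_id, "spark-submit", f"/opt/{file_name}"],
        check=True,
    )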
