Disaster Management Pipeline

Introduction

ETL

An ETL pipeline is a common, specific kind of data pipeline. ETL stands for Extract, Transform, Load. As an example, imagine a database containing web log data, where each entry holds a user's IP address, a timestamp, and the link that the user clicked.

Before cloud computing, businesses stored their data on large, expensive, private servers. Running queries on large data sets, like raw web log data, could be costly in both money and time. Because data analysts might need to query the same database several times a day, pre-aggregating the data with an ETL pipeline makes sense: the expensive pass over the raw data happens once, and later queries run against the smaller, aggregated result.
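The web log example can be sketched as a small ETL run. This is only an illustration using the Python standard library, not the code in this repository: the sample rows, the `daily_clicks` table, and the function names are all made up for the example.

```python
import sqlite3
from collections import Counter
from datetime import datetime

# Hypothetical raw web log entries: (ip_address, timestamp, clicked_link).
RAW_LOGS = [
    ("10.0.0.1", "2024-01-01T09:15:00", "/home"),
    ("10.0.0.2", "2024-01-01T09:47:00", "/home"),
    ("10.0.0.1", "2024-01-01T10:05:00", "/about"),
    ("10.0.0.3", "2024-01-02T11:30:00", "/home"),
]


def extract():
    """Extract: a real pipeline would read a log file or a source database."""
    return list(RAW_LOGS)


def transform(rows):
    """Transform: pre-aggregate the raw entries into clicks per (day, link)."""
    counts = Counter()
    for _ip, timestamp, link in rows:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(day, link)] += 1
    return sorted((day, link, n) for (day, link), n in counts.items())


def load(aggregated, conn):
    """Load: store only the aggregated rows in the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks (day TEXT, link TEXT, clicks INTEGER)"
    )
    conn.executemany("INSERT INTO daily_clicks VALUES (?, ?, ?)", aggregated)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
load(transform(extract()), conn)

daily = conn.execute(
    "SELECT day, link, clicks FROM daily_clicks ORDER BY day, link"
).fetchall()
print(daily)
```

Analysts would then query `daily_clicks` instead of re-scanning the raw log entries each time.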

ELT

ELT (Extract, Load, Transform) pipelines have gained traction since the advent of cloud computing. Cloud computing has lowered the cost of storing data and running queries on large, raw data sets. Many of these cloud services, like Amazon Redshift, Google BigQuery, or IBM Db2, can be queried using SQL or a SQL-like language. With these tools, the data is extracted, loaded directly in its raw form, and transformed only at the end of the pipeline.
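For contrast, here is a minimal ELT sketch. It uses an in-memory SQLite database as a stand-in for a cloud warehouse, which is an assumption made so the example runs anywhere; the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw web log entries: (ip_address, timestamp, clicked_link).
RAW_LOGS = [
    ("10.0.0.1", "2024-01-01T09:15:00", "/home"),
    ("10.0.0.1", "2024-01-01T09:47:00", "/home"),
    ("10.0.0.2", "2024-01-01T10:05:00", "/about"),
]

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Extract + Load: the raw rows are stored unchanged.
conn.execute("CREATE TABLE raw_logs (ip TEXT, ts TEXT, link TEXT)")
conn.executemany("INSERT INTO raw_logs VALUES (?, ?, ?)", RAW_LOGS)

# Transform: done last, in SQL, inside the database that holds the raw data.
conn.execute(
    """
    CREATE TABLE clicks_per_link AS
    SELECT link,
           COUNT(*) AS clicks,
           COUNT(DISTINCT ip) AS unique_visitors
    FROM raw_logs
    GROUP BY link
    """
)

summary = conn.execute(
    "SELECT link, clicks, unique_visitors FROM clicks_per_link ORDER BY link"
).fetchall()
print(summary)
```

Because the raw table is kept, a different transformation can be written later without extracting the data again.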

However, ETL pipelines are still used even with these cloud tools. It often still makes sense to run an ETL pipeline and store the data in a more readable or intuitive format. This can help data analysts and scientists work more efficiently and help an organization become more data-driven.

This project has three parts. An ETL pipeline extracts data from CSV files, cleans it, and loads it into an SQL database. A machine learning pipeline extracts NLP features and then optimizes the algorithm using grid search. A web app reads the initial data from the database and provides some interactive visual summaries. Users can also enter their own message to be classified by the algorithm.
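The grid search step can be illustrated with scikit-learn. This is a rough sketch under stated assumptions, not the code in `classifier.py`: the toy messages and labels are invented, `TfidfVectorizer` with its default tokenizer stands in for the project's NLP feature extraction, `LogisticRegression` is a placeholder classifier, and the parameter grid is arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Invented toy data: 1 = request for help, 0 = unrelated message.
messages = [
    "we need water and food",
    "please send medical help",
    "people are trapped and need rescue",
    "our village needs shelter urgently",
    "the weather is nice today",
    "watching a movie tonight",
    "the football match was great",
    "happy birthday to my friend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Text features and classifier chained so they are tuned together.
pipeline = Pipeline(
    [
        ("tfidf", TfidfVectorizer()),            # placeholder feature extractor
        ("clf", LogisticRegression(max_iter=1000)),  # placeholder classifier
    ]
)

# Arbitrary example grid: 2 x 3 = 6 parameter combinations.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy data set is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(messages, labels)

print(search.best_params_)
prediction = search.predict(["we need water"])
print(prediction)
```

Tuning the vectorizer and the classifier inside one `Pipeline` means each cross-validation fold fits the text features on its own training split only.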

Packages Used

pandas
sklearn
sqlite3
sqlalchemy
nltk
plotly
flask

Repository Files

app
data
ETL_Pipeline.ipynb
ML_Pipeline.ipynb
README.md
classifier.py
process_data.py
requirements.txt

Shuniy/Disaster-Management-Pipeline