- This is an example of an interactive data application that performs Exploratory Data Analysis (EDA) and charting.
- The application also serves predictions from pre-trained ML models to users in real time.
- The primary data source originally comes from a Kaggle competition. It contains price data for homes sold in King County, WA between May 2014 and May 2015.
- There is also a hand-curated dataset of Zipcodes that is used to determine the city each house is in.
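Assigning a city to each house amounts to a left join between the housing rows and the zipcode-to-city mapping on the zipcode column. Below is a minimal standard-library sketch of that idea. The CSV strings, the `city` column, and the helpers `load_zip_to_city` and `attach_city` are hypothetical. The real mapping is stored as an Excel file (`zipcode_city_mapping.xlsx`), and its layout may differ.

```python
import csv
import io

# Illustrative rows only (hypothetical); the real curated mapping lives in
# backend/data/processed/zipcode_city_mapping.xlsx and may use other columns.
ZIPCODE_CSV = """zipcode,city
98103,Seattle
98004,Bellevue
98033,Kirkland
"""

# Hypothetical house rows; 99999 is deliberately absent from the mapping.
HOUSES_CSV = """id,zipcode
1,98103
2,98004
3,99999
"""


def load_zip_to_city(text: str) -> dict:
    """Build a zipcode -> city lookup table from CSV text."""
    return {row["zipcode"]: row["city"] for row in csv.DictReader(io.StringIO(text))}


def attach_city(houses_text: str, zip_to_city: dict) -> list:
    """Add a 'city' key to each house row; unknown zipcodes get None (left join)."""
    rows = []
    for row in csv.DictReader(io.StringIO(houses_text)):
        row["city"] = zip_to_city.get(row["zipcode"])
        rows.append(row)
    return rows


houses = attach_city(HOUSES_CSV, load_zip_to_city(ZIPCODE_CSV))
for house in houses:
    print(house["id"], house["zipcode"], house["city"])
```

With pandas, the same step would typically be a call such as `houses.merge(zip_map, on="zipcode", how="left")`.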
```
├── README.md
├── backend
│   ├── Dockerfile - Dockerfile for the `backend` service
│   ├── api.py - FastAPI backend definition
│   ├── data
│   │   ├── processed
│   │   │   ├── kc_housing_data_processed.csv - Processed, clean data used to power models
│   │   │   └── zipcode_city_mapping.xlsx - Curated dataset assigning each Zipcode to a city
│   │   └── raw
│   │       └── kc_house_data.csv - Raw, immutable data from Kaggle
│   ├── images - Pre-saved images and 3D plots
│   │   ├── 3dplot.pickle
│   │   ├── prcurve.png
│   │   └── roccurve.png
│   ├── modeling.py - Python script for training regression models
│   ├── models - Directory of serialized ML models
│   │   ├── bed_bath_regressor.pkl
│   │   ├── full_regressor.pkl
│   │   └── logreg.pkl
│   ├── request.py - Examples of how to query the API with Python
│   └── requirements.txt - Python dependencies for the `backend` service
├── docker-compose.yml - Docker Compose definition for both services
└── frontend
    ├── Dockerfile - Dockerfile for the `frontend` service
    ├── landing.py - Landing page for the Streamlit frontend
    ├── pages - Streamlit sub-pages
    │   ├── maps.py
    │   └── modeling.py
    ├── requirements.txt - Python dependencies for the `frontend` service
    └── utils.py - Plotting utilities for the frontend
```
The application defines two services, `backend` and `frontend`.
The application can be built and run either with Docker Compose or manually in multiple terminals.
Backend:
- We use FastAPI as our backend to serve prediction results in real time. The FastAPI app is served by Uvicorn, an ASGI web server.
- To validate the structure of our API requests, we use Pydantic models.
- The best-performing models are trained with scikit-learn, serialized, and saved to disk as `.pkl` files in `backend/models`.
- For visualization, we use a mix of vanilla Matplotlib, Seaborn, and PyDeck for mapping.
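To illustrate the validation role described above, here is a dependency-free sketch. It mimics what a Pydantic request model does for the prediction endpoint by rejecting bodies with missing or non-numeric fields before any model is called. A standard-library dataclass stands in for Pydantic, so this is not the backend's actual code. `PredictionRequest` and `parse_request` are hypothetical names. The field names come from the example request later in this README.

```python
import json
from dataclasses import dataclass, fields


# Hypothetical stand-in for the backend's Pydantic request model.
@dataclass(frozen=True)
class PredictionRequest:
    bedrooms: int
    bathrooms: float
    yr_built: int


def parse_request(body: str) -> PredictionRequest:
    """Decode a JSON body and validate it before it reaches a model."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    expected = [f.name for f in fields(PredictionRequest)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in expected:
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {value!r}")
    # Extra keys in the body are ignored
    return PredictionRequest(**{name: data[name] for name in expected})


print(parse_request('{"bedrooms": 3, "bathrooms": 2, "yr_built": 2000}'))
```

Extra keys are ignored, as in Pydantic's default behavior. Unlike Pydantic, this sketch does not coerce strings such as `"3"` into numbers.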
Frontend:
- We use Streamlit, a powerful framework that lets us build interactive web applications entirely in Python!
- Within the frontend, we have multiple pages for performing Exploratory Data Analysis, Mapping, and obtaining the results of our ML Models.
Clone the repository:

```bash
git clone https://github.com/akan72/mlpowered.git
cd mlpowered
```
To run with Docker Compose, install Docker and Docker Compose. After installing, make sure the Docker daemon is running (for example, by starting the Docker Desktop app), then run:

```bash
docker-compose up
```
To run without Docker, install the required Python dependencies:

```bash
pip install -r backend/requirements.txt -r frontend/requirements.txt
```

Start the Uvicorn webserver and FastAPI backend in one terminal:

```bash
uvicorn backend.api:app
```

Then start the Streamlit frontend in another terminal:

```bash
streamlit run frontend/landing.py
```
Navigate to http://localhost:8501 to view the live app!
Send POST requests to the API to get live predictions!
```python
import requests

# The backend listens on Uvicorn's default port, 8000
url = 'http://localhost:8000/predict/linreg/'
payload = {
    'bedrooms': 3,
    'bathrooms': 2,
    'yr_built': 2000,
}

response = requests.post(url, json=payload)
print(response.status_code)

# The JSON body contains the predicted price and the model that produced it
result = response.json()
print(result['price'])
print(result['model'])
```
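If you would rather not install `requests`, you can make the same call with the standard library's `urllib.request`. This is a sketch. `predict_price` is a hypothetical helper, and it assumes the backend is running locally on port 8000 with the `/predict/linreg/` endpoint used above.

```python
import json
import urllib.request

# Assumes the backend is running locally on Uvicorn's default port (8000).
PREDICT_URL = "http://localhost:8000/predict/linreg/"


def predict_price(features: dict, url: str = PREDICT_URL, timeout: float = 10.0) -> dict:
    """POST the features as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend up, `predict_price({'bedrooms': 3, 'bathrooms': 2, 'yr_built': 2000})` returns the same dictionary that `response.json()` gives in the `requests` example, including the `price` and `model` keys.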
Visit http://localhost:8000/docs to view the Swagger docs that FastAPI generates automatically. From this view, we can also hit the API endpoints directly without writing any code.
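FastAPI also publishes the machine-readable OpenAPI schema behind these docs, by default at `/openapi.json`. The sketch below lists the available endpoints from that schema. `list_endpoints` is a hypothetical helper. It assumes the backend runs on port 8000 and that `api.py` keeps the default `openapi_url`.

```python
import json
import urllib.request

# HTTP methods that can appear as operations in an OpenAPI path item
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(base_url: str = "http://localhost:8000", timeout: float = 10.0) -> list:
    """Return sorted (METHOD, path) pairs declared in the OpenAPI schema."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        schema = json.loads(response.read().decode("utf-8"))
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also carry non-operation keys such as "parameters"
            if method in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)
```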
To retrain the models, run:
```bash
cd backend
python modeling.py
```
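As a rough illustration of the retraining step, the sketch below fits a two-feature linear regression (bedrooms and bathrooms) by ordinary least squares and pickles the result. It is not `modeling.py`, which trains scikit-learn models. `BedBathRegressor`, `fit_bed_bath`, `save_model`, and `load_model` are hypothetical plain-Python stand-ins that let the example run without dependencies.

```python
import pickle
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BedBathRegressor:
    """Hypothetical stand-in for a fitted scikit-learn regressor."""
    intercept: float
    coef_bedrooms: float
    coef_bathrooms: float

    def predict(self, bedrooms: float, bathrooms: float) -> float:
        return self.intercept + self.coef_bedrooms * bedrooms + self.coef_bathrooms * bathrooms


def fit_bed_bath(bedrooms, bathrooms, prices) -> BedBathRegressor:
    """Ordinary least squares with two features, solved via centered sums of squares."""
    n = len(prices)
    m1, m2, my = sum(bedrooms) / n, sum(bathrooms) / n, sum(prices) / n
    x1 = [v - m1 for v in bedrooms]
    x2 = [v - m2 for v in bathrooms]
    y = [v - my for v in prices]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; cannot fit")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return BedBathRegressor(my - b1 * m1 - b2 * m2, b1, b2)


def save_model(model, path: Path) -> None:
    """Serialize a model to disk with pickle, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path):
    """Load a pickled model; only unpickle files you trust."""
    with path.open("rb") as f:
        return pickle.load(f)
```

Write to a scratch path rather than over the committed `.pkl` files when experimenting. Only unpickle files you trust, because `pickle.load` can execute arbitrary code.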
Additional code examples for submitting requests can be found in `backend/request.py`.