This project offers a complete MLOps workflow for predicting rain in Australia. It uses Python, Flask, and Scikit-learn for model development and provides a web interface for predictions. The repository includes CI/CD pipelines and containerization with Docker.


FaheemKhan0817/End-to-End-MLOps-for-Australia-Rain-Prediction

# End-to-End MLOps for Australia Rain Prediction 🌦️

Welcome to the End-to-End MLOps for Australia Rain Prediction project! This project showcases a complete machine learning operations (MLOps) pipeline for predicting rainfall in Australia using the WeatherAUS dataset. From data ingestion to model deployment, it demonstrates best practices in MLOps, integrating modern tools and automation for a seamless workflow.

📋 Table of Contents

Project Overview
Features
Tech Stack
Project Structure
Installation
Usage
Training the Model
CI/CD Pipeline with GitLab
Deployment on Render
Contributing
License
Acknowledgements

🌟 Project Overview

This project uses the WeatherAUS dataset to predict whether it will rain tomorrow at various Australian locations. The pipeline encompasses:

Data Ingestion: Automatically fetches and processes the dataset.
Preprocessing: Handles missing values, encodes categorical variables, and applies SMOTE for imbalanced data.
Model Training: Trains an XGBoost model with hyperparameter tuning and threshold optimization.
Web Application: Deploys a Flask-based web app for real-time rain predictions.
Automation: Uses GitLab CI/CD for continuous integration and deployment.
Hosting: Deploys the app on Render’s free tier using Docker.

The result is a scalable, user-friendly application for weather predictions, backed by a robust MLOps pipeline.
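As a rough illustration of the preprocessing step listed above, the sketch below fills missing values and integer-encodes a categorical column in plain Python. The choice of median and most-frequent-value imputation, and the function name `impute_and_encode`, are assumptions made for this example; the actual logic in `src/data_processing.py` may differ. The column names `Humidity3pm` and `RainToday` are WeatherAUS features.

```python
from collections import Counter
from statistics import median


def impute_and_encode(rows, numeric_cols, categorical_cols):
    """Fill missing values (None) and integer-encode categorical columns.

    Hypothetical helper: median imputation for numeric columns and
    most-frequent-value imputation for categorical ones are assumptions.
    """
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]

    # Assign integer codes in sorted order so the mapping is deterministic.
    codes = {}
    for col in categorical_cols:
        values = sorted({r[col] for r in rows if r[col] is not None})
        codes[col] = {value: i for i, value in enumerate(values)}

    cleaned = []
    for r in rows:
        out = {}
        for col in numeric_cols:
            out[col] = r[col] if r[col] is not None else fill[col]
        for col in categorical_cols:
            value = r[col] if r[col] is not None else fill[col]
            out[col] = codes[col][value]
        cleaned.append(out)
    return cleaned


rows = [
    {"Humidity3pm": 70.0, "RainToday": "Yes"},
    {"Humidity3pm": None, "RainToday": "No"},
    {"Humidity3pm": 40.0, "RainToday": None},
    {"Humidity3pm": 50.0, "RainToday": "No"},
]
cleaned = impute_and_encode(rows, ["Humidity3pm"], ["RainToday"])
print(cleaned)
```

In a real pipeline the fill values and category codes would be learned on the training split only and reused at prediction time, so the web app encodes form input the same way the model saw it during training.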

✨ Features

Automated Data Ingestion: Downloads and prepares the WeatherAUS dataset.
Advanced Preprocessing: Handles missing data, feature encoding, and class imbalance with SMOTE.
Optimized Model: Trains an XGBoost model with fine-tuned hyperparameters and an optimized decision threshold.
Interactive Web Interface: A Flask app with a form for users to input weather features and receive predictions.
CI/CD Automation: Automates building, testing, and deployment using GitLab CI/CD.
Cloud Deployment: Hosts the app on Render with a free-tier plan for scalability.
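To make the "optimized decision threshold" idea concrete, here is a minimal sketch that sweeps candidate thresholds over predicted probabilities and keeps the one with the best F1 score. Using F1 as the selection metric, the candidate grid, and the function names are assumptions for illustration; the README does not say which criterion the training code uses.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when probabilities >= threshold are predicted as rain (1)."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_prob, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy validation data: true labels and model probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.35, 0.40, 0.80, 0.45, 0.65]

threshold = best_threshold(y_true, y_prob)
print(threshold, f1_at_threshold(y_true, y_prob, threshold))
```

The threshold should be chosen on held-out validation data rather than on the training set, otherwise it tends to be over-optimistic.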

🛠️ Tech Stack

| Category | Tools |
| --- | --- |
| Programming Language | Python 3.9 |
| Machine Learning | XGBoost, Scikit-learn, Imbalanced-learn |
| Web Framework | Flask |
| Containerization | Docker |
| CI/CD | GitLab CI/CD |
| Hosting | Render |


📂 Project Structure
End-to-End-MLOps-for-Australia-Rain-Prediction/
├── app.py                      # Flask app for serving predictions
├── src/
│   ├── data_ingestion.py       # Downloads the WeatherAUS dataset
│   ├── data_processing.py      # Preprocessing and feature engineering
│   ├── model_training.py       # Trains the XGBoost model
│   ├── training_pipeline.py    # Orchestrates the training pipeline
│   ├── custom_exception.py     # Custom exception handling
│   ├── logger.py               # Logging utility
│   └── __init__.py
├── config/
│   ├── paths_config.py         # Path configurations
│   └── __init__.py
├── static/
│   ├── css/
│   │   └── styles.css          # CSS for the web interface
│   └── js/
│       └── script.js           # JavaScript for the web interface
├── templates/
│   └── index.html              # HTML template for the Flask app
├── artifacts/
│   ├── model/
│   │   ├── xgboost_rain_prediction_model.pkl  # Trained model
│   │   └── threshold.txt       # Optimized threshold
│   ├── raw/                    # Raw dataset (not in GitLab)
│   └── processed/              # Processed data (not in GitLab)
├── Dockerfile                  # Docker configuration
├── .gitlab-ci.yml              # GitLab CI/CD pipeline configuration
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation

🚀 Installation

Prerequisites

Python 3.9+
Docker
GitLab repository access
Render account (free tier)

Steps

Clone the Repository:

    git clone https://gitlab.com/faheemkhan08171/end-to-end-mlops-for-australia-rain-prediction.git
    cd end-to-end-mlops-for-australia-rain-prediction

Install Dependencies: pip install -r requirements.txt

Set Up Environment:

Ensure artifacts/model/ contains xgboost_rain_prediction_model.pkl and threshold.txt (included in the repository).
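A quick way to confirm the environment is ready is to check for both artifact files before starting the app. The sketch below is a hypothetical helper, not part of the repository; it only assumes the file names and the `artifacts/model/` location given above.

```python
from pathlib import Path

# File names taken from the project structure above.
REQUIRED_ARTIFACTS = ("xgboost_rain_prediction_model.pkl", "threshold.txt")


def missing_artifacts(model_dir="artifacts/model"):
    """Return the names of required artifact files not found in model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing artifacts:", ", ".join(missing))
else:
    print("All model artifacts are present.")
```

Run it from the repository root so the relative path resolves correctly.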

🖥️ Usage

Running Locally

Start the Flask App: python app.py

Access the Web Interface:

Open http://localhost:5000 in your browser.
Fill in the weather features (e.g., RainToday, Humidity3pm).
Submit to get a prediction (e.g., "Yes, it will rain tomorrow! Probability: 65.43%").
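The prediction message combines the model's probability with the stored decision threshold. The sketch below shows one way that could work. It assumes `threshold.txt` holds a single floating-point number and that the app falls back to 0.5 when the file is absent; the wording of the "No" message and both function names are invented for illustration.

```python
from pathlib import Path


def load_threshold(path="artifacts/model/threshold.txt", default=0.5):
    """Read the decision threshold, assuming the file contains one float."""
    file = Path(path)
    if not file.is_file():
        return default  # assumed fallback, not confirmed by the project
    return float(file.read_text().strip())


def format_prediction(probability, threshold):
    """Turn a rain probability into a user-facing message."""
    if probability >= threshold:
        return f"Yes, it will rain tomorrow! Probability: {probability:.2%}"
    # Hypothetical wording for the negative case.
    return f"No, it will not rain tomorrow. Probability: {probability:.2%}"


message = format_prediction(0.6543, threshold=0.5)
print(message)
```

With a probability of 0.6543 and a threshold of 0.5, this prints the example message quoted above.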

Running with Docker

Build the Docker Image: docker build -t mlops-app .

Run the Container: docker run -p 5000:5000 mlops-app

Access the App:

Visit http://localhost:5000 in your browser.
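For the `-p 5000:5000` port mapping to reach the app, the server inside the container has to listen on all interfaces (`0.0.0.0`) rather than only on `127.0.0.1`, which is the Flask development server's default. How `app.py` configures this is not shown here, so the helper below is only a sketch; the `HOST` and `PORT` environment variable names are assumptions.

```python
import os


def resolve_bind(env=None):
    """Pick the host and port to listen on from environment variables.

    HOST and PORT are assumed variable names; defaults match the
    port used in the Docker commands above.
    """
    env = os.environ if env is None else env
    host = env.get("HOST", "0.0.0.0")
    port = int(env.get("PORT", "5000"))
    return host, port


host, port = resolve_bind({})
print(f"Would listen on {host}:{port}")
# In a Flask entry point this could be passed as: app.run(host=host, port=port)
```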

🧠 Training the Model

To retrain the model with updated data:

Run the Training Pipeline: python src/training_pipeline.py

This downloads the WeatherAUS dataset, preprocesses it, applies SMOTE, trains the XGBoost model, and saves artifacts to artifacts/model/.
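For readers unfamiliar with SMOTE, the sketch below shows its core idea in plain Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the Imbalanced-learn implementation the project lists in its tech stack; the function name and parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours by squared Euclidean distance,
        # excluding the base point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Oversampling of this kind is normally applied to the training split only, so that validation and test metrics reflect the real class balance.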

Verify Artifacts:

Check that xgboost_rain_prediction_model.pkl and threshold.txt are updated in artifacts/model/.

Note: Processed data (X_train.pkl, etc.) is not stored in the repository due to its size (~130 MB) but is generated during training.

🔄 CI/CD Pipeline with GitLab

The project uses GitLab CI/CD to automate building, pushing, and deploying the Dockerized app to Render.

Workflow Overview

Trigger: Runs on pushes to the main branch.
Steps:
  Builds a Docker image using the Dockerfile.
  Pushes the image to a container registry (e.g., Docker Hub).
  Deploys the image to Render using Render’s API.

Setup

Container Registry:

Create a Docker Hub account.
Add the following variables in GitLab CI/CD settings (Project → Settings → CI/CD → Variables):
  DOCKERHUB_USERNAME: Your Docker Hub username.
  DOCKERHUB_PASSWORD: Your Docker Hub password or access token.

Render API Key:

Sign up for a Render account and generate an API key (Render Dashboard → Account Settings → API Keys).
Add the following variables in GitLab CI/CD settings:
  RENDER_API_KEY: Your Render API key.
  RENDER_SERVICE_ID: Your Render service ID (found in the Render Dashboard).

Pipeline Configuration:

The pipeline is defined in .gitlab-ci.yml. It includes stages for checkout, build, and deployment (to Render in this project).
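GitLab exposes CI/CD variables such as `RENDER_API_KEY` and `RENDER_SERVICE_ID` to jobs as environment variables. The sketch below shows how a deploy step might use them to build a request against Render's REST API. The endpoint path reflects Render's public API as commonly documented and should be checked against the current Render docs; the helper name and placeholder values are hypothetical, and the project's `.gitlab-ci.yml` may trigger the deploy differently.

```python
import json
import os
import urllib.request


def build_deploy_request(service_id, api_key):
    """Build (but do not send) a POST request that asks Render for a new deploy."""
    # Assumed endpoint shape; verify against Render's API reference.
    url = f"https://api.render.com/v1/services/{service_id}/deploys"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_deploy_request(
    os.environ.get("RENDER_SERVICE_ID", "srv-example"),  # placeholder ID
    os.environ.get("RENDER_API_KEY", "example-key"),     # placeholder key
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; in a CI job it would be:
# urllib.request.urlopen(request)
```

Keeping the key in a masked CI/CD variable, rather than in the repository, prevents it from appearing in commits or job logs.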

☁️ Deployment on Render

The app is hosted on Render’s free tier for scalability and ease of use.

Deployment Steps

Create a Render Account:

Sign up at Render.

Set Up a Web Service:

In the Render Dashboard, create a new Web Service.
Select Docker as the runtime.
Link your Docker Hub repository (e.g., yourusername/mlops-app).
Configure:
  Port: 5000
  Environment Variables:
    FLASK_APP: app.py
    FLASK_ENV: production

Use the free tier plan.

Deploy Automatically via CI/CD:

The GitLab CI/CD pipeline deploys the app to Render after pushing the Docker image to Docker Hub. Alternatively, trigger a manual deploy in the Render Dashboard.

Access the App:

Open https://end-to-end-mlops-for-australia-rain.onrender.com/ and use the form on the app to make predictions.

🤝 Contributing

Contributions are welcome! To contribute:

Fork the repository on GitLab.
Create a new branch (git checkout -b feature/your-feature).
Make changes and commit (git commit -m "Add your feature").
Push to your branch (git push origin feature/your-feature).
Open a Merge Request on GitLab.

Please ensure your code follows the project’s style and includes tests if applicable.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgements

WeatherAUS Dataset: For providing the data used in this project.
Open-Source Community: For tools like Flask, XGBoost, and Docker.
Render: For offering a free tier to host the application.

Happy Predicting! 🌦️
