Project for ML DevOps Engineer Nanodegree, unit 5.
A company with 10,000 corporate clients needs to create, deploy, and monitor a risk assessment ML model that estimates the attrition risk of each client. If the model is accurate, client managers can contact the clients with the highest risk and avoid losing clients and revenue.
Creating and deploying the model isn't the end of the work, though. The industry is dynamic and constantly changing, and a model that was created a year or a month ago might not still be accurate today. Because of this, we need to set up regular monitoring of the model to ensure that it remains accurate and up-to-date. Scripts to re-train, re-deploy, monitor, and report on the ML model will be created. In this way, the company can get risk assessments that are as accurate as possible and minimize client attrition.
- Python 3 required
- A Linux environment may be needed; on Windows it is available through WSL
This project's dependencies are listed in the requirements.txt file.
Use the package manager pip to install the dependencies from requirements.txt.
It's recommended to install them in a separate virtual environment.
pip install -r requirements.txt
or through pipenv:
sudo apt install pipenv
pipenv shell
pipenv install
- Data ingestion: Automatically check whether new data is available for model training. Compile all training data into a single training dataset and save it to a folder.
- Training, scoring, and deploying: Write scripts that train an ML model that predicts attrition risk, and score the model. Save the model and the scoring metrics.
- Diagnostics: Determine and save summary statistics related to a dataset. Time the performance of some functions. Check for dependency changes and package updates.
- Reporting: Automatically generate plots and a PDF document that report on model metrics and diagnostics. Provide an API endpoint that can return model predictions and metrics.
- Process Automation: Create a script and cron job that automatically run all previous steps at regular intervals.
"input_folder_path": "practicedata",
"output_folder_path": "ingesteddata",
"test_data_path": "testdata",
"output_model_path": "practicemodels",
"prod_deployment_path": "production_deployment"
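The five keys above can be read once and turned into paths that every script shares. The following is a minimal sketch, assuming the settings are stored as a JSON object; the helper name `load_config` and the inline `EXAMPLE` string are illustrative and not taken from the project.

```python
import json
from pathlib import Path

# Keys listed in the configuration above.
REQUIRED_KEYS = (
    "input_folder_path",
    "output_folder_path",
    "test_data_path",
    "output_model_path",
    "prod_deployment_path",
)


def load_config(text, base_dir="."):
    """Parse a JSON settings string and return {key: Path} rooted at base_dir.

    Hypothetical helper: raises KeyError if any expected key is missing.
    """
    raw = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    base = Path(base_dir)
    return {key: base / raw[key] for key in REQUIRED_KEYS}


# Inline copy of the practice settings, used here only for illustration.
EXAMPLE = """{
    "input_folder_path": "practicedata",
    "output_folder_path": "ingesteddata",
    "test_data_path": "testdata",
    "output_model_path": "practicemodels",
    "prod_deployment_path": "production_deployment"
}"""

paths = load_config(EXAMPLE)
print(paths["output_model_path"])
```

Failing early on a missing key keeps a misconfigured run from writing artifacts to an unexpected folder.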
python ingestion.py
Artifacts output:
ingesteddata/finaldata.csv
ingesteddata/ingestedfiles.txt
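The ingestion step can be sketched roughly as follows. This is not the project's ingestion.py: it assumes every source file is a CSV with the same header row, uses only the standard library, and writes one file name per line to ingestedfiles.txt. The function name `ingest` is hypothetical.

```python
import csv
from pathlib import Path


def ingest(input_dir, output_dir):
    """Merge every CSV in input_dir into output_dir/finaldata.csv.

    Assumes all source files share the same header row. Duplicate rows are
    dropped, and the names of the files read are written, one per line, to
    output_dir/ingestedfiles.txt. Returns the number of rows written.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    header, rows, seen, ingested = None, [], set(), []
    for csv_path in sorted(input_dir.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{csv_path.name} has a different header")
            for row in reader:
                key = tuple(row)
                if key not in seen:  # keep the first copy of each row
                    seen.add(key)
                    rows.append(row)
        ingested.append(csv_path.name)

    with (output_dir / "finaldata.csv").open("w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    (output_dir / "ingestedfiles.txt").write_text("\n".join(ingested) + "\n")
    return len(rows)
```

Calling `ingest("practicedata", "ingesteddata")` would produce the two artifacts listed above under these assumptions.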
python training.py
Artifacts output:
practicemodels/trainedmodel.pkl
practicemodels/encoder.pkl
python scoring.py
Artifacts output:
practicemodels/latestscore.txt
python deployment.py
Artifacts output:
production_deployment/ingestedfiles.txt
production_deployment/trainedmodel.pkl
production_deployment/latestscore.txt
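Deployment amounts to copying the three artifacts listed above into the production folder. A minimal sketch, assuming the files already exist where the earlier steps wrote them; the function name `deploy` is hypothetical and this is not the project's deployment.py.

```python
import shutil
from pathlib import Path


def deploy(ingested_dir, model_dir, prod_dir):
    """Copy the ingestion record, trained model, and score into prod_dir."""
    sources = [
        Path(ingested_dir) / "ingestedfiles.txt",
        Path(model_dir) / "trainedmodel.pkl",
        Path(model_dir) / "latestscore.txt",
    ]
    # Refuse to deploy a partial set of artifacts.
    missing = [str(p) for p in sources if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"cannot deploy, missing: {missing}")
    prod = Path(prod_dir)
    prod.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file timestamps where the platform allows it.
    return [Path(shutil.copy2(src, prod / src.name)) for src in sources]
```

For the practice configuration, the call would be `deploy("ingesteddata", "practicemodels", "production_deployment")`.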
python diagnostics.py
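Two of the diagnostics, summary statistics and timing, can be illustrated with the standard library alone. The sketch below is an assumption about what such checks might look like, not the contents of diagnostics.py; `summarize` and `time_call` are hypothetical names.

```python
import statistics
import time


def summarize(values):
    """Return mean, median, and sample standard deviation of a numeric column."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # stdev needs at least two points; report 0.0 for a single value.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def time_call(func, *args, repeats=3, **kwargs):
    """Run func `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

Averaging over a few repeats smooths out one-off delays such as a cold disk cache, so successive runs are easier to compare.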
python reporting.py
Artifacts output:
practicemodels/confusionmatrix.png
python app.py
python apicalls.py
Artifacts output:
practicemodels/apireturns.txt
"input_folder_path": "sourcedata",
"output_folder_path": "ingesteddata",
"test_data_path": "testdata",
"output_model_path": "models",
"prod_deployment_path": "production_deployment"
Train production model:
python training.py
python fullprocess.py
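One check an automated run plausibly starts with is whether the source folder holds files that have not been ingested yet. The sketch below assumes ingestedfiles.txt lists one file name per line, which is a format assumption, and `find_new_files` is a hypothetical helper rather than part of fullprocess.py.

```python
from pathlib import Path


def find_new_files(input_dir, ingested_record):
    """Return sorted CSV names in input_dir that are not in ingested_record."""
    record = Path(ingested_record)
    already = set()
    if record.is_file():
        # One previously ingested file name per line (assumed format).
        already = {
            line.strip()
            for line in record.read_text().splitlines()
            if line.strip()
        }
    current = {path.name for path in Path(input_dir).glob("*.csv")}
    return sorted(current - already)
```

With the production configuration this would be called as `find_new_files("sourcedata", "production_deployment/ingestedfiles.txt")`; an empty result means there is nothing new to ingest, while a non-empty one would trigger the ingestion, training, scoring, and deployment steps again.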
Start cron service
sudo service cron start
Edit crontab file
sudo crontab -e
- Select option 3 to edit the file using the vim text editor
- Press i to insert a cron job
- Write the cron job found in cronjob.txt, which runs fullprocess.py every 10 minutes
- Save after editing: press the Esc key, then type :wq and press Enter
View crontab file
sudo crontab -l