Introduction to MLflow: managing the ML lifecycle from experimentation to deployment

Developing adequate machine learning models involves a lot of experimentation where different models or tools are tested, modified, discarded or eventually accepted. During this process, however, any history or traceable development that lead to the latest model may get lost since models usually are not being versioned continuously and their performance is not being stored. Results may then no longer be reproducible.

In this workshop, we will introduce MLflow and show how you can integrate this model management framework into the ML lifecycle, from tracking experiments to eventually deploying models.

Prerequisites

To run code and set up environments: Python 3 (We encountered problems with Python 3.8 and recommend using Python 3.6 or 3.7 instead.)
To run MLflow Projects and MLflow Models:
- Get conda >= 4.6
- Additionally, configure CLI to activate conda environments. To do so, type conda init <CLI> in your CLI, where <CLI> is cmd.exe for Windows users or bash for Linux users.

Setup

For this workshop, we will need some virtual environments:

one for setting up the server
one for running sklearn examples

The commands to create these environments are given as follows.

Virtual environment for MLflow server

Windows

python -m venv mlflow_server
mlflow_server\Scripts\activate
pip install mlflow==1.5.0
deactivate

Linux, Mac

python -m venv mlflow_server
. mlflow_server/bin/activate
pip install mlflow==1.5.0
deactivate

Alternatively, you can create conda environments like so:

conda create -n mlflow_server python=3.7
conda activate mlflow_server
pip install mlflow==1.5.0
conda deactivate

Virtual environment with required packages for workshop

Windows

python -m venv mlflow_sklearn
mlflow_sklearn\Scripts\activate
pip install mlflow==1.5.0 sklearn matplotlib jupyter
python -m ipykernel install --user --name mlflow_sklearn --display-name "Python (MLflow sklearn)"
deactivate

Linux, Mac

python -m venv mlflow_sklearn
. mlflow_sklearn/bin/activate
pip install mlflow==1.5.0 sklearn matplotlib jupyter
python -m ipykernel install --user --name mlflow_sklearn --display-name "Python (MLflow sklearn)"
deactivate

Alternatively, with conda:

conda create -n mlflow_sklearn python=3.7
conda activate mlflow_sklearn
pip install mlflow==1.5.0 sklearn matplotlib jupyter
python -m ipykernel install --user --name mlflow_sklearn --display-name "Python (MLflow sklearn)"
conda deactivate

Disclaimer

At the time of writing, the latest version was v1.5.0, which was used to prepare and conduct this workshop.

Most of the documentation in this Git Repository is directly taken from MLlfow's website. It is condensed to explain the functionalities that I think are most important to be discussed in an introductory workshop. For more information and a deeper insight, please directly consult MLflow's documentation.

Overview on MLflow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

Tracking experiments to record and compare parameters and results
Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production
Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms

MLflow is library-agnostic. It can be used with any machine learning library, and in any programming language, since all functions are accessible through a REST API and CLI. For convenience, the project also includes a Python API, R API, and Java API. Here, we only concentrate on the Python API.

In this workshop, we will discuss MLflow's three components: MLflow Tracking, MLflow Projects, and MLflow Models.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
00_slides		00_slides
10_tracking		10_tracking
20_projects		20_projects
30_models		30_models
40_data_and_tasks		40_data_and_tasks
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction to MLflow: managing the ML lifecycle from experimentation to deployment

Prerequisites

Setup

Virtual environment for MLflow server

Virtual environment with required packages for workshop

Disclaimer

Overview on MLflow

About

Releases

Packages

Languages

amld/AMLD_2020_MLflow

Folders and files

Latest commit

History

Repository files navigation

Introduction to MLflow: managing the ML lifecycle from experimentation to deployment

Prerequisites

Setup

Virtual environment for MLflow server

Virtual environment with required packages for workshop

Disclaimer

Overview on MLflow

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages