Machine Learning Pipeline for Wildfire Detection
- Poetry: Python packaging and dependency management. Install it with pipx, for example.
- Git LFS: Git Large File Storage replaces large files such as Jupyter notebooks with text pointers inside Git while storing the file contents on a remote server like github.com.
- DVC: Data Version Control. This will get installed automatically.
- MLflow: ML experiment tracking. This will get installed automatically.
Follow the official documentation to install Poetry.
Make sure git-lfs is installed on your system. Run the following command to check:
git lfs install
If not installed, install it for your platform.
On Debian/Ubuntu:
sudo apt install git-lfs
git lfs install
On macOS (with Homebrew):
brew install git-lfs
git lfs install
On Windows, download and run the latest Git LFS installer.
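Git LFS decides which files to store by patterns listed in a .gitattributes file at the repository root. As an illustration (the exact patterns used by this repository may differ), an entry tracking Jupyter notebooks could look like:

```
*.ipynb filter=lfs diff=lfs merge=lfs -text
```

Such entries are usually written automatically by running git lfs track "*.ipynb" rather than edited by hand.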
Create a virtual environment with the required Python version using conda, or use a combination of pyenv and venv:
conda create -n pyronear-mlops python=3.12
Activate the virtual environment:
conda activate pyronear-mlops
Install the Python dependencies:
poetry install
Data dependencies are retrieved with DVC. To fully use this repository you need access to our DVC remote storage, which is currently reserved for Pyronear members; on request, you will be provided with AWS credentials to access it.
Once set up, run the following command:
dvc pull
Create the following file ~/.aws/config:
[profile pyronear]
region = eu-west-3
Add your credentials in the file ~/.aws/credentials, replacing XXX with your access key id and your secret access key:
[pyronear]
aws_access_key_id = XXX
aws_secret_access_key = XXX
Make sure you use the AWS pyronear profile:
export AWS_PROFILE=pyronear
The project is organized mostly following the cookiecutter-data-science guidelines.
All the data lives in the data folder and follows data engineering conventions.
The library code is available under the pyronear_mlops folder.
The notebooks live in the notebooks folder. They are automatically synced to the Git LFS storage.
Please follow this convention to name your notebooks:
<step>-<ghuser>-<description>.ipynb
- e.g., 0.3-mateo-visualize-distributions.ipynb
The scripts live in the scripts folder; they are commonly CLI interfaces to the library code.
DVC is used to track and define data pipelines and to make them reproducible. See dvc.yaml.
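For illustration, a DVC stage ties a command to its dependencies and outputs so that dvc repro can decide what needs re-running. The stage, script, and path names below are hypothetical, not taken from this repository's dvc.yaml:

```yaml
stages:
  train:
    cmd: python scripts/train.py --params params.yaml
    deps:
      - data/processed
      - scripts/train.py
    outs:
      - models/model.pt
```

When any listed dependency changes, DVC re-executes the stage and updates its outputs; unchanged stages are skipped.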
To get an overview of the pipeline DAG:
dvc dag
To run the full pipeline:
dvc repro
An MLflow server runs alongside ML experiments to track hyperparameters and performance, and to streamline model selection.
To start the MLflow UI server, run the following command:
make mlflow_start
To stop the MLflow UI server, run the following command:
make mlflow_stop
To browse the different runs, open your browser and navigate to http://localhost:5000.
Follow the steps:
- Work on a separate git branch:
git checkout -b "<user>/<experiment-name>"
- Modify and iterate on the code, then run dvc repro. It will rerun only the parts of the pipeline that have been updated.
- Commit your changes and open a Pull Request to get your changes approved and merged.
Use the following command to run a random hyperparameter search:
make run_yolov8_hyperparameter_search
It will run 100 random training runs with hyperparameters drawn from the hyperparameter space defined in pyronear_mlops/model/yolo/hyperparameters/yolov8.py.
Use the following command to run a random hyperparameter search:
make run_yolov9_hyperparameter_search
It will run 100 random training runs with hyperparameters drawn from the hyperparameter space defined in pyronear_mlops/model/yolo/hyperparameters/yolov9.py.
Use the following command to run a random hyperparameter search:
make run_yolov10_hyperparameter_search
It will run 100 random training runs with hyperparameters drawn from the hyperparameter space defined in pyronear_mlops/model/yolo/hyperparameters/yolov10.py.
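The random search amounts to drawing each hyperparameter independently from a defined space and launching one training run per sampled configuration. A minimal sketch of the idea, with a hypothetical space (the real one lives in the yolov8.py/yolov9.py/yolov10.py modules referenced above):

```python
import random

# Hypothetical hyperparameter space for illustration only; the actual
# space is defined in pyronear_mlops/model/yolo/hyperparameters/.
SPACE = {
    "lr0": lambda: 10 ** random.uniform(-4, -2),   # log-uniform learning rate
    "momentum": lambda: random.uniform(0.8, 0.98),  # uniform momentum
    "batch": lambda: random.choice([8, 16, 32]),    # discrete batch size
}

def sample_hyperparameters(space):
    """Draw one random configuration from the space."""
    return {name: draw() for name, draw in space.items()}

# One training run would be launched for each of the 100 sampled configs.
configs = [sample_hyperparameters(SPACE) for _ in range(100)]
```

Sampling the learning rate on a log scale is a common choice, since its useful values span several orders of magnitude.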
To benchmark the trained models, run:
make yolov8_benchmark
make yolov9_benchmark
make yolov10_benchmark