Experiment 2 (#9)
* Updated paths

* Implemented conversion of Burned Area from m2 to hectares before the fuel load is calculated in fuelload.py. This addresses issue #10.

* Added new notebook with more concise pre-processing. It includes the conversion to a dataframe (the model input) and solves issues #10 and #12. It also defines a new threshold for BA (50 hectares, see #11; the threshold was set by FDG, a fire expert), although a reliable reference source is still not available.

* Notebook notebooks/preprocess_all_in_one.ipynb reformatted using black

* Formatted src/utils/fuelload.py using black

* Minor changes to notebooks/preprocess_all_in_one.ipynb

* renamed notebook and finalised concise version of the data preparation step

* Complete re-write of the data pre-processing step to avoid resampling. This addresses issue #13.

* Added the following amongst predictors: GFED4 basis regions (as categorical variable) and area of grid cell at point (as continuous variable).

* Load formula changed to BA*CC*AGB/AREA

* added log-transformed variables

* updated notebooks with latest run

* model 6h, MAE

* experiments as in ESA-D1 report

* Updated README files to clarify there are two sets of experiments (by wikilimo and ecmwf).

* Update README.md
cvitolo authored Jun 21, 2021
1 parent 879a0e8 commit 7fe88a5
Showing 8 changed files with 16,989 additions and 20 deletions.
27 changes: 19 additions & 8 deletions README.md
@@ -28,7 +28,14 @@ pip install -U pip
pip install -r requirements.txt
```

This includes all the packages required for running the code in the repository, except for the notebooks in the folder `notebooks/ecmwf` (see `notebooks/ecmwf/README.md` for the additional dependencies to install).

The content of this repository is split into two sets of experiments:

1. the target is fuel load = burned area * above-ground biomass
2. the target is dry matter = burned area * above-ground biomass * combustion coefficient / grid cell area
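As an illustrative sketch (the function and variable names here are invented for this example, not taken from the repository, and the units are placeholders), the two targets can be computed per grid cell as:

```python
def fuel_load(burned_area, agb):
    """Experiment 1 target: burned area * above-ground biomass."""
    return burned_area * agb

def dry_matter(burned_area, agb, combustion_coeff, cell_area):
    """Experiment 2 target: BA * CC * AGB / AREA (see the load formula in the commit notes)."""
    return burned_area * combustion_coeff * agb / cell_area

# Made-up numbers: 120 ha burned, AGB of 5, combustion coefficient 0.4,
# grid cell area of 75,000 ha.
print(fuel_load(120, 5))                  # 600
print(dry_matter(120, 5, 0.4, 75_000))    # 0.0032
```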

## Experiment 1

### Data Description
Seven years of global historical data, from 2010 to 2016, will be used for developing the machine learning models. All data used in this project is proprietary and NOT meant for public release. Xarray, NumPy and netCDF libraries are used for working with the multi-dimensional geospatial data.
@@ -82,7 +89,7 @@ Args description:
Where root_path is the root save path provided for pre-processing.py
```
### Training
Entry-point for training is [src/train.py](src/train.py)
```
@@ -92,7 +99,7 @@ Args description:
* `--exp_name`: Name of the training experiment used for logging.
```
### Inference
Entry-point for inference is [src/test.py](src/test.py)
```
@@ -103,37 +110,41 @@ Args description:
* `--results_path`: Directory where the result inference .csv files and .html visualizations are going to be stored.
```
#### Pre-trained models
Pre-trained models are available at:
- [LightGBM.joblib](src/pre-trained_models/LightGBM.joblib)
- [CatBoost.joblib](src/pre-trained_models/CatBoost.joblib)
#### Demo Notebooks
Notebooks for training and inference:
- [LightGBM_training.ipynb](notebooks/LightGBM_training.ipynb)
- [LightGBM_inference.ipynb](notebooks/LightGBM_inference.ipynb)
- [CatBoost_training.ipynb](notebooks/CatBoost_training.ipynb)
- [CatBoost_inference.ipynb](notebooks/CatBoost_inference.ipynb)
### Fuel Load Prediction Visualizations:
- CatBoost for Mid-Latitudes
<img width="1025" alt="midlats-prediction-july16" src="https://user-images.githubusercontent.com/7680686/113362982-4d263500-936d-11eb-922e-5a0609e7a67e.png">
- LightGBM for Tropics
<img width="1025" alt="tropics-prediction-july16" src="https://user-images.githubusercontent.com/7680686/113362967-45ff2700-936d-11eb-93a3-5ad380393f03.png">
### Adding New Features:
- Make sure the new dataset to be added is a single file in `.nc` format, containing data from 2010 to 2016 on a 0.25 x 0.25 grid cell resolution.
- Match the features of the new dataset with the existing features. This can be done by going through `notebooks/EDA_pre-processed_data.ipynb`.
- Add the feature path as a variable to `src/utils/data_paths.py`. The path variable must then be added to either the time-dependent or time-independent list (depending on which category it belongs to) inside `export_feature_paths()`.
- The model will now also be trained on the added feature when running `src/train.py`!
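A minimal sketch of the pattern the steps above describe (the variable names, file paths and list layout here are hypothetical; check `src/utils/data_paths.py` for the actual ones):

```python
# Hypothetical layout of src/utils/data_paths.py.
FWI_PATH = "data/fwi_2010-2016.nc"          # an existing feature path (illustrative)
NEW_FEATURE_PATH = "data/new_feature.nc"    # the new .nc file being added

def export_feature_paths():
    # Time-dependent features vary along the 2010-2016 time axis;
    # time-independent ones (e.g. static land cover) do not.
    time_dependent = [FWI_PATH, NEW_FEATURE_PATH]
    time_independent = []
    return time_dependent, time_independent
```

Once the path is in the appropriate list, the training entry-point picks it up automatically.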
### Documentation
Documentation is available at: [https://ml-fuel.readthedocs.io/en/latest/index.html](https://ml-fuel.readthedocs.io/en/latest/index.html).
## Experiment 2
Please refer to `notebooks/ecmwf/README.md` for a description of this experiment, instructions to install the additional dependencies, and the notebooks with the steps to perform the experiment.
## Info
This repository was developed by Anurag Saha Roy (@lazyoracle) and Roshni Biswas (@roshni-b) for the ESA-SMOS-2020 project. Contact email: `[email protected]`. The repository is now maintained by the Wildfire Danger Forecasting team at the European Centre for Medium-Range Weather Forecasts.
33 changes: 33 additions & 0 deletions notebooks/ecmwf/README.md
@@ -0,0 +1,33 @@
# Notebooks for pre-processing and modelling

Notebooks in this folder contain all the steps for data exploration, pre-processing, modelling and explainability using the H2O.ai framework.

## Install dependencies

```bash
conda install cartopy
conda install -c h2oai h2o
```

## Notebooks

### 1. Data preparation

This notebook takes the raw/downloaded data and pre-processes it into a data frame. The data is then split into train and test sets using a stratified sampling strategy, to make sure both sets contain the same proportion of each biome.
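The stratified split described above can be sketched as follows (a simplified stand-in for the notebook's actual code, assuming each row carries a `biome` label used as the stratum):

```python
import random
from collections import defaultdict

def stratified_split(rows, key, test_frac=0.2, seed=42):
    """Split rows into train/test sets, preserving the proportion of each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[key(row)].append(row)
    train, test = [], []
    for stratum_rows in by_stratum.values():
        rng.shuffle(stratum_rows)
        n_test = round(len(stratum_rows) * test_frac)
        test.extend(stratum_rows[:n_test])
        train.extend(stratum_rows[n_test:])
    return train, test

rows = [{"biome": b, "x": i} for i, b in enumerate(["forest"] * 80 + ["savanna"] * 20)]
train, test = stratified_split(rows, key=lambda r: r["biome"])
# Both splits keep the 80/20 forest/savanna proportion of the full dataset.
```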

### 2. Exploratory data analysis

This notebook explores the data assembled in notebook `data_preparation.ipynb`. It looks at the probability distributions of the outcome and predictors and identifies possible data transformations, as well as correlations and redundancies amongst variables.

### 3. Model benchmark tests

This notebook uses the H2O.ai AutoML framework to benchmark various possible data transformations for the outcome and predictors. It also compares model results when all features are used versus only the non-redundant ones. The final result is a pre-processed dataset that will be used for the final modelling step in `model_definition.ipynb`.

### 4. Model definition and evaluation

This notebook uses the H2O.ai AutoML framework to model the transformed outcome and predictors. It visualises averaged results over a map and uses the H2O.ai explainability module to identify model limitations and possible avenues for future improvement.


## Info

These notebooks were developed by the Wildfire Danger Forecasting team at the European Centre for Medium-Range Weather Forecasts for the ESA-SMOS-2020 project. For any queries, please contact ECMWF via the support portal: https://confluence.ecmwf.int/site/support.
11,918 changes: 11,918 additions & 0 deletions notebooks/ecmwf/data_preparation.ipynb


