This repository contains source code generated by Luminide. It may be used to train, validate and tune deep learning models for image classification. The following directory structure is assumed:
```
├── code (source code)
├── input (dataset)
└── output (working directory)
```
The dataset should have images inside a directory named train_images and a CSV file named train_cultivar_mapping.csv. An example is shown below:
```
input
├── train_cultivar_mapping.csv
└── train_images
    ├── 2017-06-16__12-24-20-930_0.jpg
    ├── 2017-06-16__12-24-20-930_1.jpg
    ├── 2017-06-16__12-24-20-930_2.jpg
```
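Before launching an experiment, it can help to sanity-check that the dataset follows this layout. The sketch below is a hypothetical helper (not part of the generated code) that checks for the expected file and directory names:

```python
import os

def validate_layout(root):
    """Check that a dataset directory matches the layout described above.

    Returns a list of problems found; an empty list means the layout looks OK.
    This helper is illustrative and not part of the repository.
    """
    problems = []
    csv_path = os.path.join(root, "train_cultivar_mapping.csv")
    img_dir = os.path.join(root, "train_images")
    if not os.path.isfile(csv_path):
        problems.append("missing train_cultivar_mapping.csv")
    if not os.path.isdir(img_dir):
        problems.append("missing train_images directory")
    elif not any(name.endswith(".jpg") for name in os.listdir(img_dir)):
        problems.append("train_images contains no .jpg files")
    return problems
```

Running `validate_layout("input")` from the repository root would report any missing pieces before training starts.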
The CSV file is expected to have labels under a column named cultivar as in the example below:
```
image,cultivar
2017-06-16__12-24-20-930.jpg,PI_257599
2017-06-02__16-48-57-866.jpg,PI_154987
2017-06-12__13-18-07-707.jpg,PI_92270
```
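A classifier ultimately needs integer class ids rather than cultivar names. As a minimal sketch (the function name is hypothetical; only the `cultivar` column name comes from the CSV format above), the mapping can be built with the standard library:

```python
import csv
import io

def load_label_map(csv_file):
    """Build a cultivar-name -> integer class-id mapping from the CSV above.

    Illustrative helper, not part of the generated code; assumes a header
    row with a column named "cultivar".
    """
    reader = csv.DictReader(csv_file)
    labels = sorted({row["cultivar"] for row in reader})
    return {name: idx for idx, name in enumerate(labels)}

# Demo on the example rows from the README:
sample = io.StringIO(
    "image,cultivar\n"
    "2017-06-16__12-24-20-930.jpg,PI_257599\n"
    "2017-06-02__16-48-57-866.jpg,PI_154987\n"
    "2017-06-12__13-18-07-707.jpg,PI_92270\n"
)
label_map = load_label_map(sample)
```

Sorting the unique labels first keeps the name-to-id assignment deterministic across runs.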
- Accept competition rules.
- Attach a Compute Server that has a GPU (e.g. gcp-t4).
- Configure your Kaggle API token on the `Import Data` tab.
- On the `Import Data` tab, choose Kaggle and then enter `anlthms/sorghum1` (User Dataset).
- Train a model using the `Run Experiment` menu.
- Launch inference.sh from the `Run Experiment` tab to create a submission and use submit.sh to upload it to Kaggle.
- Check the leaderboard to see your score!
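The submission file itself is produced by inference.sh, but as a rough sketch, a Kaggle submission for this task would mirror the training CSV's `image,cultivar` columns (an assumption based on the mapping format above; the example prediction is made up):

```python
import csv
import io

# Hypothetical example prediction; not real competition output.
predictions = {
    "2017-07-01__10-00-00-000.jpg": "PI_257599",
}

# Write rows in the same image,cultivar shape as the training CSV.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["image", "cultivar"])
for image, cultivar in sorted(predictions.items()):
    writer.writerow([image, cultivar])
submission_csv = buf.getvalue()
```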
- Use the `Experiment Tracking` menu to track experiments.
- To tune the hyperparameters, edit sweep.yaml as desired and launch a sweep from the `Run Experiment` tab. Tuned values will be copied back to a file called `config-tuned.yaml` along with visualizations in `sweep-results.html`.
- To use the tuned hyperparameter values, copy them over to `config.yaml` before training a model.
- For exploratory analysis, run eda.ipynb.
- To monitor training progress, use the `Experiment Visualization` menu.
- After an experiment is complete, use the file browser on the IDE interface to access the results on the IDE Server.
- To generate a report on the most recent training session, run report.sh from the `Run Experiment` tab. Make sure `Track Experiment` is checked. The results will be copied back to a file called `report.html`.
NOTE: As configured, the code trains on 20% of the data. To train on the entire dataset, edit full.sh and fast.sh to remove the --subset command line parameter so that the default value of 100 is used.
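The actual subset logic lives in the repository's training code, but the idea behind a percentage-valued `--subset` parameter can be sketched as follows (the helper name and sampling strategy are illustrative assumptions, not the repository's implementation):

```python
import random

def take_subset(rows, percent, seed=0):
    """Keep a random `percent` of the training rows.

    Hypothetical sketch: a value of 100 (the default mentioned in the
    README) returns the full dataset unchanged.
    """
    if percent >= 100:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    k = max(1, len(rows) * percent // 100)
    return rng.sample(list(rows), k)
```

With `percent=20`, each run trains on the same reproducible fifth of the data; removing the flag restores the full dataset.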
For more details on usage, see the Luminide documentation.