This repository contains my solution to the "PetFinder.my - Pawpularity Contest" challenge, hosted on Kaggle.
In this competition, we’ll analyze raw images and metadata to predict the “Pawpularity” of pet photos.
For reproducibility, we include the Docker image we used to develop and test the application. The machine learning pipeline is defined in DVC, a version control system for machine learning projects.
First, we copy our personal kaggle.json file to the project's main directory. This file is used to authenticate with the Kaggle API and to download the competition data from inside the Docker container.
$ cp ~/.kaggle/kaggle.json .
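Before building the image, it can help to confirm that the copied file looks like a Kaggle API token. The following is a minimal sketch and not part of this repository: the helper name `missing_token_fields` is hypothetical, and it assumes the usual token layout, a JSON object with `username` and `key` fields.

```python
import json
from pathlib import Path

# Fields expected in a Kaggle API token file (assumed standard layout).
REQUIRED_FIELDS = ("username", "key")


def missing_token_fields(path="kaggle.json"):
    """Return the required fields that are absent or empty in the token file."""
    token = json.loads(Path(path).read_text())
    return [field for field in REQUIRED_FIELDS if not token.get(field)]


if __name__ == "__main__":
    token_path = Path("kaggle.json")
    if not token_path.exists():
        print("kaggle.json not found in the current directory")
    else:
        missing = missing_token_fields(token_path)
        print("token looks complete" if not missing else f"missing fields: {missing}")
```

Run it from the project's main directory, after the copy step and before building the image.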
Build the Docker image.
$ make build
Start a Docker container based on the newly built image.
$ make start
Start a bash shell in the container.
$ make attach
Reproduce the DVC pipeline.
$ dvc repro
Here is a brief description of what each folder contains:
data: input and pre-processed data
nbs: notebooks for exploratory analyses
pipe: Python scripts for each step in the DVC pipeline
src: source code for the companion library
ckpts: model checkpoints
outs: model outputs
Other important files are:
dvc.yaml: lists the inputs, outputs, and parameters used by each DVC step
params.yaml: parameters used by the DVC steps
The companion library (ml) is installed in editable mode, which means you don't need to rebuild the Docker image every time you make a change to it.
When contributing to this repository, please consider using the following convention to label your commit messages.
BUG: fixing a bug
DEV: development environment (e.g., Docker, TensorBoard, system dependencies)
DOC: documentation
EDA: exploratory data analysis
ML: modeling, feature engineering
MAINT: maintenance (e.g., refactoring)
OPS: ml ops (e.g., download/unzip/pre- and post-process data)
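The labels above could be checked automatically with something like the sketch below. It is illustrative only and not part of this repository: the function name `has_valid_label` is hypothetical, and the exact message format is an assumption (label first, then a colon, a space, and a summary, as in "DOC: update README"), since the convention only names the labels.

```python
import re

# Labels listed in the convention above.
LABELS = ("BUG", "DEV", "DOC", "EDA", "ML", "MAINT", "OPS")

# Assumption: the label comes first, followed by ": " and a non-empty summary.
_PATTERN = re.compile(r"^(%s): \S" % "|".join(LABELS))


def has_valid_label(message):
    """Return True if the first line of the commit message starts with a known label."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return bool(_PATTERN.match(first_line))


if __name__ == "__main__":
    for example in ("DOC: update README", "update README"):
        print(example, "->", has_valid_label(example))
```

A check like this could be called from a Git commit-msg hook, which receives the path to the message file as its first argument.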