The repo implements a workflow for iteratively labeling stamps.
On every iteration, called a "campaign", the user runs all scripts
from the pipeline folder. pipeline/README.md specifies the order (WIP).
The general workflow is as follows.
- Pick new images to label.
- Detect pages, detect and classify stamps using previously trained models.
- Send the ML-labeled images to LabelMeAnnotationTool for expert labeling.
- Send the expert-labeled stamps, grouped by name, to LabelMeAnnotationTool for cleaning.
- Repeat cleaning if needed. Cleaning can be done for the current campaign or for all campaigns.
- Postprocess, visualize, generate statistics, publish the dataset.
- Re-train all ML models.
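The steps above could be chained by a small driver script. What follows is only a sketch under stated assumptions: the step script names, the `--campaign_id` flag, and the `CAMPAIGN_ID` and `DRY_RUN` variables are hypothetical and not taken from this repo. The actual scripts and their order are given in pipeline/README.md.

```shell
#!/usr/bin/env bash
# Hypothetical campaign driver. Step names are placeholders,
# not real files in this repo.
set -euo pipefail

CAMPAIGN_ID="${CAMPAIGN_ID:-1}"   # assumption: campaigns are numbered
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = execute them

# Placeholder step names mirroring the workflow list above.
STEPS=(
  pick_new_images
  detect_and_classify
  send_to_expert_labeling
  send_to_cleaning
  postprocess_and_publish
  retrain_models
)

run_campaign() {
  local step
  for step in "${STEPS[@]}"; do
    # Build the command as an array so arguments are never re-split.
    local cmd=("pipeline/${step}.sh" --campaign_id "${CAMPAIGN_ID}")
    echo "[campaign ${CAMPAIGN_ID}] ${cmd[*]}"
    if [ "${DRY_RUN}" = "0" ]; then
      "${cmd[@]}"
    fi
  done
}

run_campaign
```

By default the sketch only prints the commands it would run, which makes it easy to check the order before executing anything. Setting `DRY_RUN=0` would execute each step and, because of `set -e`, stop at the first failure.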
Repo structure:
- pipeline: high-level scripts. This is what a user should execute.
- scripts: workhorse helper scripts, used for debugging and experimenting.
- constants.sh: repo-wide name and path conventions.