ml4docs-automation

The repo implements a workflow for iteratively labeling stamps. Each iteration is called a "campaign". In every campaign, the user runs all scripts from the pipeline folder. pipeline/README.md specifies the order (WIP).

The general workflow is the following.

Pick new images to label.
Detect pages, detect and classify stamps using previously trained models.
Send the ML-labeled images to LabelMeAnnotationTool for expert labeling.
Send the expert-labeled stamps, grouped by name, to LabelMeAnnotationTool for cleaning.
Repeat cleaning if needed. Cleaning can be done for the current campaign only or for all campaigns.
Postprocess, visualize, generate statistics, publish the dataset.
Re-train all ML models.
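
The ordering of the steps above can be sketched as a small state tracker. This is only an illustration, not code from the repo: the stage names, the `Campaign` class, and its methods are hypothetical, and the actual scripts and their order are the ones given in pipeline/README.md. The sketch assumes that cleaning is the only step that may be repeated.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stage names mirroring the workflow steps listed above.
# The real script names and their order are specified in pipeline/README.md.
STAGES = [
    "pick_new_images",
    "detect_and_classify",
    "expert_labeling",
    "cleaning",
    "postprocess_and_publish",
    "retrain_models",
]


@dataclass
class Campaign:
    """Tracks which workflow stages of one campaign have been completed."""

    campaign_id: int
    completed: List[str] = field(default_factory=list)

    def next_stage(self) -> Optional[str]:
        """Return the first stage not yet completed, or None when all are done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

    def complete(self, stage: str) -> None:
        """Mark a stage as done, enforcing the order.

        Cleaning may be repeated, but only directly after a cleaning pass.
        """
        repeat_cleaning = stage == "cleaning" and self.completed[-1:] == ["cleaning"]
        if not repeat_cleaning and stage != self.next_stage():
            raise ValueError(
                f"expected stage {self.next_stage()!r}, got {stage!r}"
            )
        self.completed.append(stage)
```

For example, after completing the first four stages of `Campaign(campaign_id=7)`, calling `complete("cleaning")` again is accepted, while calling `complete("retrain_models")` before postprocessing raises `ValueError`.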

Repo structure:

pipeline contains the high-level scripts. These are what a user should execute.
scripts contains the workhorse helper scripts. They are used for debugging and experimenting.
constants.sh contains various repo-wide name and path conventions.
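
To illustrate what repo-wide name and path conventions can look like, here is a minimal sketch in which every per-campaign path is derived from the campaign number in one place. Everything in it is an assumption made for illustration: the root directory, the `campaign<N>` naming scheme, the `.json` suffix, and both function names are hypothetical. The real conventions are defined in constants.sh and may differ.

```python
from pathlib import PurePosixPath

# Hypothetical data root, for illustration only.
# The real root and naming rules are defined in constants.sh.
DATA_ROOT = PurePosixPath("/data/ml4docs")


def campaign_dir(campaign_id: int) -> PurePosixPath:
    """Directory holding all artifacts of one campaign (assumed layout)."""
    return DATA_ROOT / f"campaign{campaign_id}"


def labeled_path(campaign_id: int, stage: str) -> PurePosixPath:
    """Path of the labels produced by a given stage (assumed naming scheme)."""
    return campaign_dir(campaign_id) / f"campaign{campaign_id}-{stage}.json"


if __name__ == "__main__":
    print(labeled_path(5, "cleaned"))
```

Keeping such rules in a single file means that a script only needs the campaign number to locate its inputs and outputs.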
