This directory contains recipe scripts for collecting and reviewing image captioning data with Prodigy. The captioning model is implemented in PyTorch based on this tutorial. To use the pretrained model, download the files from here and place them all in this directory. For more details on custom recipes with Prodigy, check out the documentation.
📺 This project was created as part of a step-by-step video tutorial.
For more details on the recipes, check out image_caption.py or run a recipe with --help, for example: prodigy image-caption -F image_caption.py --help.
Start the server, stream in images from a directory and allow annotating them
with captions. Captions will be saved in the data as the field "caption".
prodigy image-caption caption_data ./images -F image_caption.py
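For reference, here is a minimal sketch of what such a recipe could look like, using Prodigy's blocks interface with an image block and a text_input block. The exact structure is an assumption; see image_caption.py for the actual implementation.

```python
import prodigy
from prodigy.components.loaders import Images

@prodigy.recipe(
    "image-caption",
    dataset=("Dataset to save annotations to", "positional", None, str),
    images_path=("Directory of images to load", "positional", None, str),
)
def image_caption(dataset, images_path):
    # Stream in images from the directory as base64-encoded data URIs
    stream = Images(images_path)
    blocks = [
        {"view_id": "image"},
        # Free-form text field whose value is saved under the "caption" key
        {"view_id": "text_input", "field_id": "caption"},
    ]
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }
```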
Start the server, stream in images from a directory and display the generated
captions in the text field, allowing the annotator to change them if needed.
Captions will be saved in the data as the field "caption" and the original unedited caption will be preserved as "orig_caption". Prints the counts of changed vs. unchanged captions on exit.
prodigy image-caption.correct caption_data ./images -F image_caption.py
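A rough sketch of how the pre-filling and the exit counts could be wired up. Here load_model and generate_caption are hypothetical stand-ins for the tutorial's PyTorch code, not a real API:

```python
import prodigy
from prodigy.components.loaders import Images

# Hypothetical helpers standing in for the tutorial's PyTorch code
from model import load_model, generate_caption

@prodigy.recipe(
    "image-caption.correct",
    dataset=("Dataset to save annotations to", "positional", None, str),
    images_path=("Directory of images to load", "positional", None, str),
)
def image_caption_correct(dataset, images_path):
    model = load_model()

    def get_stream():
        for eg in Images(images_path):
            caption = generate_caption(model, eg["image"])
            eg["caption"] = caption       # pre-fills the text_input field
            eg["orig_caption"] = caption  # preserved for comparison on exit
            yield eg

    def on_exit(controller):
        # Compare the edited captions against the preserved originals
        examples = controller.db.get_dataset(dataset)
        changed = sum(eg.get("caption") != eg.get("orig_caption") for eg in examples)
        print(f"Changed: {changed}, Unchanged: {len(examples) - changed}")

    blocks = [
        {"view_id": "image"},
        {"view_id": "text_input", "field_id": "caption"},
    ]
    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "blocks",
        "config": {"blocks": blocks},
        "on_exit": on_exit,
    }
```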
This recipe expects the files vocab.pkl, encoder-5-3000.pkl and decoder-5-3000.pkl to be present in the same directory. You can download a pretrained model from here.
If needed, the recipe could be edited to allow the model path to be passed in as
a recipe argument that's then passed to load_model.
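For example, a model_dir option could be added to the recipe signature and forwarded to the loader. The argument name and the load_model signature here are assumptions:

```python
import prodigy
from prodigy.components.loaders import Images

# Hypothetical helpers wrapping the tutorial's PyTorch code
from model import load_model, generate_caption

@prodigy.recipe(
    "image-caption.correct",
    dataset=("Dataset to save annotations to", "positional", None, str),
    images_path=("Directory of images to load", "positional", None, str),
    model_dir=("Directory with vocab.pkl and the encoder/decoder files", "option", "m", str),
)
def image_caption_correct(dataset, images_path, model_dir="."):
    # Forward the CLI value instead of assuming the files live next to the script
    model = load_model(model_dir)

    def get_stream():
        for eg in Images(images_path):
            eg["caption"] = eg["orig_caption"] = generate_caption(model, eg["image"])
            yield eg

    blocks = [{"view_id": "image"}, {"view_id": "text_input", "field_id": "caption"}]
    return {"dataset": dataset, "stream": get_stream(), "view_id": "blocks",
            "config": {"blocks": blocks}}
```

The recipe could then be run with something like prodigy image-caption.correct caption_data ./images -m ./model -F image_caption.py.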
Go through all edited captions in a dataset created with image-caption.correct
and select why the caption was changed, based on multiple choice options. Prints
the counts of options on exit.
prodigy image-caption.diff caption_data_diff caption_data -F image_caption.py
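One way the diff recipe could be structured, assuming the edited dataset is loaded back from the database and shown in Prodigy's choice interface. The option labels below are placeholders, not the recipe's actual list:

```python
import collections

import prodigy
from prodigy.components.db import connect

# Placeholder reasons; the real recipe hard-codes its own list
OPTIONS = [
    {"id": "spelling", "text": "Fixed spelling or grammar"},
    {"id": "content", "text": "Caption didn't describe the image"},
]

@prodigy.recipe(
    "image-caption.diff",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source_dataset=("Dataset created with image-caption.correct", "positional", None, str),
)
def image_caption_diff(dataset, source_dataset):
    db = connect()

    def get_stream():
        for eg in db.get_dataset(source_dataset):
            # Only show examples whose caption was actually edited
            if eg.get("caption") != eg.get("orig_caption"):
                eg["options"] = OPTIONS
                yield eg

    def on_exit(controller):
        # Selected option IDs end up in each example's "accept" list
        counts = collections.Counter()
        for eg in controller.db.get_dataset(dataset):
            counts.update(eg.get("accept", []))
        print(dict(counts))

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "choice",
        "on_exit": on_exit,
    }
```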
The options are currently hard-coded in the recipe image_caption.py, but the recipe could be modified to take a JSON file of options instead via a recipe argument.
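A small helper along these lines could do it; the file format and names here are assumptions:

```python
import json

# Defaults mirroring the currently hard-coded options (labels are placeholders)
DEFAULT_OPTIONS = [
    {"id": "spelling", "text": "Fixed spelling or grammar"},
    {"id": "content", "text": "Caption didn't describe the image"},
]

def load_options(options_file=None):
    """Load choice options from a JSON file, falling back to the defaults.

    Expects a list of dicts in Prodigy's option format, e.g.
    [{"id": "spelling", "text": "Fixed spelling or grammar"}].
    """
    if options_file is None:
        return DEFAULT_OPTIONS
    with open(options_file, encoding="utf8") as f:
        return json.load(f)
```

The recipe signature would then take the file path as an extra ("option", ...) argument and pass it through to this helper when building the stream.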