Manual and model-assisted image captioning

This directory contains recipe scripts for collecting and reviewing image captioning data with Prodigy. The captioning model is implemented in PyTorch based on this tutorial. To use the pretrained model, download the files from here and place them all in this directory. For more details on custom recipes with Prodigy, check out the documentation.

📺 This project was created as part of a step-by-step video tutorial.

Usage

For more details on the recipes, check out image_caption.py or run a recipe with --help, for example: prodigy image-caption -F image_caption.py --help.

recipe image-caption: Collect image captions manually

Start the server, stream in images from a directory and allow annotating them with captions. Captions will be saved in the data as the field "caption".

prodigy image-caption caption_data ./images -F image_caption.py

recipe image-caption.correct: Model-assisted image captioning

Start the server, stream in images from a directory and display the generated captions in the text field, allowing the annotator to change them if needed. Captions will be saved in the data as the field "caption" and the original unedited caption will be preserved as "orig_caption". Prints the counts of changed vs. unchanged captions on exit.

prodigy image-caption.correct caption_data ./images -F image_caption.py
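The changed vs. unchanged counts printed on exit can be approximated outside the recipe as well. The following is a minimal sketch, not the recipe's actual code: it assumes each saved example is a dict with the "caption" and "orig_caption" fields described above, and the helper name `count_caption_changes` and the sample records are made up for illustration.

```python
from collections import Counter


def count_caption_changes(examples):
    """Count how many captions were edited vs. kept as generated."""
    counts = Counter({"changed": 0, "unchanged": 0})
    for eg in examples:
        # Skip records that lack either field, since they can't be compared.
        if "caption" not in eg or "orig_caption" not in eg:
            continue
        key = "changed" if eg["caption"] != eg["orig_caption"] else "unchanged"
        counts[key] += 1
    return counts


# Hypothetical records in the shape described above.
examples = [
    {"image": "a.jpg", "caption": "a dog on a beach", "orig_caption": "a dog on a beach"},
    {"image": "b.jpg", "caption": "a cat on a sofa", "orig_caption": "a dog on a couch"},
]
counts = count_caption_changes(examples)
print(f"Changed: {counts['changed']}, unchanged: {counts['unchanged']}")
```

The comparison is an exact string match, so whitespace-only edits count as changes. Normalizing both strings first would be a reasonable variation.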

This recipe expects the files vocab.pkl, encoder-5-3000.pkl and decoder-5-3000.pkl to be present in the same directory. You can download a pretrained model from here. If needed, the recipe could be edited to allow the model path to be passed in as a recipe argument that's then passed to load_model.
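If the model path were made configurable as suggested, one possible first step is a small helper that resolves the three expected files from a directory and fails early when one is missing. This is only a sketch under assumptions: `resolve_model_files` is a hypothetical helper, and how its result would be handed to `load_model` depends on that function's signature, which is not shown here.

```python
from pathlib import Path

# File names the recipe expects, as listed above.
MODEL_FILES = ("vocab.pkl", "encoder-5-3000.pkl", "decoder-5-3000.pkl")


def resolve_model_files(model_dir="."):
    """Return a dict mapping each expected file name to its path in model_dir.

    Raises FileNotFoundError if any of the expected files is missing.
    """
    base = Path(model_dir)
    paths = {name: base / name for name in MODEL_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing model files in {base}: {', '.join(missing)}"
        )
    return paths
```

A recipe argument for the model directory could then be passed to this helper before the model is loaded, so a missing download is reported before the server starts.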

recipe image-caption.diff: Review corrected image captions

Go through all edited captions in a dataset created with image-caption.correct and select why the caption was changed, based on multiple choice options. Prints the counts of options on exit.

prodigy image-caption.diff caption_data_diff caption_data -F image_caption.py
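The per-option counts can also be recomputed from the saved annotations. The sketch below assumes the selected option IDs are stored as a list under the "accept" key of each example, which is how Prodigy's multiple choice interface normally records selections. The helper name and the option IDs in the sample data are invented for illustration and are not the ones hard-coded in the recipe.

```python
from collections import Counter


def count_selected_options(examples):
    """Tally how often each option ID was selected across all examples."""
    counts = Counter()
    for eg in examples:
        # "accept" is assumed to hold the list of selected option IDs.
        counts.update(eg.get("accept", []))
    return counts


# Hypothetical annotations with made-up option IDs.
annotations = [
    {"caption": "a cat on a sofa", "accept": ["WRONG_OBJECT"]},
    {"caption": "two dogs running", "accept": ["WRONG_OBJECT", "WRONG_COUNT"]},
    {"caption": "a red bus", "accept": []},
]
for option_id, count in count_selected_options(annotations).most_common():
    print(option_id, count)
```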

The options are currently hard-coded in the recipe image_caption.py, but the recipe could be modified to take a JSON file of options instead via a recipe argument.
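Reading the options from a JSON file could look roughly like the sketch below. It assumes the file holds a list of objects with "id" and "text" keys, matching the option format used by Prodigy's choice interface. The function name `load_options` and the file layout are assumptions, not part of the existing recipe.

```python
import json
from pathlib import Path


def load_options(path):
    """Load multiple choice options from a JSON file.

    Expects a JSON list of objects, each with an "id" and a "text" key.
    """
    with Path(path).open(encoding="utf8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("Options file must contain a JSON list")
    options = []
    for item in data:
        if not isinstance(item, dict) or "id" not in item or "text" not in item:
            raise ValueError(f"Each option needs 'id' and 'text': {item!r}")
        # Keep only the two keys the interface needs.
        options.append({"id": item["id"], "text": item["text"]})
    return options
```

The recipe could then accept the file path as an extra argument and use the returned list in place of the hard-coded options.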