satyrid

An attention-based image description model.

This code is based on Kelvin Xu's arctic-captions, described in Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.

Changes:

  • Create input data from a directory of images and a JSON file containing the descriptions.
  • Gradient norm and value clipping.
  • Most recent version of the Adam optimiser (v8).
  • Monitor training performance using external metrics.

Dependencies

To extract visual features from your own images and to create the training, validation, and test input files, you will need:

  • Caffe built with its Python bindings (only needed if you want to extract the visual features yourself; a minimal extraction sketch follows this list)
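
The repository's own feature-extraction settings live in make_dataset.py; the sketch below only illustrates how convolutional features can be pulled out of a network with Caffe's Python bindings. The prototxt/caffemodel file names, the 224x224 input size, and the conv5_4 layer (the VGG-19 layer used by arctic-captions) are assumptions for illustration, not a description of this repository's exact pipeline.

    # Illustrative only: extract a 512x14x14 conv feature map and flatten it
    # into 196 annotation vectors of 512 dimensions each.
    import numpy
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('VGG_ILSVRC_19_layers_deploy.prototxt',  # assumed file names
                    'VGG_ILSVRC_19_layers.caffemodel',
                    caffe.TEST)
    net.blobs['data'].reshape(1, 3, 224, 224)

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
    transformer.set_raw_scale('data', 255)           # [0, 1] -> [0, 255]
    transformer.set_mean('data', numpy.array([103.939, 116.779, 123.68]))  # BGR mean

    image = caffe.io.load_image('example.jpg')
    net.blobs['data'].data[...] = transformer.preprocess('data', image)
    net.forward()

    features = net.blobs['conv5_4'].data[0]                   # (512, 14, 14)
    annotations = features.reshape(features.shape[0], -1).T   # (196, 512)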

To use the evaluation script (metrics.py), see coco-caption for its requirements. Install coco-caption inside evaluate/ and create an empty __init__.py in evaluate/ so that it can be imported as a module.
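
metrics.py defines how the scorers are actually driven in this repository; purely as an illustration, the snippet below shows the standard way the coco-caption scorers (the pycocoevalcap package that ships with it) are called. The sys.path entry assumes coco-caption was cloned into evaluate/coco-caption, and the captions are made-up examples.

    import sys

    # Assumption: coco-caption lives at evaluate/coco-caption, so make the
    # pycocoevalcap package it ships with importable.
    sys.path.insert(0, 'evaluate/coco-caption')

    from pycocoevalcap.meteor.meteor import Meteor
    from pycocoevalcap.cider.cider import Cider

    # Both scorers take {image_id: [caption, ...]} dicts of references and
    # single-element {image_id: [caption]} dicts of hypotheses.
    references = {'img0': ['a dog runs across a field',
                           'a dog is running outside']}
    hypotheses = {'img0': ['a dog running in a field']}

    meteor, _ = Meteor().compute_score(references, hypotheses)
    cider, _ = Cider().compute_score(references, hypotheses)
    print('Meteor %.3f, CIDEr %.3f' % (meteor, cider))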

Creating new dataset objects

make_dataset.py takes care of creating the image features file and the sentences file. See make_dataset.py for instructions on how to create dataset files from your data.

If you create a new dataset, you will also need to write a dataset loader module for it. See flickr30k.py for how to do this; a rough sketch of what such a module looks like follows.
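
flickr30k.py is the authoritative example; the sketch below only suggests the general shape such a loader module usually takes in arctic-captions-style code. Every name here (load_data, the pickle file names, the split structure) is a hypothetical stand-in, so copy the real interface from flickr30k.py.

    # Hypothetical loader module for a dataset called "mydata".
    import cPickle as pkl

    def load_data(load_train=True, load_dev=True, load_test=True, path='./'):
        """Return (train, valid, test, worddict).

        Each split is assumed to be a pair (captions, image_features):
        captions as (tokenised sentence, image index) pairs, image_features
        as a matrix with one row of CNN features per image.
        """
        train = valid = test = None
        if load_train:
            with open(path + 'mydata.train.pkl', 'rb') as f:
                train = pkl.load(f)
        if load_dev:
            with open(path + 'mydata.dev.pkl', 'rb') as f:
                valid = pkl.load(f)
        if load_test:
            with open(path + 'mydata.test.pkl', 'rb') as f:
                test = pkl.load(f)
        with open(path + 'mydata.dictionary.pkl', 'rb') as f:
            worddict = pkl.load(f)
        return train, valid, test, worddict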

Training a model

You can train a model using THEANO_FLAGS=floatX=float32 python train_model.py. See the documentation in train_model.py and model.py for more information on the options.

If you want the metrics.py script to control training (e.g. saving model parameters based on Meteor or CIDEr), pass "{'use_metrics':'True'}" as an argument to train_model.py, for example THEANO_FLAGS=floatX=float32 python train_model.py "{'use_metrics':'True'}", and install coco-caption and its requirements as described under Dependencies.

Generating descriptions

Generate descriptions using THEANO_FLAGS=floatX=float32 python generate_caps.py $model_name $PREFIX. The descriptions are written to $PREFIX.dev.txt and $PREFIX.test.txt. Use the --dataset $DATASET_NAME argument to generate descriptions for images in a different dataset.

Reference

If you use this code as part of any published research, please acknowledge the following paper (it encourages researchers who publish their code!):

"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML (2015)

License

The code is released under a revised (3-clause) BSD License.
