Image Captioning PyTorch

Submitted by Srimanth Agastayraju ([email protected]) and Joseph Presnell ([email protected]) as the final project for the course Deep Learning Systems, Fall 2022

A brief overview

This project presents an image captioning model that generates captions for the Flickr8k dataset. A convolutional neural network (CNN) pretrained on ImageNet extracts features from the Flickr8k images. Those feature vectors are then passed to a recurrent neural network (RNN). We used a long short-term memory (LSTM) network because its feedback connections allow it to process sequences of data. The target captions are fed into the RNN and converted to vector embeddings, which are combined with the image vectors before being processed by the LSTM layer. To generate a caption, we feed each LSTM cell's output into the next LSTM cell, producing a sequence of words that together form the caption.
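The decoder described above can be sketched roughly as follows. This is a minimal illustration, not the code in network.py: the class name `CaptionDecoder`, the choice to prepend the projected image vector as the first step of the sequence, the greedy decoding loop, and all default sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn


class CaptionDecoder(nn.Module):
    """Hypothetical LSTM decoder conditioned on one image feature vector."""

    def __init__(self, feature_dim, vocab_size, embed_dim=256, hidden_dim=256, num_layers=2):
        super().__init__()
        self.project = nn.Linear(feature_dim, embed_dim)  # image vector -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)  # caption tokens -> embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Assumed way of combining: the image vector becomes step 0 of the sequence.
        image_step = self.project(features).unsqueeze(1)                # (B, 1, E)
        inputs = torch.cat([image_step, self.embed(captions)], dim=1)   # (B, 1 + T, E)
        outputs, _ = self.lstm(inputs)
        return self.to_vocab(outputs)                                   # (B, 1 + T, V)

    @torch.no_grad()
    def generate(self, features, end_id, max_len=20):
        # Greedy decoding for a single image: each predicted word is fed back in.
        step = self.project(features).unsqueeze(1)                      # (1, 1, E)
        state, words = None, []
        for _ in range(max_len):
            out, state = self.lstm(step, state)
            word = self.to_vocab(out[:, -1]).argmax(dim=-1)             # (1,)
            if word.item() == end_id:
                break
            words.append(word.item())
            step = self.embed(word).unsqueeze(1)
        return words
```

With 2048-dimensional image features, `CaptionDecoder(feature_dim=2048, vocab_size=V)` maps a batch of features and token ids of length T to logits of shape (B, T + 1, V); `generate` returns a list of token ids for one image.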

We ran two experiments, one with ResNet50 and one with ResNeXt50 as the pretrained feature extractor. The two produce different results: although ResNeXt50 outperforms ResNet50 on classification tasks, the ResNet50 model seems to perform better for captioning here.

Implementation Details

To run the training script, please run:

python3 train.py

To download the dataset, please visit: https://www.kaggle.com/datasets/adityajn105/flickr8k
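Once downloaded, the captions need to be grouped by image and mapped to token ids. The sketch below assumes the archive contains a `captions.txt` file with an `image,caption` header row; the helper names and the special tokens are hypothetical and may differ from what flickr_dataset.py uses.

```python
import csv
from collections import Counter, defaultdict

SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]  # assumed special tokens


def load_captions(lines):
    """Group lower-cased captions by image filename from an iterable of CSV lines."""
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip().lower())
    return dict(captions)


def build_vocab(captions_by_image, min_count=1):
    """Map every whitespace-separated word seen at least min_count times to an integer id."""
    counts = Counter(
        word
        for caps in captions_by_image.values()
        for caption in caps
        for word in caption.split()
    )
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {token: i for i, token in enumerate(SPECIAL_TOKENS + words)}
```

A typical call would be `with open("captions.txt", newline="") as f: captions = load_captions(f)`, where the path depends on where the dataset was extracted.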

To visualize the model outputs, please visit: https://wandb.ai/asrimanth/Image-Captioning?workspace=user-asrimanth

The wandb link above shows two experiments:
sleek-totem-15: ResNet50 as the feature extractor.
dazzling-sun-14: ResNeXt50 as the feature extractor.

The configuration and the data path of an experiment can be edited in the main method of the training script. If you want to load the models we've trained, please load the weights from the following link: DRIVE LINK
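As an illustration of what such a configuration might look like, the sketch below collects the settings implied by report filenames in the repository, such as resnet50_True_embed_256_hidden_256_lstm_2_B_64_model_report.csv. The class name, the field names, the reading of `True` as "pretrained", and the default data path are all assumptions; the actual variables in train.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment settings, inferred from the report filenames."""

    backbone: str = "resnet50"      # or "resnext50_32x4d"
    pretrained: bool = True         # assumed meaning of "True" in the filename
    embed_dim: int = 256
    hidden_dim: int = 256
    lstm_layers: int = 2
    batch_size: int = 64
    data_path: str = "./flickr8k"   # placeholder; point this at the downloaded dataset

    def run_name(self):
        # Rebuilds the prefix used by the *_model_report.csv files.
        return (
            f"{self.backbone}_{self.pretrained}_embed_{self.embed_dim}"
            f"_hidden_{self.hidden_dim}_lstm_{self.lstm_layers}_B_{self.batch_size}"
        )
```

Keeping the settings in one immutable object makes it easy to name outputs consistently, for example `ExperimentConfig(backbone="resnext50_32x4d").run_name() + "_model_report.csv"`.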
