- BERT and ViT are not multimodal models by themselves, but they are widely used as building blocks in multimodal deep learning, where a text encoder and an image encoder are combined for applications that span both modalities.
- BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only Transformer model pretrained to predict masked tokens from their surrounding context, which makes it a strong backbone for natural language processing tasks. ViT, the Vision Transformer, brought the same architecture to computer vision: an image is split into fixed-size patches, each patch is turned into an embedding, and the resulting sequence is processed by a standard Transformer encoder, handling visual data efficiently without convolutions.
- Multimodal models that integrate the strengths of BERT and ViT have significantly advanced the ability of machines to understand and interpret data from different modalities, leading to breakthroughs in tasks such as visual question answering (VQA).
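As a rough illustration of how these two encoders can be combined for VQA, here is a minimal sketch that concatenates ViT and BERT pooled features and classifies over a fixed answer vocabulary. The checkpoint names, the concatenation-based fusion, and the answer-vocabulary size are illustrative assumptions and do not necessarily match the exact architecture used in this repo.

```python
import torch
import torch.nn as nn
from transformers import ViTModel, BertModel

class VQAFusionModel(nn.Module):
    """Concatenate ViT [CLS] image features with BERT [CLS] question features
    and classify over a fixed answer vocabulary (illustrative sketch only)."""
    def __init__(self, num_answers: int = 1000):
        super().__init__()
        self.vision_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        fused_dim = (self.vision_encoder.config.hidden_size
                     + self.text_encoder.config.hidden_size)
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_answers),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.vision_encoder(pixel_values=pixel_values).pooler_output      # (B, 768)
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).pooler_output    # (B, 768)
        return self.classifier(torch.cat([img, txt], dim=-1))                   # (B, num_answers)
```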
- We have tried four different combinations of models from the BERT and ViT families, with and without LoRA, on the VQAv2 dataset available from Hugging Face.
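For the LoRA variants, one common way to attach low-rank adapters is the Hugging Face `peft` library. The sketch below wraps a BERT text encoder; the rank, alpha, dropout, and target modules are illustrative defaults, not the exact settings used in this project, and the same wrapping can be applied to the ViT encoder.

```python
from transformers import BertModel
from peft import LoraConfig, get_peft_model

text_encoder = BertModel.from_pretrained("bert-base-uncased")

# Illustrative LoRA hyperparameters; "query"/"value" target the attention
# projection layers of BERT-family encoders.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)

text_encoder = get_peft_model(text_encoder, lora_config)
text_encoder.print_trainable_parameters()  # only the adapter weights are trainable
```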
- Each folder contains the code for:
- Training the model on (question, image, answer) triples, followed by a validation pass to check whether the model gives correct results.
- Testing the model: given a question and an image, the model predicts the answer (a minimal inference sketch is shown after the TensorBoard snippet below).
- lightning_logs: contains the graphs of the metrics Accuracy, F1 Score, Precision, Recall, and Time Taken for Training (TTT).
- This architecture can also be extended to real-time Video Question Answering.
- To see the graphs of the metrics with TensorBoard, run the following code on Google Colab.
```python
# Showing just one example from our code: the lightning_logs of the vit-albert-lora model,
# found under the folder vit+albert+lora+15000datapts+15epochs/vit-albert-lora- precision
from google.colab import drive
drive.mount('/content/drive')

%load_ext tensorboard
%tensorboard --logdir /content/drive/MyDrive/VR_FinalProject_End_Term/VR_PROJECT_FINAL_PROJECT1/VR_PROJECT/vit+albert+lora+15000datapts+15epochs/vit-albert-lora- precision/lightning_logs
```
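As referenced in the testing bullet above, inference on a single question/image pair might look like the following sketch. The preprocessing classes come from Hugging Face `transformers`; `answer_vocab` is a hypothetical list mapping class indices back to answer strings, and the model is the illustrative fusion model sketched earlier, not necessarily the exact module used in each folder.

```python
import torch
from PIL import Image
from transformers import BertTokenizerFast, ViTImageProcessor

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

def predict_answer(model, image_path, question, answer_vocab):
    """Given one image and one question, return the model's top answer string."""
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    enc = tokenizer(question, return_tensors="pt", truncation=True)
    model.eval()
    with torch.no_grad():
        logits = model(pixel_values, enc.input_ids, enc.attention_mask)
    return answer_vocab[logits.argmax(dim=-1).item()]
```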
- We store the checkpoints of every model so that they can be reused later.
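A minimal sketch of reloading one of those checkpoints with PyTorch Lightning; `VQALitModule` is a hypothetical name standing in for the LightningModule defined in each folder, and the checkpoint path is a placeholder.

```python
import pytorch_lightning as pl

# "VQALitModule" and the checkpoint path below are placeholders for the
# per-folder LightningModule and one of its saved checkpoints.
model = VQALitModule.load_from_checkpoint("path/to/checkpoints/last.ckpt")
model.eval()  # the restored model can now be used for inference

# Training can also be resumed from the same file:
# pl.Trainer().fit(model, ckpt_path="path/to/checkpoints/last.ckpt")
```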