Maya: Multimodal Multilingual LLM

A multimodal LLM supporting eight languages: English, Chinese, French, Spanish, Russian, Japanese, Arabic, and Hindi.

Contents

  • Install
  • Model Weights and Dataset
  • Train
  • Evaluation
  • Citation
  • Contributors
  • Acknowledgement

Install

The following steps were tested with CUDA 12.4.

  1. Clone this repository and navigate to the maya directory
git clone https://github.com/nahidalam/maya
cd maya
  2. Install the package
conda create -n maya python=3.10 -y
conda activate maya
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn==2.6.3 --no-build-isolation --no-cache-dir
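
To sanity-check the environment, you can try the imports below from inside the new conda environment. This is a minimal sketch; it assumes the editable install exposes the llava package (Maya builds on the LLaVA codebase), which may differ in your checkout.

conda activate maya
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # CUDA should report True
python -c "import llava"  # assumed package name from the LLaVA-based codebase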

Model Weights and Dataset

Model weights and datasets are available on HuggingFace.
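
As a hedged sketch, the weights and datasets can be pulled with the huggingface_hub CLI. The repository IDs below are placeholders, not official names; substitute the model and dataset IDs published on HuggingFace.

pip install -U "huggingface_hub[cli]"
MODEL_REPO=org/maya-model        # placeholder: replace with the released model repo ID
DATASET_REPO=org/maya-dataset    # placeholder: replace with the released dataset repo ID
huggingface-cli download "$MODEL_REPO" --local-dir /dev/data/maya_weights
huggingface-cli download "$DATASET_REPO" --repo-type dataset --local-dir /dev/data/maya_dataset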

Train

Pretraining

To pretrain the projection layer:

  • get the pretraining dataset from HuggingFace and place it in /dev/data/LLaVA_Pretrain
  • get the images with wget https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip and place them in /dev/data/images, as sketched below
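
A minimal sketch of this data preparation, assuming the pretraining dataset is the liuhaotian/LLaVA-Pretrain repository referenced by the image URL above:

mkdir -p /dev/data/LLaVA_Pretrain /dev/data/images
huggingface-cli download liuhaotian/LLaVA-Pretrain --repo-type dataset --local-dir /dev/data/LLaVA_Pretrain
wget https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip -P /dev/data
unzip /dev/data/images.zip -d /dev/data/images

With the data in place, launch pretraining: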
bash scripts/maya/pretrain_aya_siglip.sh

Instruction Tuning

Please download the annotations from MBZUAI/palo_multilingual_dataset and the images for the datasets shown in the directory structure below (COCO, GQA, OCR-VQA, TextVQA, Visual Genome).

After downloading all of them, organize the data as follows in /dev/data/instruction_tune_dataset/:

instruction_tune_dataset
    ├── coco
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2

Put the palo_multilingual_dataset.json file at /dev/data/annotations/palo_multilingual_dataset.json.
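
The skeleton can be created up front with mkdir; the paths below simply mirror the tree above and the annotation path in the previous step.

mkdir -p /dev/data/instruction_tune_dataset/coco/train2017 \
         /dev/data/instruction_tune_dataset/gqa/images \
         /dev/data/instruction_tune_dataset/ocr_vqa/images \
         /dev/data/instruction_tune_dataset/textvqa/train_images \
         /dev/data/instruction_tune_dataset/vg/VG_100K \
         /dev/data/instruction_tune_dataset/vg/VG_100K_2
mkdir -p /dev/data/annotations   # holds palo_multilingual_dataset.json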

Make sure the pretrained projector checkpoint is stored at the path you pass to the scripts/maya/finetune_aya_siglip.sh script through the --pretrain_mm_mlp_adapter flag.
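
For example, you can locate the flag in the script and confirm your checkpoint exists before launching; the checkpoint path below is a hypothetical placeholder.

grep -n "pretrain_mm_mlp_adapter" scripts/maya/finetune_aya_siglip.sh
ls /dev/data/checkpoints/maya_pretrain/mm_projector.bin   # hypothetical projector checkpoint path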

Then run

bash scripts/maya/finetune_aya_siglip.sh

Evaluation

For multilingual evaluation using the PALO multilingual test dataset:

  • Download the PALO evaluation dataset: create the directory structure below if it doesn't exist, then clone the dataset into it.
    mkdir -p LLaVA/playground/data/eval
    cd LLaVA/playground/data/eval
    git clone https://huggingface.co/datasets/MBZUAI/multilingual-llava-bench-in-the-wild

  • Specifically, the test images can be found here.
  • Run the evaluation script
bash scripts/v1_5/eval/eval_all_languages.sh \
    "model_base" \
    "model_path" \
    "model_name" \
    "your-openai-api-key"

Citation

If you find Maya useful for your research and applications, please cite using this BibTeX:

@misc{alam2024mayainstructionfinetunedmultilingual,
      title={Maya: An Instruction Finetuned Multilingual Multimodal Model}, 
      author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
      year={2024},
      eprint={2412.07112},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.07112}, 
}

Contributors

Contributors are listed in no particular order.

Acknowledgement

  • This codebase is based on LLaVA. Thank you for the clear and easy-to-understand codebase.
  • This project would not be possible without the support of Cohere and their Aya-35B API grant. We are thankful to Sara Hooker, Madeline, Shivalika, Shristhi, and the entire Cohere for AI team for their support.
  • We thank Merve and the HuggingFace team for GPU support for the inference demo.