Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
multimodal multi-modality large-language-models llm vision-language-model llava large-context large-multimodal-models
Updated Mar 27, 2024 - Python