Stars
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
Ola: Pushing the Frontiers of Omni-Modal Language Model
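《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and 50+ mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so you never feel lost learning algorithms again! 🔥🔥 Take a look; you will wish you had found it sooner! 🚀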
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
LAVIS - A One-stop Library for Language-Vision Intelligence
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
A collection of visual instruction tuning datasets.
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Official code for the Goldfish model for long video understanding and MiniGPT4-video for short video understanding
SGLang is a fast serving framework for large language models and vision language models.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
DeepSeek-VL: Towards Real-World Vision-Language Understanding
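AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks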