  • North University of China
  • Beijing


A collection of omni-mllm

14 1 Updated Mar 28, 2025

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 1,729 119 Updated Mar 30, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 142,167 28,467 Updated Mar 30, 2025

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 12,520 1,254 Updated Mar 25, 2025

MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Python 461 16 Updated Mar 29, 2025
Python 4,090 328 Updated Mar 12, 2025

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Python 114 3 Updated Mar 27, 2025

An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs

Python 70 1 Updated Mar 27, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 321 14 Updated Feb 28, 2025

Pruning the VLLMs

Python 90 4 Updated Dec 9, 2024

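《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video breakdowns of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer bewildering! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀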

Shell 55,149 11,917 Updated Mar 17, 2025

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Python 665 60 Updated Mar 17, 2025

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 10,399 1,010 Updated Nov 18, 2024

A collection of multimodal (MM) + Chat resources

Python 250 19 Updated Feb 18, 2025

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 137 1 Updated Dec 6, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.

Python 7,372 566 Updated Mar 20, 2025

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Python 259 9 Updated Jun 25, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 9,227 938 Updated Mar 28, 2025
Python 3,630 334 Updated Feb 24, 2025

A collection of visual instruction tuning datasets.

Python 76 3 Updated Mar 14, 2024

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Python 423 40 Updated Apr 24, 2024

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook 7,282 464 Updated Nov 6, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 605 67 Updated Dec 10, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 12,644 1,391 Updated Mar 30, 2025

A family of lightweight multimodal models.

Python 1,005 75 Updated Nov 18, 2024

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 2,099 305 Updated Mar 30, 2025

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3,738 556 Updated Apr 24, 2024

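AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks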

Jupyter Notebook 13,050 1,878 Updated Mar 29, 2025