Pinned

  1. VITA Public

    ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

    Python · 2.2k stars · 168 forks

  2. Long-VITA Public

    ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

    Python · 275 stars · 29 forks

  3. Freeze-Omni Public

    ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

    Python · 309 stars · 20 forks

  4. Woodpecker Public

    ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

    Python · 634 stars · 31 forks

Repositories

Showing 6 of 6 repositories
  • LUCY Public

    LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

    Python · 37 stars · 3 forks · 8 open issues · 0 pull requests · Updated Apr 14, 2025
  • Sparrow Public

    Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation

    Jupyter Notebook · 28 stars · Apache-2.0 · 0 forks · 0 open issues · 0 pull requests · Updated Mar 28, 2025
  • VITA Public

    ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

    Python · 2,240 stars · 168 forks · 49 open issues · 1 pull request · Updated Mar 28, 2025
  • Long-VITA Public

    ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

    Python · 275 stars · 29 forks · 4 open issues · 1 pull request · Updated Mar 20, 2025
  • Freeze-Omni Public

    ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

    Python · 309 stars · 20 forks · 10 open issues · 2 pull requests · Updated Jan 2, 2025
  • Woodpecker Public

    ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

    Python · 634 stars · 31 forks · 2 open issues · 0 pull requests · Updated Dec 23, 2024