LLM
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
The reproduced code for Google's SoundStorm
Chinese and English multimodal conversational language model
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
Pandora, a ChatGPT that helps you breathe smoothly.
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Foundational Models for State-of-the-Art Speech and Text Translation
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SALMONN: Speech Audio Language Music Open Neural Network
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
The official implementation of HierSpeech++
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Instant voice cloning by MIT and MyShell. Audio foundation model.
Foundational model for human-like, expressive TTS