Lists (26)
AcousticFrontend
AcousticModel
ASR
ASR-pretrain
ASV
AudioQuality
AwesomeList
Paper list, awesome list and so on.
BandwidthExtension
Classification
Codec
Data
Develop
Evaluation
FrontEnd
FrontEnd for Text-to-Speech
How-to
LLM
Music
Performance
Quant
SingingVoiceSynthesis
SpeechEditing
SpeechSeperation
Tools
Universal Method
Vocoder
VoiceConversion
Starred repositories
veRL: Volcano Engine Reinforcement Learning for LLM
🧑‍🚀 Summary of the world's best LLM resources.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
🚀 Train a 27M-parameter vision multimodal VLM from scratch in just 3 hours!
Codec for paper: LLaSA: Scaling Train-time and Test-time Compute for LLaMA-based Speech Synthesis
Implementation of papers in 100 lines of code.
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 hours)
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
A PyTorch native library for large model training
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Java Emoji (JEmoji) is a lightweight, fast and auto generated emoji library for Java with the purpose to improve and ease working with emojis
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
An Open-Sourced LLM-empowered Foundation TTS System
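Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to understanding, discussing, and implementing the technologies, principles, and applications related to large models.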