- S3Tokenizer - xingchensong
- 🌟 Scaling Transformers for Low-Bitrate High-Quality Speech Coding, arXiv, 2411.19842, arxiv, pdf, citation: -1
  Julian D Parker, Anton Smirnov, Jordi Pons, ..., Zach Evans, Xubo Liu · (stable-codec - Stability-AI)
- Towards Robust Speech Representation Learning for Thousands of Languages, arXiv, 2407.00837, arxiv, pdf, citation: 4
  William Chen, Wangyou Zhang, Yifan Peng, ..., Karen Livescu, Shinji Watanabe · (wavlab)
- DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models, arXiv, 2410.24177, arxiv, pdf, citation: -1
  Heng-Jui Chang, Hongyu Gong, Changhan Wang, ..., James Glass, Yu-An Chung
- 🌟 Continuous Speech Synthesis using per-token Latent Diffusion, arXiv, 2410.16048, arxiv, pdf, citation: -1
  Arnon Turetzky, Nimrod Shabtay, Slava Shechtman, ..., Ron Hoory, Avihu Dekel · (s3.us-south.objectstorage.softlayer)
- SNAC: Multi-Scale Neural Audio Codec, arXiv, 2410.14411, arxiv, pdf, citation: 2
  Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer · (snac - hubertsiuzdak)
- DM-Codec: Distilling Multimodal Representations for Speech Tokenization, arXiv, 2410.15017, arxiv, pdf, citation: -1
  Md Mubtasim Ahasan, Md Fahim, Tasnim Mohiuddin, ..., Md Mofijul Islam, Amin Ahsan Ali