Implement LMNT full-duplex TTS generation #87
Closed
LMNT has an interesting approach to 'full-duplex' TTS:
This doesn't really work with our FrameProcessor architecture, but it does actually work pretty well with the idea of a custom pipeline. I've started building that out in examples/foundational/02a-async-llm-say-one-thing.py.

I'm currently stuck because I think there's a bug in their implementation. After I get the first chunk of audio back, the websocket connection to LMNT seems to close. I've tried logging just about everywhere I can think of, and I can't find an explanation. I'm going to try to get in touch with the LMNT team to troubleshoot.
(I found out about LMNT through Andy Korman, who I met at our Voice + AI Summit.)