
Simple example with partial and final results for chunked audio stream? #4216

Answered by titu1994
fquirin asked this question in Q&A

So, as you've already guessed, NeMo ASR models are complex under the hood. I will spend some time next week to see if I can put together a minimal working script based on yours. But before that, have a look at https://huggingface.co/spaces/smajumdar/nemo_conformer_rnnt_large_streaming

There are no special tricks applied here; it's the most inefficient inference method on chunks, and it actually works fine. These models in NeMo are not true streaming models but offline models. We can make them work in streaming mode in multiple ways; the buffered inference script above is one of those ways. It's a more accurate form of the simple "predict full chunk, every chunk, and concat results" method I've used in the above d…
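
For reference, the simple "predict full chunk, every chunk, and concat results" idea could be sketched roughly like this. This is only an illustration and not the code behind the linked Space: `transcribe_fn` is a hypothetical placeholder for whatever call runs the offline model on one chunk of samples, and `split_into_chunks` and `transcribe_stream` are names made up for this sketch. Each chunk is decoded independently, so words cut at chunk boundaries can be mangled, which is the inaccuracy that buffered inference is meant to reduce.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def split_into_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_size samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe_stream(
    chunks: Iterable[Sequence[float]],
    transcribe_fn: Callable[[Sequence[float]], str],
) -> Iterator[Tuple[str, bool]]:
    """Yield (text_so_far, is_final) after every chunk, then once more at the end.

    transcribe_fn is an assumption of this sketch: any callable that takes
    the samples of one chunk and returns the text recognised in that chunk.
    """
    pieces: List[str] = []
    for chunk in chunks:
        text = transcribe_fn(chunk).strip()
        if text:
            pieces.append(text)
        # Partial result: everything recognised up to and including this chunk.
        yield " ".join(pieces), False
    # Final result: emitted once the stream is exhausted.
    yield " ".join(pieces), True


if __name__ == "__main__":
    # Dummy recogniser so the sketch runs without any ASR model installed.
    def fake_transcribe(chunk: Sequence[float]) -> str:
        return f"chunk_of_{len(chunk)}"

    audio = [0.0] * 10
    for text, is_final in transcribe_stream(split_into_chunks(audio, 4), fake_transcribe):
        print("FINAL  " if is_final else "PARTIAL", text)
```

In a real setup, `transcribe_fn` would wrap the model call, and the chunk size would be chosen in samples (for example a few seconds of audio at the model's sample rate).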

Replies: 2 comments 14 replies

Answer selected by ericharper
Category
Q&A
Labels
None yet
4 participants