Update CTC tree-construction and new search algorithms to score sentence-end transitions with LabelScorer #152
Currently, in `LexiconfreeTimesyncBeamSearch` and `TreeTimesyncBeamSearch`, sentence-end is not handled by the LabelScorer; it is only scored by the word-level LM. This PR adds logic to properly handle sentence-end transitions in the new search algorithms. The changes consist of the following points:

- `SENTENCE_END` is added as a new `TransitionType` for the LabelScorer.
- `LexiconfreeTimesyncBeamSearch`:
  - `sentence-end-index` can be specified as a parameter.
  - The `inferTransitionType` function is adjusted accordingly to assign the new `SENTENCE_END` transition type to this index.
- `TreeTimesyncBeamSearch`:
  - `CtcTreeBuilder` is modified to include the `sentenceEndLemma` in the tree if it exists and has pronunciations.
  - `finalStates` is added to the `PersistentStateTree`. The search uses it to determine which states are considered valid at segment end. If sentence-end is included in the tree, only the sentence-end sink state is added as a final state.
  - `sentenceEndLabelIndex_` is added as a member of the search algorithm and inferred from the lexicon.
  - The `inferTransitionType` function is also adjusted to produce the `SENTENCE_END` transition type.
  - Changes in `decodeStep`: when the `sentenceEndLemma` has an empty pronunciation (i.e., it should only be scored by the LM and not by the LabelScorer), a hypothesis may need to take a normal word-end exit and then the sentence-end exit back-to-back in the same decode step.

Depends on changes to the transition types from #138.
Still requires testing.
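To illustrate how `inferTransitionType` in `LexiconfreeTimesyncBeamSearch` might assign `SENTENCE_END` once `sentence-end-index` is configured, here is a minimal C++ sketch. The enum value names, the free-function signature, and the use of `std::optional` are assumptions for illustration (the actual transition types come from #138 and may be named differently). The only point it demonstrates is that a label matching the configured sentence-end index is classified as a sentence-end transition before the usual blank/label cases are considered.

```cpp
#include <cassert>
#include <optional>

// Hypothetical transition types; the names in the actual enum from #138 may differ.
enum class TransitionType {
    InitialLabel,
    InitialBlank,
    LabelToLabel,
    LabelLoop,
    LabelToBlank,
    BlankToLabel,
    BlankLoop,
    SentenceEnd,
};

// Classifies the CTC transition from `prevLabel` to `nextLabel`.
// `prevLabel` is empty at the start of a segment; `sentenceEndIndex` is empty when no
// sentence-end index is configured, in which case nothing is classified as sentence-end.
TransitionType inferTransitionType(std::optional<int> prevLabel,
                                   int nextLabel,
                                   int blankIndex,
                                   std::optional<int> sentenceEndIndex) {
    // Sentence-end and blank must be distinct labels.
    assert(!sentenceEndIndex || *sentenceEndIndex != blankIndex);

    // Checked first, so the sentence-end label is never treated as an ordinary label.
    if (sentenceEndIndex && nextLabel == *sentenceEndIndex) {
        return TransitionType::SentenceEnd;
    }
    const bool nextIsBlank = nextLabel == blankIndex;
    if (!prevLabel) {
        return nextIsBlank ? TransitionType::InitialBlank : TransitionType::InitialLabel;
    }
    if (*prevLabel == blankIndex) {
        return nextIsBlank ? TransitionType::BlankLoop : TransitionType::BlankToLabel;
    }
    if (nextIsBlank) {
        return TransitionType::LabelToBlank;
    }
    return *prevLabel == nextLabel ? TransitionType::LabelLoop : TransitionType::LabelToLabel;
}
```

As a simplification, this sketch ignores the previous label for sentence-end, so a repeated sentence-end label is also reported as `SentenceEnd` rather than as a loop; the real function may distinguish these.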
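How the new `finalStates` could be used at segment end can be sketched in a few lines of C++. `StateTreeSketch`, `setFinalStates`, `Hyp`, and `finalHypotheses` are hypothetical names, not the `PersistentStateTree` API. The sentence-end case follows the description (only the sink state is final); treating the root as the only final state when sentence-end is not in the tree is an assumption of this sketch. The description does not say what happens if no hypothesis ends in a final state, so the sketch leaves that to the caller.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

using StateId = unsigned int;

// Hypothetical, heavily reduced stand-in for PersistentStateTree.
struct StateTreeSketch {
    StateId root = 0;
    std::unordered_set<StateId> finalStates;  // states that are valid at segment end
};

// With sentence-end in the tree, only its sink state is final.
// The fallback (root as the only final state) is an assumption of this sketch.
void setFinalStates(StateTreeSketch& tree, bool hasSentenceEnd, StateId sentenceEndSink) {
    tree.finalStates.clear();
    if (hasSentenceEnd) {
        // The sink is assumed to be a separate state from the root.
        assert(sentenceEndSink != tree.root);
        tree.finalStates.insert(sentenceEndSink);
    } else {
        tree.finalStates.insert(tree.root);
    }
}

// Hypothetical, reduced search hypothesis.
struct Hyp {
    StateId state;
    double score;  // accumulated negative log score
};

// Keeps only hypotheses whose current state is final.
std::vector<Hyp> finalHypotheses(const StateTreeSketch& tree, const std::vector<Hyp>& hyps) {
    std::vector<Hyp> result;
    for (const Hyp& hyp : hyps) {
        if (tree.finalStates.count(hyp.state) > 0) {
            result.push_back(hyp);
        }
    }
    return result;
}
```

Storing the final states explicitly in the tree keeps the segment-end check a simple set lookup and lets the tree builder, which knows whether sentence-end was added, decide which states qualify.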
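The back-to-back exits needed in `decodeStep` for an empty-pronunciation sentence-end can be sketched as a worklist over hypotheses: every hypothesis, including one just created by a word-end exit, is checked again for exits, so a hypothesis that lands in a state carrying the sentence-end exit can take it within the same step. `Hypothesis`, `Exit`, `LmScoreFn`, `expandExits`, and the state numbering are hypothetical; this is not the actual `decodeStep` code. It assumes the exit graph is acyclic (the sentence-end sink has no outgoing exits).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical word-end exit: taking it emits `lemma` and moves the hypothesis to `transitState`.
struct Exit {
    std::string lemma;
    int transitState;
};

// Hypothetical, reduced search hypothesis.
struct Hypothesis {
    int state;
    double score;  // accumulated negative log score (lower is better)
    std::vector<std::string> words;
};

// LM cost of appending `word` to `history` (negative log probability).
using LmScoreFn = std::function<double(const std::vector<std::string>& history, const std::string& word)>;

// Expands all exits reachable within one decode step. Newly created hypotheses are appended
// to `hyps` and visited by the same loop, so a normal word-end exit followed by an
// empty-pronunciation sentence-end exit is handled in a single call.
// The input hypotheses stay in the result; pruning is out of scope here.
std::vector<Hypothesis> expandExits(std::vector<Hypothesis> hyps,
                                    const std::unordered_map<int, std::vector<Exit>>& exitsByState,
                                    const LmScoreFn& lmScore) {
    for (std::size_t i = 0; i < hyps.size(); ++i) {
        auto it = exitsByState.find(hyps[i].state);
        if (it == exitsByState.end()) {
            continue;
        }
        for (const Exit& exit : it->second) {
            // Copy by index each time, since push_back below may reallocate `hyps`.
            Hypothesis next = hyps[i];
            // Only the LM score is added on an exit; no LabelScorer score is involved.
            next.score += lmScore(next.words, exit.lemma);
            next.words.push_back(exit.lemma);
            next.state = exit.transitState;
            hyps.push_back(std::move(next));
        }
    }
    return hyps;
}
```

For example, with a word-end exit from state 2 to the root 0 and a sentence-end exit from the root to a sink 9, one call turns a hypothesis in state 2 into three: the original, one at the root after the word, and one at the sink after sentence-end.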
Here are some plots of the new tree structure including sentence-end:
- Tree without `sentence-end` lemma in lexicon
- Tree with `sentence-end` lemma with empty pronunciation in lexicon
- Tree with `sentence-end` lemma and non-empty pronunciation in lexicon
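The three plotted trees correspond to three cases for the sentence-end lemma in the lexicon. The following C++ sketch shows that case distinction together with one possible way of inferring a sentence-end label index from the lexicon. `LemmaSketch`, `SentenceEndInfo`, and `inspectSentenceEnd` are hypothetical names, and taking the single label of the first pronunciation as the sentence-end label is an assumption, not necessarily what `CtcTreeBuilder` or `TreeTimesyncBeamSearch` do for `sentenceEndLabelIndex_`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical lexicon entry: each pronunciation is a sequence of label indices and may be empty.
struct LemmaSketch {
    std::string symbol;
    std::vector<std::vector<int>> pronunciations;
};

struct SentenceEndInfo {
    bool inTree = false;            // is sentence-end added to the tree?
    std::optional<int> labelIndex;  // label scored by the LabelScorer, if any
};

// Distinguishes the three plotted cases:
//   1. no sentence-end lemma (or no pronunciations): nothing is added to the tree;
//   2. empty pronunciation: sentence-end is in the tree but carries no label (LM only);
//   3. non-empty pronunciation: sentence-end is in the tree with a label for the LabelScorer.
SentenceEndInfo inspectSentenceEnd(const LemmaSketch* lemma) {
    SentenceEndInfo info;
    if (lemma == nullptr || lemma->pronunciations.empty()) {
        return info;  // case 1
    }
    info.inTree = true;
    // Assumption: only the first pronunciation is considered.
    const std::vector<int>& pron = lemma->pronunciations.front();
    if (!pron.empty()) {
        // Assumption: sentence-end is a single label.
        assert(pron.size() == 1);
        info.labelIndex = pron.front();  // case 3
    }
    return info;  // case 2 if labelIndex is still empty
}
```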