@SimBe195
Collaborator

@SimBe195 SimBe195 commented Oct 8, 2025

Currently, in LexiconfreeTimesyncBeamSearch and TreeTimesyncBeamSearch sentence-end is not handled by the LabelScorer; it is only scored by the word-level LM. This PR adds logic to properly handle sentence-end transitions in the new search. The changes consist of the following points:

  • SENTENCE_END is added as a new TransitionType for the LabelScorer.
  • For LexiconfreeTimesyncBeamSearch
    • A sentence-end-index can be specified as a parameter
    • The inferTransitionType function is adjusted accordingly to assign it the new SENTENCE_END transition type
    • If a sentence-end index is specified, only hypotheses that have emitted sentence-end are kept at the end of a segment (otherwise sentence-end-fallback is applied)
  • For TreeTimesyncBeamSearch
    • The CtcTreeBuilder is modified to include the sentenceEndLemma in the tree if it exists and has pronunciations.
    • A set of finalStates is added to the PersistentStateTree. This is used in the search to determine which states are considered valid at segment end. If sentence-end is included in the tree, only the sentence-end sink state is added as final state.
    • A sentenceEndLabelIndex_ is added as a member to the search algorithm and inferred from the lexicon
    • The inferTransitionType function is also adjusted to produce the SENTENCE_END transition type
    • Second-order exits are added to word-end hypotheses in decodeStep. When the sentenceEndLemma has an empty pronunciation (i.e., it should be scored only by the LM and not by the LabelScorer), a hypothesis may need to take a normal word-end exit and then the sentence-end exit back-to-back in the same decode step.
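
For illustration, the lexicon-free part of the list above could be sketched roughly as follows. This is a simplified Python sketch, not the actual C++ code of this PR: the `Hypothesis` class, the `finalize_segment` helper, and every enum member other than `SENTENCE_END` are hypothetical names. It also assumes that the fallback is triggered when no hypothesis in the beam has emitted sentence-end, and it stands in for sentence-end-fallback by simply keeping the whole beam.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TransitionType(Enum):
    # Simplified stand-in for the LabelScorer transition types.
    # Only SENTENCE_END is taken from the PR description; the other
    # members are hypothetical placeholders for this sketch.
    LABEL_TO_LABEL = auto()
    LABEL_LOOP = auto()
    SENTENCE_END = auto()


@dataclass
class Hypothesis:
    # Hypothetical minimal hypothesis: emitted label indices and a score.
    labels: list
    score: float


def infer_transition_type(prev_label, next_label, sentence_end_index=None):
    """Assign SENTENCE_END if the next label is the configured sentence-end index."""
    if sentence_end_index is not None and next_label == sentence_end_index:
        return TransitionType.SENTENCE_END
    if next_label == prev_label:
        return TransitionType.LABEL_LOOP
    return TransitionType.LABEL_TO_LABEL


def finalize_segment(beam, sentence_end_index=None):
    """Keep only hypotheses that ended with sentence-end, if an index is configured."""
    if sentence_end_index is None:
        # No sentence-end index specified: every hypothesis stays valid.
        return list(beam)
    finished = [h for h in beam if h.labels and h.labels[-1] == sentence_end_index]
    # Assumption: if nothing reached sentence-end, a fallback applies.
    # Here the fallback is just "keep the whole beam" as a placeholder.
    return finished if finished else list(beam)
```

With `sentence_end_index=0`, a beam holding the label sequences `[1, 2]` and `[1, 0]` would be reduced to the second hypothesis only; a beam in which no hypothesis ends in `0` would go through the placeholder fallback unchanged.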

Depends on changes to the transition types from #138.
Still requires testing.

Here are some plots of the new tree structure including sentence-end:

Tree without sentence-end lemma in lexicon:
[image: no_eos]

Tree with sentence-end lemma with empty pronunciation in lexicon:
[image: with_eos_no_pron]

Tree with sentence-end lemma and non-empty pronunciation in lexicon:
[image: with_eos_with_pron]

Base automatically changed from tdp_label_scorer to master October 8, 2025 12:59
@hannah220
Contributor

In the graphs above, is t the time, m the emission index, and tr the traceback?

@larissakl
Contributor

@hannah220 You're correct about m: it's the emission index. In this case (monophones) it's just the output index (= the position in the lexicon), so

m=0 -> </s>
m=1 -> A
m=2 -> B
m=3 -> _

t is the transition index (it doesn't matter here), and tr is the transition state of an exit: when you are at a word end, you transition to this state. For example, after predicting </s> you have tr=2, which means you go to the state (the root) with ID 2, i.e., the new "sentence-end root state". With all "normal" exits you go to the "normal" root with ID 1 because of tr=1.
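
To make the roles of m and tr concrete, here is a small Python sketch of the example above. It is only an illustration under assumptions: the `Exit` class, `follow_exit`, and `is_valid_at_segment_end` are hypothetical names, attaching m directly to an exit is a simplification, and treating the state with ID 2 as the only final state assumes it is the sentence-end sink state mentioned in the PR description.

```python
from dataclasses import dataclass

# Emission index -> label, as listed above (monophone case).
LABELS = {0: "</s>", 1: "A", 2: "B", 3: "_"}

NORMAL_ROOT = 1        # root reached by all "normal" exits (tr=1)
SENTENCE_END_ROOT = 2  # root reached after </s> (tr=2)


@dataclass(frozen=True)
class Exit:
    # Hypothetical simplified exit: label emitted at the word end
    # and the state to transition to afterwards.
    m: int   # emission index of the word's label
    tr: int  # ID of the state to continue from after this exit


EXITS = [
    Exit(m=0, tr=SENTENCE_END_ROOT),  # </s> leads to the sentence-end root
    Exit(m=1, tr=NORMAL_ROOT),        # A leads back to the normal root
    Exit(m=2, tr=NORMAL_ROOT),        # B leads back to the normal root
]

# Assumption: with sentence-end in the tree, only the sentence-end
# root counts as a valid state at segment end.
FINAL_STATES = {SENTENCE_END_ROOT}


def follow_exit(exit_):
    """Return the emitted label and the state the search continues from."""
    return LABELS[exit_.m], exit_.tr


def is_valid_at_segment_end(state):
    """Check whether a hypothesis in this state may survive at segment end."""
    return state in FINAL_STATES
```

In this sketch, following the `</s>` exit yields state 2, which is in `FINAL_STATES`, while the exits for A and B return to state 1, which is not.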

@hannah220
Contributor

hannah220 commented Nov 18, 2025

@SimBe195 Could you merge main?
I think #160 has changed many things and since I'm using this branch for testing LSTM LM, I would like a newer version.

@SimBe195 SimBe195 marked this pull request as ready for review December 5, 2025 10:27
@curufinwe curufinwe changed the title Score sentence-end transitions with LabelScorer in new search Update CTC tree-construction and new search algorithms to score sentence-end transitions with LabelScorer Jan 15, 2026
@curufinwe curufinwe merged commit 6f07816 into master Jan 15, 2026
1 of 2 checks passed
@curufinwe curufinwe deleted the sentence_end_handling branch January 15, 2026 13:09

5 participants