Update CTC tree-construction and new search algorithms to score sentence-end transitions with LabelScorer #152

SimBe195 · 2025-10-08T10:13:28Z

Currently, in LexiconfreeTimesyncBeamSearch and TreeTimesyncBeamSearch sentence-end is not handled by the LabelScorer; it is only scored by the word-level LM. This PR adds logic to properly handle sentence-end transitions in the new search. The changes consist of the following points:

SENTENCE_END is added as a new TransitionType for the LabelScorer.
For LexiconfreeTimesyncBeamSearch:
- A sentence-end-index can be specified as a parameter.
- The inferTransitionType function is adjusted accordingly to assign this index the new SENTENCE_END transition type.
- If the index is set, only hypotheses that have emitted sentence-end are kept at the end of a segment (otherwise sentence-end-fallback is applied).
For TreeTimesyncBeamSearch:
- The CtcTreeBuilder is modified to include the sentenceEndLemma in the tree if it exists and has pronunciations.
- A set of finalStates is added to the PersistentStateTree. The search uses it to determine which states are considered valid at segment end. If sentence-end is included in the tree, only the sentence-end sink state is added as a final state.
- A sentenceEndLabelIndex_ is added as a member of the search algorithm and inferred from the lexicon.
- The inferTransitionType function is also adjusted to produce the SENTENCE_END transition type.
- Second-order exits are added to word-end hypotheses in decodeStep. When the sentenceEndLemma has an empty pronunciation (i.e., it should be scored only by the LM and not by the LabelScorer), a hypothesis may need to take a normal word-end exit and then the sentence-end exit back-to-back in the same decode step.
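
To make the lexicon-free points concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the reduced `TransitionType` enum, the `Hypothesis` struct, the `keepFinalHypotheses` helper and both signatures are assumptions for illustration. Only the names `SENTENCE_END` and `inferTransitionType` come from the description, and "keep the whole beam" is merely a stand-in for whatever sentence-end-fallback actually does.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using LabelIndex = int;

// Hypothetical, reduced set of transition types; the real enum may differ.
enum class TransitionType {
    LABEL_TO_LABEL,
    LABEL_LOOP,
    LABEL_TO_BLANK,
    BLANK_TO_LABEL,
    BLANK_LOOP,
    SENTENCE_END,
};

// Classify the transition prevLabel -> nextLabel. The sentence-end check comes
// first, so a transition into the sentence-end index is always SENTENCE_END.
TransitionType inferTransitionType(LabelIndex                prevLabel,
                                   LabelIndex                nextLabel,
                                   LabelIndex                blankIndex,
                                   std::optional<LabelIndex> sentenceEndIndex) {
    // Assumption of this sketch: sentence-end and blank are distinct labels.
    assert(!sentenceEndIndex || *sentenceEndIndex != blankIndex);

    if (sentenceEndIndex && nextLabel == *sentenceEndIndex) {
        return TransitionType::SENTENCE_END;
    }
    const bool prevIsBlank = prevLabel == blankIndex;
    const bool nextIsBlank = nextLabel == blankIndex;
    if (prevIsBlank) {
        return nextIsBlank ? TransitionType::BLANK_LOOP : TransitionType::BLANK_TO_LABEL;
    }
    if (nextIsBlank) {
        return TransitionType::LABEL_TO_BLANK;
    }
    return prevLabel == nextLabel ? TransitionType::LABEL_LOOP : TransitionType::LABEL_TO_LABEL;
}

// Hypothetical beam entry; only the fields needed for the sketch.
struct Hypothesis {
    double score;
    bool   reachedSentenceEnd;
};

// At segment end, keep only hypotheses that emitted sentence-end. If none did,
// fall back to the unfiltered beam (stand-in for sentence-end-fallback).
std::vector<Hypothesis> keepFinalHypotheses(const std::vector<Hypothesis>& beam,
                                            bool                           sentenceEndConfigured) {
    if (!sentenceEndConfigured) {
        return beam;
    }
    std::vector<Hypothesis> finished;
    for (const auto& hyp : beam) {
        if (hyp.reachedSentenceEnd) {
            finished.push_back(hyp);
        }
    }
    if (finished.empty()) {
        return beam;
    }
    return finished;
}
```

Passing the sentence-end index as `std::optional` keeps the "no sentence-end configured" case explicit, so existing setups without the parameter would behave as before in this sketch.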

Depends on changes to the transition types from #138.
Still requires testing.

Here are some plots of the new tree structure including sentence-end:

Tree without sentence-end lemma in lexicon:

Tree with sentence-end lemma with empty pronunciation in lexicon:

Tree with sentence-end lemma and non-empty pronunciation in lexicon:


hannah220 · 2025-10-09T13:33:41Z

In the graph above, t is time, m is emission idx and tr is traceback?

larissakl · 2025-10-09T13:51:49Z

@hannah220 You're correct about m: it's the emission index. In this case (monophones) it's just the output index (= the position in the lexicon), so

m=0 -> </s>
m=1 -> A
m=2 -> B
m=3 -> _

t is the transition index (doesn't matter here) and tr is the transition state of an exit: when you are at a word end, you transition to this state. For example, after predicting </s> you have tr=2, which means you go to the state (the root) with ID 2, i.e. the new "sentence-end root state". With all "normal" exits you go to the "normal" root with ID 1 because of tr=1.
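
The role of tr can be sketched with a toy structure. This is only an illustration under assumptions: `Exit`, `ToyTree`, `buildToyTree`, `takeExit` and `isFinal` are hypothetical names, not the PersistentStateTree API. The root IDs 1 and 2 and the labels A, B and </s> are taken from the example above.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using StateId = int;

// Hypothetical, simplified word-end exit: the emitted lemma and the root
// state ("tr" in the plots) from which the hypothesis continues afterwards.
struct Exit {
    std::string lemma;
    StateId     transitState;
};

// Toy stand-in for the tree: two roots, a list of exits and the final states.
struct ToyTree {
    StateId           rootState       = 1;  // "normal" root
    StateId           sentenceEndRoot = 2;  // sentence-end root (sink)
    std::vector<Exit> exits;
    std::set<StateId> finalStates;
};

ToyTree buildToyTree() {
    ToyTree tree;
    // Normal exits return to the normal root, </s> goes to the sentence-end root.
    tree.exits = {{"A", tree.rootState}, {"B", tree.rootState}, {"</s>", tree.sentenceEndRoot}};
    // With sentence-end in the tree, only its sink state counts as final.
    tree.finalStates = {tree.sentenceEndRoot};
    return tree;
}

// Taking an exit moves the hypothesis to the exit's transit state.
StateId takeExit(const ToyTree& tree, std::size_t exitIndex) {
    return tree.exits.at(exitIndex).transitState;
}

// A hypothesis is valid at segment end only if it sits in a final state.
bool isFinal(const ToyTree& tree, StateId state) {
    return tree.finalStates.count(state) > 0;
}
```

In this toy setup, a hypothesis that exits via A or B lands on root 1 and is not final, while one that exits via </s> lands on root 2 and is.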

hannah220 · 2025-11-18T09:29:27Z

@SimBe195 Could you merge main?
I think #160 has changed many things and since I'm using this branch for testing LSTM LM, I would like a newer version.


SimBe195 added 20 commits July 24, 2025 16:51

Add TransitionLabelScorer

7502b82

Rewrite docstring

7430001

Clean up includes

2a6272e

Rewrite docstring again

7e325e1

Merge branch 'master' into tdp_label_scorer

a276136

Refactor params to string list with compile time check

d2d78fe

Remove transitionTypeToIndex function and revert associated changes

303fa46

Revert unnecessary static_cast

ddd75c7

Change std=c++17 to c++20

b856c1e

Merge remote-tracking branch 'origin/version-bump' into tdp_label_scorer

70699c0

Move transition type string array to LabelScorer.hh

5b89d0f

Move transitionTypeArray to protected space

b9d919b

Add sentence-end transition to enum

54bee17

Sentence-end handling for lexiconfree-search

3dc887b

Add finalStates collection to PersistentStateTree

b1ba86a

Sentence-end handling for tree-search

667f558

Merge branch 'master' into tdp_label_scorer

1795685

Merge branch 'tdp_label_scorer' into sentence_end_handling

98b824f

Allow no pronunciations of sentence-end

dfdcfe7

Add sentence-end-index as member and to inferTransitionType in TreeTimesyncBeamSearch

aa099a4

SimBe195 requested review from curufinwe and larissakl October 8, 2025 10:13

Change log to warning when sentence-end is not included in tree

a04c2f9

Base automatically changed from tdp_label_scorer to master October 8, 2025 12:59

Merge branch 'master' into sentence_end_handling

1ee5547

hannah220 reviewed Oct 9, 2025


Suggestions from code review

d4f6202

hannah220 mentioned this pull request Oct 13, 2025

Update Nn::LabelScorer: allow enabling of specific transition types #148

Merged

SimBe195 added 2 commits November 18, 2025 14:47

Merge branch 'master' into sentence_end_handling

4dc2a06

Make TreeTimsyncBeamSearch work with master branch refactoring

82dc513

hannah220 reviewed Dec 2, 2025

src/Search/TreeTimesyncBeamSearch/TreeTimesyncBeamSearch.cc (resolved)

SimBe195 added 2 commits December 5, 2025 10:11

Merge branch 'master' into sentence_end_handling

73dcd58

Fix check for sentenceEndToken_ to exclude empty pronunciation

4e1095a

SimBe195 marked this pull request as ready for review December 5, 2025 10:27

SimBe195 mentioned this pull request Dec 5, 2025

Add Search::LexiconfreeLabelsyncBeamSearch #126

Merged

Add sentence-end to enabled transition types of LM preset

d20b401

hannah220 reviewed Dec 11, 2025

src/Search/TreeBuilder.cc (resolved)

Ignore sentence-begin in tree builder

1a5cf0a

hannah220 approved these changes Jan 14, 2026


curufinwe requested changes Jan 14, 2026


SimBe195 added 5 commits January 15, 2026 12:22

Remove unnecessary check

df04ff5

Fix order of members in LexiconfreeTimesyncBeamSearch

7ef7fde

Inline getSentenceBeginLemma function

71346e8

Merge branch 'master' into sentence_end_handling

e58e543

Fix ordering of member initialization

47afd33

curufinwe changed the title ~~Score sentence-end transitions with LabelScorer in new search~~ Update CTC tree-construction and new search algorithms to score sentence-end transitions with LabelScorer Jan 15, 2026

curufinwe approved these changes Jan 15, 2026


curufinwe merged commit 6f07816 into master Jan 15, 2026
1 of 2 checks passed

curufinwe deleted the sentence_end_handling branch January 15, 2026 13:09
