Add substitutions option in ASR evaluation by avmonte · Pull Request #15343 · NVIDIA-NeMo/NeMo

avmonte · 2026-01-29T23:48:47Z

Hi there!

I've been experimenting with Armenian ASR and noticed a consistent evaluation issue: the model never outputs the single-character Armenian և (U+0587). Instead it outputs the decomposed form եւ (two characters). As the ground-truth transcripts contain և, this mismatch inflates WER despite the fact that the predicted text is linguistically equivalent.

This PR adds an evaluation-time normalization option to address that type of issues.

Added text_processing.substitutions for evaluation-time text normalization:
- Format: SRC~DST;SRC2~DST2;...
Substitutions are applied to both ground truth and predictions before WER
Supports literal Unicode and escaped input like \u0587.

Reproduce

Below is the configuration I used to run my evaluation

python NeMo/examples/asr/speech_to_text_eval.py
dataset_manifest=<path_to_manifest.json>
use_cer=False
only_score_manifest=True
text_processing.punctuation_marks=".,?!։՝՜՞՛«»֊—–-…"
text_processing.separate_punctuation=True
text_processing.do_lowercase=True
text_processing.rm_punctuation=True
text_processing.substitutions="\u0587~եւ"

Evaluation results

Base MCV test WER (from the HF model card): 9.90
Normalized MCV test WER: 5.42

Signed-off-by: avmonte <unstoppablehay@gmail.com>

avmonte · 2026-01-29T23:51:17Z

@titu1994, @redoctopus, @jbalam-nv, @okuchaiev

Please take a look :)

avmonte added 2 commits January 30, 2026 03:06

add equivalent characters substitution support for asr evaluation

52aafd3

Signed-off-by: avmonte <unstoppablehay@gmail.com>

refactor and add comments

09a5f0f

Signed-off-by: avmonte <unstoppablehay@gmail.com>

github-actions bot added the ASR label Jan 29, 2026

github-actions bot added the community-request label Jan 29, 2026

chtruong814 added the needs-follow-up Issue needs follow-up label Feb 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add substitutions option in ASR evaluation#15343

Add substitutions option in ASR evaluation#15343
avmonte wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
avmonte:avmonte/hy_liga_subs

avmonte commented Jan 29, 2026 •

edited

Loading

Uh oh!

avmonte commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

avmonte commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reproduce

Evaluation results

Uh oh!

avmonte commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

avmonte commented Jan 29, 2026 •

edited

Loading