Many Chinese words are not transcribed by cmn-Hans/cmn-Hant #132

stefantaubert · 2022-09-07T14:35:08Z

Epitran didn't transcribe the vocabulary in failed.txt (4274 entries).

Would it be possible to support transcription of these entries?

Commands I've run:

import epitran
import epitran.download
from pathlib import Path

# Download the CC-CEDICT dictionary used by the cmn-Hans/cmn-Hant modes.
epitran.download.cedict()
cedict = "/home/mi/epitran_data/cedict.txt"
epi = epitran.Epitran('cmn-Hans', tones=False, ligatures=True, cedict_file=cedict)
epi2 = epitran.Epitran('cmn-Hant', tones=False, ligatures=True, cedict_file=cedict)
voc = Path("/home/mi/playground/chn/vocabulary.txt").read_text("UTF-8").splitlines()
failed = []

for v in voc:
    # Try Simplified first, then fall back to Traditional.
    result = epi.transliterate(v)
    if result == v:
        result = epi2.transliterate(v)
        if result == v:
            # Output identical to input: the word was not transcribed at all.
            failed.append(v)
Path("/tmp/failed.txt").write_text("\n".join(failed), "UTF-8")
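
One limitation of the check above: comparing the output to the input only catches words that come back completely unchanged, so a word that is transcribed only in part would not be counted. A stricter variant could flag any output that still contains Han characters. The following is only a sketch under that assumption. `find_untranscribed` and `HAN` are hypothetical names, not part of Epitran, and the pattern covers only the basic CJK Unified Ideographs block and Extension A.

```python
import re
from typing import Callable, Iterable, List

# Han characters: CJK Unified Ideographs Extension A and the basic block.
# (Assumption: rarer extension blocks are not needed for this vocabulary.)
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def find_untranscribed(
    words: Iterable[str],
    transliterators: Iterable[Callable[[str], str]],
) -> List[str]:
    """Return the words for which every transliterator leaves Han characters behind."""
    transliterators = list(transliterators)
    failed = []
    for word in words:
        outputs = (t(word) for t in transliterators)
        # A word counts as transcribed once any output is free of Han characters.
        if not any(HAN.search(out) is None for out in outputs):
            failed.append(word)
    return failed
```

With the objects from the script above, this could be called as `find_untranscribed(voc, [epi.transliterate, epi2.transliterate])`. Note that entries without any Han characters are treated as transcribed by this check.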

Epitran version: 1.22
