DSL: support transcription tag [t] #495

Open
soshial opened this issue Jun 26, 2023 · 0 comments

soshial commented Jun 26, 2023

By @yozhic:

Starting with version 8.0, LSD dictionaries can be decompiled into DSL, and the [t] tag is actively used in them. In DSL decompiled from x3–x6, the content of the tag reads without difficulty, but in DSL from 8.0–12.0 it consists of "krakozyabry" (garbage characters produced by a wrong encoding). The reason: Lingvo uses a special font to display transcription, so the history of the tag can be divided into 3 stages.

Stage 1: Lingvo 5.0–7.0. The program works with an 8-bit ANSI encoding. Distributions ship with the LingvoSansSerif and LingvoOEM fonts: the first apparently for displaying cards, the second for transcriptions. A comparison shows that some cells of LingvoOEM hold phonetic characters instead of the standard Win-1251 characters (in particular, the entire C0-FF range, which that code page allocates to Cyrillic). Viewed without LingvoOEM, such text shows "krakozyabry" in those positions. The font therefore has its own unique encoding, conventionally called ANSI DSL (the name is in use, but its origin is unclear). In addition to the specially encoded font, Lingvo apparently performs further transformations on the transcription text (more on that next time). I can't say what such text looks like in DSL, because I have not come across decompiled DSL from versions 5.0–7.0 (nor the decompilers themselves).
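
A small Python sketch of why the "krakozyabry" show up, assuming the transcription bytes are read as plain Windows-1251 instead of through LingvoOEM. It only demonstrates the standard code page side: the C0-FF cells decode to Cyrillic letters. It does not reproduce the LingvoOEM glyph table, which is not given here.

```python
# Bytes 0xC0-0xFF are the Cyrillic block in Windows-1251.
raw = bytes(range(0xC0, 0x100))

# Without the special font, these bytes are interpreted as ordinary Cyrillic.
as_cp1251 = raw.decode("cp1251")

print(len(as_cp1251))            # 64 characters
print(as_cp1251[:6])             # АБВГДЕ
print(hex(ord(as_cp1251[0])))    # 0x410 (CYRILLIC CAPITAL LETTER A)
print(hex(ord(as_cp1251[-1])))   # 0x44f (CYRILLIC SMALL LETTER YA)
```

So a transcription stored in these cells appears as a run of Cyrillic letters wherever LingvoOEM is not the display font.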

Stage 2: Lingvo 8.0–12.0. The program switches to Unicode, but a font with a special encoding is still used to display transcription. Distributions of these versions ship only with LingvoOEM, which gradually evolves: by version 12, additional phonetic characters have replaced more of the standard ones. The decompiled DSL files may be in either Unicode or ANSI, but the transcription text in them is still in the unique ANSI DSL encoding, and without the special font it looks like "krakozyabry".

Stage 3: Lingvo x3–x6. The program switches to Unicode completely. The special LingvoOEM is replaced by a special Unicode font, NewtonPhonetABBYY.ttf. Its “feature” is different from its predecessor's, though: it uses no unique encoding and differs only in style. This font is a version of Paratype's well-known Newton typeface, cut down at ABBYY's request to the ranges needed for displaying transcription. Important: in all x3 dictionaries the transcription text has been transcoded from the unique ANSI DSL to standard Unicode, i.e. nothing else is required to display it correctly.

Over Lingvo's lifetime, decompiled DSL dictionaries spread widely across the net, and two varieties ended up in circulation: those with the “crazy” transcription and those with Unicode. Now every new developer who takes on DSL support faces a choice: which branch of the [t] tag to support.

The author of GoldenDict sided with the “krakozyabry” and built a mechanism that converts [t] contents encoded in ANSI DSL. Now, if we connect a Lingvo 8.0–12.0 dictionary to GD, its transcription is displayed “normally”; x3–x6 dictionaries are also displayed without problems, since the ranges of LingvoOEM and NewtonPhonetABBYY do not overlap.
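
The idea can be sketched in Python as a character remap applied only inside `[t]...[/t]`. This is a minimal illustration, not GoldenDict's code: the function name and the two mapping pairs are hypothetical placeholders, and the real LingvoOEM table would have to come from the font itself. The sketch also assumes the text is already decoded to a Python string and ignores escaped brackets (`\[`).

```python
import re

# HYPOTHETICAL placeholder pairs, NOT the real LingvoOEM table.
# Keys: characters as they appear when ANSI DSL text is read as Win-1251.
# Values: the phonetic characters they are assumed to stand for.
ANSI_DSL_TO_UNICODE = {
    "\u0410": "\u0251",  # placeholder: Cyrillic A  -> Latin alpha
    "\u0411": "\u0259",  # placeholder: Cyrillic Be -> schwa
}

_TABLE = str.maketrans(ANSI_DSL_TO_UNICODE)
_T_TAG = re.compile(r"\[t\](.*?)\[/t\]", re.DOTALL)


def convert_transcriptions(dsl_text: str) -> str:
    """Remap characters inside [t]...[/t] only; leave other text untouched."""
    def fix(match: re.Match) -> str:
        return "[t]" + match.group(1).translate(_TABLE) + "[/t]"
    return _T_TAG.sub(fix, dsl_text)


# Cyrillic outside the tag is preserved; inside the tag it is remapped.
print(convert_transcriptions("\u0410 [t]\u0410\u0411[/t]"))
# Text that is already standard Unicode phonetics has no keys in the table,
# so it passes through unchanged.
print(convert_transcriptions("[t]\u0251[/t]"))
```

Because the table is keyed only on code points from the old font's range, already-transcoded x3–x6 transcriptions are left as they are, which mirrors the non-overlap argument above.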

Materials (HTUZ.zip) and test dics (HTUN.7z): https://mega.nz/folder/j0p2iTzb#0wY4UY6KuYNCpNmDVGb62g

See the GoldenDict implementation: https://github.com/goldendict/goldendict/blob/master/dsl_details.cc#LL566C1

@ilius ilius added the Feature label Jun 27, 2023