By @yozhic:
Starting with version 8.0, LSD dictionaries can be decompiled into DSL, and the decompiled files make active use of the [t] (transcription) tag. In DSL decompiled from Lingvo x3–x6, the content of the tag reads without difficulty, but in DSL from 8.0–12.0 it consists of "krakozyabry" (garbage characters that appear when text is shown in the wrong encoding). The reason is that Lingvo uses a special font to display transcription, and the history of the tag can therefore be divided into 3 stages.
Stage 1: Lingvo 5.0–7.0. The program works with an 8-bit ANSI encoding. Distributions ship with the LingvoSansSerif and LingvoOEM fonts: the first apparently for displaying cards, the second for transcriptions. Comparing the fonts shows that some cells of LingvoOEM hold phonetic characters instead of the standard Win-1251 characters (in particular, the entire C0–FF range that Win-1251 allocates to Cyrillic). When such text is viewed without LingvoOEM, those positions show up as "krakozyabry". The font therefore has its own unique encoding, conventionally called ANSI DSL (the name is in use, but it is not clear where it came from). In addition to the specially encoded font, Lingvo apparently performs further transformations on the transcription text (more on that next time). I can’t say what such text looks like in DSL, because I have not come across decompiled DSL from versions 5.0–7.0 (or the decompilers themselves).
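To make the C0–FF remark concrete, here is a minimal Python sketch of what a viewer without LingvoOEM does with those bytes. The function name `view_without_lingvo_font` is made up for illustration; the only fact it relies on is the standard Windows-1251 layout, where bytes C0–FF are the Cyrillic letters А–я. It does not reproduce the LingvoOEM glyphs themselves.

```python
def view_without_lingvo_font(raw: bytes) -> str:
    """Decode transcription bytes the way a plain Windows-1251 viewer would.

    Hypothetical helper: it models only the missing-font case, where each
    byte is shown as its standard Win-1251 character.
    """
    return raw.decode("cp1251")


# Three bytes from the C0-FF range that, per the text above, LingvoOEM
# redraws as phonetic symbols. Without the font they come out as Cyrillic.
sample = bytes([0xC0, 0xE0, 0xFF])
print(view_without_lingvo_font(sample))  # Cyrillic letters: Аая

# The whole C0-FF range maps one-to-one onto U+0410..U+044F.
cyrillic_range = view_without_lingvo_font(bytes(range(0xC0, 0x100)))
print(len(cyrillic_range), hex(ord(cyrillic_range[0])), hex(ord(cyrillic_range[-1])))
```

The same bytes are therefore "correct" or "krakozyabry" depending only on which font or code page interprets them.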
Stage 2: Lingvo 8.0–12.0. The program switches to Unicode, but a font with a special encoding is still used to display transcription. Distributions of these versions ship only LingvoOEM, which gradually evolves: by version 12, additional phonetic characters have replaced more of the standard ones. The decompiled DSL files themselves can be either Unicode or ANSI, but the transcription text inside them uses the unique ANSI DSL encoding, and without the special font it looks like "krakozyabry".
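A rough way to tell which kind of transcription a Unicode DSL file carries is to look for Cyrillic letters inside the tag content. The sketch below is only a heuristic under stated assumptions: `looks_like_ansi_dsl` is a hypothetical name, and it checks just the U+0410–U+044F letters that the Win-1251 C0–FF range decodes to, while the text above notes that other cells of LingvoOEM are remapped as well.

```python
def looks_like_ansi_dsl(transcription: str) -> bool:
    """Heuristic: True if the text contains Cyrillic letters U+0410..U+044F.

    Assumption: a transcription is not expected to contain real Cyrillic,
    so such letters are taken as a sign of the font-specific encoding.
    """
    return any(0x0410 <= ord(ch) <= 0x044F for ch in transcription)


# A string mixing Latin and Cyrillic, as ANSI DSL text appears without the font.
print(looks_like_ansi_dsl("h\u0410l\u0430"))            # True
# A transcription written with standard Unicode IPA characters.
print(looks_like_ansi_dsl("h\u0259\u02c8l\u0259\u028a"))  # False
```

The check gives false positives for dictionaries whose transcriptions legitimately use Cyrillic, so it should be treated as a hint, not a decision rule.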
Stage 3: Lingvo x3–x6. The program switches to Unicode completely. The special LingvoOEM is replaced by the Unicode font NewtonPhonetABBYY.ttf. Its “feature”, however, is not the same as its predecessor's: it does not use any unique encoding and differs only in typeface. The font is a version of Paratype's well-known Newton typeface, cut down at ABBYY's request to the ranges needed for displaying transcription. Important: in all x3 dictionaries, the transcription text has been transcoded from the unique ANSI DSL to standard Unicode, so nothing else is required to display it correctly.
While Lingvo was actively developed, decompiled DSL dictionaries spread widely across the network, and two varieties ended up in users' hands: those with a “crazy” transcription and those with Unicode. Now every new developer who takes on DSL support faces a choice: which branch of the [t] tag to support.
The author of GoldenDict sided with the “krakozyabry” and built a mechanism that converts the ANSI DSL contents of [t] to standard Unicode. Now, if a Lingvo 8.0–12.0 dictionary is connected to GD, its transcription is displayed “normally”; x3–x6 dictionaries are also displayed without problems, since the ranges of LingvoOEM and NewtonPhonetABBYY do not overlap.
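The idea of such a conversion can be sketched with a per-character lookup table. This is not GoldenDict's code (that is written in C++ and linked below), and the table entries here are hypothetical placeholders, not the real LingvoOEM layout; the names `HYPOTHETICAL_ANSI_DSL_TO_UNICODE` and `normalize_transcription` are invented for the example.

```python
# HYPOTHETICAL mapping: placeholder pairs chosen for illustration only.
# A real table would have to be derived from the LingvoOEM font itself.
HYPOTHETICAL_ANSI_DSL_TO_UNICODE = {
    0x0410: 0x0259,  # placeholder: Cyrillic "А" -> schwa
    0x0430: 0x00E6,  # placeholder: Cyrillic "а" -> ash
}


def normalize_transcription(text: str) -> str:
    """Replace font-specific code points with standard Unicode ones.

    Characters that are not keys in the table pass through unchanged.
    """
    return text.translate(HYPOTHETICAL_ANSI_DSL_TO_UNICODE)


# "Crazy" transcription: the mapped code points are rewritten.
print(normalize_transcription("h\u0410l\u0430"))
# Already-Unicode transcription: nothing matches, so it is left as-is.
print(normalize_transcription("h\u0259\u02c8l\u0259\u028a"))
```

Because only the code points listed in the table are touched, text that already uses standard phonetic characters survives the pass untouched, which is the practical consequence of the non-overlapping ranges mentioned above.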
Materials (HTUZ.zip) and test dictionaries (HTUN.7z): https://mega.nz/folder/j0p2iTzb#0wY4UY6KuYNCpNmDVGb62g
See the GoldenDict implementation: https://github.com/goldendict/goldendict/blob/master/dsl_details.cc#LL566C1