Help: Code that fixes the issue of inflections on Kindle dictionaries #1

pyccp · 2022-06-02T15:27:04Z

Hello Hannes,

I've been trying for some time to create a decent English-Finnish Kindle dictionary based on Wiktionary. When I tried to build it the way you did, I ran into the issue of inflections clashing with headwords. I even started a thread on MobileRead about this topic two days ago.

I see that you have a solution, an algorithm that fixes this issue. Can you please explain how you solved it? I've read your document titled "The stupid kindle algorithm", where you mention that it can be fixed with three lines of code. Could you please share this code and explain how it can be used to fix the issue?
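To see which entries are affected, it can help to list the inflected forms that are spelled like another headword, since those are the lookups where an inflection and a headword compete. The sketch below uses a hypothetical helper (`find_clashes` is not from pyglossary or the linked repos). It assumes the dictionary is available as a mapping from headword to inflected forms, and it compares case-insensitively, which is an assumption about how the Kindle matches lookups, not a documented rule.

```python
from collections import defaultdict
from typing import Callable


def find_clashes(
    entries: dict[str, list[str]],
    key: Callable[[str], str] = str.casefold,
) -> dict[str, list[str]]:
    """Return {form: [headwords listing it]} for inflected forms that match another headword.

    `key` is the normalisation applied before comparing; casefold is an
    assumption about Kindle lookup matching.
    """
    headword_keys = {key(h) for h in entries}
    clashes: dict[str, list[str]] = defaultdict(list)
    for headword, forms in entries.items():
        for form in forms:
            k = key(form)
            # Skip the headword itself; flag forms spelled like some *other* headword
            if k != key(headword) and k in headword_keys:
                clashes[form].append(headword)
    return dict(clashes)


# Hypothetical toy data: "tuli" is both a noun (fire) and a past-tense form of "tulla" (to come)
entries = {
    "tulla": ["tulen", "tulee", "tuli"],
    "tuli": ["tulen", "tulta", "tuleen"],
}
print(find_clashes(entries))  # {'tuli': ['tulla']}
```

Passing a diacritic-stripping function as `key` would extend the check to forms that only clash once accents are ignored.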

Vuizur · 2022-06-03T08:17:28Z

Hello,

My solution is here. Note that you need to replace one function and one constant in pyglossary, as described at the top of the file.
It is also possible that, instead of unidecode, you need another function to strip the diacritics that the Kindle fuzzy search algorithm ignores. I used unidecode because it works for Spanish, but it probably doesn't work for many other languages.
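If unidecode does not fit a language, one standard-library alternative is to decompose the text with Unicode NFD and drop the combining marks. This is a sketch, not the function used in the linked code, and whether the Kindle fuzzy search folds exactly these characters is an assumption. Unlike unidecode it does not transliterate: characters without a canonical decomposition, such as "ß" or "ø", are left unchanged, while Finnish "ä" and "ö" become "a" and "o".

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (e.g. "ä" -> "a")."""
    # NFD splits a precomposed letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("canción"), strip_diacritics("päivää"))  # cancion paivaa
```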

The best future scenario would be if we could add this to pyglossary directly.

Vuizur · 2022-07-26T13:57:21Z

Here is a project that converts a tabfile to a fixed kindle dictionary: https://github.com/Vuizur/pyglossary-kindle-test/tree/master/pyglossary_kindle_test
Install it with `poetry install`, then run `poetry run python ./pyglossary_kindle_test/edit_dictionary.py`. In the script file you can specify the kindlegen folder and the path to your tabfile.

pyccp · 2022-08-31T18:24:49Z

I created a short Finnish-English txt file separated by tabs (and |) and tested it. It works brilliantly, though some of the words show up twice in the dictionary, but I suspect this is expected behavior. Thank you.
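For reference, a line in such a file can be produced as below. This assumes pyglossary's Tabfile conventions: the first column holds the headword with alternate (here inflected) forms separated by `|`, and backslash, tab and newline are written as `\\`, `\t` and `\n`. Check the escaping against the pyglossary version you use; `tabfile_line` and `escape_field` are hypothetical helper names.

```python
def escape_field(text: str, bar: bool) -> str:
    """Escape characters that would break a Tabfile line (assumed pyglossary conventions)."""
    # Backslash first, so the escapes added afterwards are not doubled
    text = text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
    # In the headword column "|" separates alternates, so a literal bar is escaped
    return text.replace("|", "\\|") if bar else text


def tabfile_line(headword: str, inflections: list[str], definition: str) -> str:
    """Build 'headword|form1|form2<TAB>definition'."""
    # dict.fromkeys drops duplicate forms while keeping their first-seen order
    words = list(dict.fromkeys([headword, *inflections]))
    head = "|".join(escape_field(w, bar=True) for w in words)
    return f"{head}\t{escape_field(definition, bar=False)}"


print(repr(tabfile_line("talo", ["talon", "taloa", "talossa"], "house")))
# 'talo|talon|taloa|talossa\thouse'
```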

Checked words below showed up twice in the dictionary:

pyccp · 2022-09-01T14:51:23Z

With great regret, anguish and disappointment, I must tell you that after I created a Finnish-English dictionary from a 143 250-line txt file many times, lookups for the inflected forms failed almost entirely.

I checked the txt file for formatting problems and also checked the xhtml files; the inflected forms were recorded inside infl tags.

Lookups for inflections only worked when the txt file was small. For example, I selected 35 of the 143 250 words, created a 35-line txt file from them, made a dictionary, and it worked.

I also always get this message at the start, after running the `poetry run python ./pyglossary_kindle_test/edit_dictionary.py` command:

No module named 'pyglossary.plugin_lib.py310'

Vuizur · 2022-09-03T08:33:06Z

The error message about No module named 'pyglossary.plugin_lib.py310' should not be a problem; in my experience pyglossary still works the same when it is thrown.

If I understand correctly, you tried to use these dictionaries on a Kindle? I think it might have problems with the huge number of inflections. I would try re-running the program as described in the README.md of this repo, with the option try_to_fix_failed_inflections set to False. Maybe the Kindle will handle that version better.

pyccp · 2022-09-03T13:59:25Z

Yes, I am sending the created dictionaries to my Kindle e-reader to test them. My e-reader already has the same 143 250-entry dictionary that I am trying to fix, and the device handles it with all its inflections. I created that one with mobigen (much faster than kindlegen), and it is about 10 MB. But of course its inflections are messed up because of the Kindle algorithm, and headwords clash with inflections.

I later created sample dictionaries with 100, 1,000 and 10,000 entries using the pyglossary-kindle-test repository, and all of them seemed to work fine. I also created a Spanish-English dictionary from the En-Es.txt file that comes with that repo, and it worked fine too. The Finnish dictionary with all entries included, however, does not work.
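Sample files of a given size can be cut from the full tabfile with a few lines of Python, which makes these tests easy to repeat and helps narrow down the size at which lookups start failing. A minimal sketch: the file names are placeholders, and taking the first N lines is only one way to sample (it keeps whatever order the source file uses).

```python
from itertools import islice
from pathlib import Path


def write_sample(src: Path, dst: Path, n_lines: int) -> int:
    """Copy the first n_lines lines of a tabfile to dst; return how many were written."""
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        # islice stops early, so the rest of a large file is never read
        for line in islice(fin, n_lines):
            fout.write(line)
            count += 1
    return count


# Hypothetical file name; one sample per size, as in the 100 / 1,000 / 10,000 tests
src = Path("fi-en.txt")
if src.exists():
    for n in (100, 1_000, 10_000):
        write_sample(src, src.with_name(f"fi-en-{n}.txt"), n)
```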

A moment ago I tried the ebook_dictionary_creator repo to create a Finnish dictionary and got this error:

```
Traceback (most recent call last):
  File "C:\Users\user\Desktop\Py Project\Project Dictio\trial\dictio.py", line 5, in <module>
    dict_creator.create_database()
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\ebook_dictionary_creator\e_dictionary_creator\dictionary_creator.py", line 64, in create_database
    create_database.create_database(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\ebook_dictionary_creator\database_creator\create_database.py", line 728, in create_database
    obj = json.loads(line)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 266 (char 265)
Traceback locals:
    self = <json.decoder.JSONDecoder object at 0x0000020371ABFE50>
    s = '{"pos": "noun", "head_templates": [{"name": "head", "args": {"1": "f...
    len(s) = 277
    idx = 0
```
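The traceback shows `create_database` calling `json.loads` on one line of the input at a time, and the failing line is only 277 characters long with a string left unterminated at character 265. That pattern would fit a truncated or damaged line in the input file, though this is a guess. A sketch for locating such lines; `find_bad_json_lines` is a hypothetical helper and the file name is a placeholder for whatever JSON file the dictionary creator reads.

```python
import json
from pathlib import Path


def find_bad_json_lines(path: Path, limit: int = 10) -> list[tuple[int, str]]:
    """Return up to `limit` (line number, reason) pairs for lines json.loads rejects."""
    bad: list[tuple[int, str]] = []
    # errors="replace" keeps a damaged byte from aborting the scan with UnicodeDecodeError
    with path.open(encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                continue  # blank lines are skipped, not reported
            try:
                json.loads(text)
            except json.JSONDecodeError as err:
                bad.append((lineno, f"{err.msg} at char {err.pos}, line length {len(text)}"))
                if len(bad) >= limit:
                    break
    return bad


# Hypothetical path: point it at the JSON file used to build the database
data = Path("kaikki-finnish.json")
if data.exists():
    for lineno, reason in find_bad_json_lines(data):
        print(lineno, reason)
```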

Vuizur · 2023-02-25T12:06:02Z

> I later created sample dictionaries with 100, 1000, and 10 000 entries using pyglossary-kindle-test repository, all seemed to work fine. I also created Spanish-English dictionary using the En-Es.txt file that comes with that repo, and this one, too, worked fine. But not the Finnish dictionary with all entries included.

I think Finnish simply has too many inflections for that terrible kindlegen program. When you hit one of its completely arbitrary limits, it refuses to work correctly, and it gives you no hint about how to fix this or which entry exactly is responsible for the failure. Very bad software, but unfortunately there is no solution.

Hmm, I tried creating a Finnish dictionary on my Windows system, and there it worked well. So I would need your system and Python version info to try to replicate the problem.

pyccp closed this as completed Aug 31, 2022

pyccp reopened this Sep 1, 2022

Vuizur self-assigned this Feb 25, 2023
