[FEATURE] Multilanguage Heisig data #213

vincentbohlen · 2024-08-19T08:39:03Z

I am not a native English speaker. Most of the Kanji learning material and advice available on the Internet is in English or references English material. While I have no problem understanding English, when applying the RTK method I run into trouble with Heisig's sometimes outlandish choice of keywords and with his use of synonyms, which makes it important to memorize nuances. This is already difficult in one's native language, and even more difficult when those nuances have not been internalized in a second language.
Since a translated and adapted version of RTK is available in different languages, I prefer using the keywords used in the version published in my native language.

I played around with how to use the Kanji GOD add-on in German. Entering the translations into the custom keyword / custom primitive columns would be a possible solution, but it would be a lot of manual work.
I had an almost complete list of keywords and primitives on my PC, so I wrote some basic SQL to alter my local Kanji.db and overwrote the English keywords/primitives with the German ones. This works great for me.
I am now thinking about also adding Heisig's stories and comments, but I personally don't necessarily need them anymore.
I assume that there are other Japanese learners who would benefit from having a version of Kanji GOD which aligns with the localized version of RTK they might be using. This is not about customizing the data but providing the "official" localized set of keywords as a different base set.
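For anyone wanting to reproduce the local overwrite described above, a minimal sketch could look like the following. It is not the SQL I actually ran: the table name `kanji_data`, the columns `kanji` and `keyword`, and the function name `overwrite_keywords` are assumptions made for illustration, and the real schema of Kanji.db may well differ. Check the schema first and work on a backup copy.

```python
import csv
import sqlite3


def overwrite_keywords(conn, tsv_lines):
    """Overwrite the keyword of every kanji listed in tab-separated lines.

    ASSUMPTION: the table `kanji_data` and its columns `kanji` and `keyword`
    are hypothetical; adapt them to the real Kanji.db schema.
    Each input line is expected to be "<kanji>\t<keyword>".
    Returns the number of database rows that were updated.
    """
    updated = 0
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        kanji, keyword = row[0], row[1]
        cur = conn.execute(
            "UPDATE kanji_data SET keyword = ? WHERE kanji = ?",
            (keyword, kanji),
        )
        updated += cur.rowcount  # 0 if the kanji is not in the table
    conn.commit()
    return updated


if __name__ == "__main__":
    # Demo on a throwaway in-memory database instead of the real Kanji.db.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kanji_data (kanji TEXT PRIMARY KEY, keyword TEXT)")
    conn.executemany(
        "INSERT INTO kanji_data VALUES (?, ?)", [("一", "one"), ("二", "two")]
    )
    print(overwrite_keywords(conn, ["一\teins", "二\tzwei"]))
```

Kanji that are missing from the table are silently skipped, so comparing the returned count with the number of input lines is a quick way to spot gaps in the keyword list.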

Providing the data for different languages would be a one-time effort. Users could replace the DB manually, but a language selection that loads the matching data would be the nicer solution. Migaku already allows selecting dictionaries for different languages, which removes the mental work of translating from English. Adjusting the Kanji GOD data in the same way would make for a seamless experience.

mjuhanne · 2024-08-21T19:19:58Z

@vincentbohlen
Actually, the groundwork for this is already done. There is a fork of Kanji GOD (https://github.com/mjuhanne/Migaku-Kanji-Addon/tree/test_storydb) which contains a bunch of improvements that I haven't yet tried to merge into the main branch.

One of the improvements is Story DB: It takes the stories (Heisig, Koohi) from Kanji DB into a separate Story DB. In this database each row contains a set of data for each kanji (source name, keyword, story, primitives). The source here refers to Heisig / Koohi / RRTK / Wanikani / "crowd-sourced". RRTK and Wanikani data is gathered from a couple of Anki decks and the crowd-sourced stuff is a mixture of best-of-the-best of Koohi stories and keywords (manually checked so they don't conflict with Heisig ones), in addition to some of my own mnemonics and keywords.

What you (and maybe other users for other languages) would want to do is add another "source" to Story DB (for example heisig_de for German Heisig keywords). The process would be:

create a tab-separated file, in which each row would consist of source name + kanji + keywords
merge those changes into Story DB with a separate Python script
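The two steps above could be sketched roughly as follows. This is not the actual merge script from the fork: the table `stories`, its columns, and the function name `merge_tsv` are hypothetical and only illustrate the idea of adding a new source such as heisig_de next to the existing ones. The keywords are treated as a single text field per row.

```python
import csv
import io
import sqlite3

# ASSUMPTION: hypothetical, simplified schema; the real Story DB also holds
# stories and primitives and may be laid out differently.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stories (
    source  TEXT NOT NULL,
    kanji   TEXT NOT NULL,
    keyword TEXT,
    PRIMARY KEY (source, kanji)
)
"""


def merge_tsv(conn, tsv_file):
    """Merge "<source>\t<kanji>\t<keywords>" lines into the stories table.

    Rows are keyed by (source, kanji), so re-running the merge with corrected
    keywords replaces the earlier entry instead of duplicating it.
    Returns the number of rows merged.
    """
    conn.execute(SCHEMA)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [tuple(r[:3]) for r in reader if len(r) >= 3]  # skip short lines
    # INSERT OR REPLACE swaps out the whole row; that is fine here because
    # this sketch's table has no columns beyond the three being written.
    conn.executemany(
        "INSERT OR REPLACE INTO stories (source, kanji, keyword) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Demo with an in-memory database and an in-memory "file".
    sample = io.StringIO("heisig_de\t一\teins\nheisig_de\t二\tzwei\n")
    conn = sqlite3.connect(":memory:")
    merge_tsv(conn, sample)
    print(conn.execute("SELECT * FROM stories ORDER BY kanji").fetchall())
```

Keeping the source name in every row means one file can carry several languages at once, and existing sources such as Heisig or Koohi stay untouched.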

If you'd like to try this approach, let me know and I can walk you through it.

The test branch contains a bunch of other improvements, so you might want to take a look at it anyway:

Editing mode for editing actual keywords, stories and primitives list in the Kanji Lookup tool
Updated Koohi stories for all available kanjis (Kanken 1.5 - 1)
More non-Heisig primitives
Primitives for all 6000 Kanken kanjis
Marking conflicting keywords (in Wanikani/RRTK vs Heisig) using strike-through text (see the image below)

I've used it like this for the past 8 months or so, so it should be fairly stable. If you want to try it, make sure you use the right branch and DON'T FORGET TO BACK UP your previous Kanji GOD directory and Anki decks :)

Here are some screenshots of the current status:

Heisig, RRTK and Wanikani sources:

Koohi and Crowd-sourced:

Edit mode (here editing the crowd-sourced primitives list):

vincentbohlen added the labels "idea", "new feature", and "requires discussion" on Aug 19, 2024
