Non-unicode primitive support and major Kanji database update #191

Closed
wants to merge 19 commits into from


mjuhanne
Contributor

mjuhanne commented Aug 1, 2023

Add the capability to use primitives that have no Unicode character representation. These are referenced with [primitive_name] tags instead, in the primitives and primitive_of lists as well as in Heisig stories and comments.

The tags are then converted into image links when constructing their visual representation.
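A minimal sketch of the tag-to-image conversion. It assumes that a tag is a primitive name in square brackets and that each image is stored as `<name>.svg` with spaces replaced by hyphens. The naming scheme, the `primitive` CSS class and the function name are hypothetical and not taken from the add-on's code.

```python
import html
import re

# Assumed tag syntax: a primitive name inside square brackets, e.g. "[lidded crock]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def primitive_tags_to_html(text, image_dir=""):
    """Replace every [primitive_name] tag in `text` with an <img> link.

    Hypothetical file naming: "lidded crock" -> "lidded-crock.svg".
    """

    def to_img(match):
        name = match.group(1).strip()
        src = image_dir + name.replace(" ", "-") + ".svg"
        # Escape both attributes so unusual names cannot break the HTML.
        return '<img class="primitive" src="{}" alt="{}">'.format(
            html.escape(src, quote=True), html.escape(name, quote=True)
        )

    return TAG_PATTERN.sub(to_img, text)


if __name__ == "__main__":
    print(primitive_tags_to_html("A [lidded crock] over a mouth"))
```

Because the pattern matches any bracketed text, stories that use square brackets for other purposes would need a stricter check, for example only converting names found in a list of known primitives.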

Major update to kanji.db, trying to replicate Heisig's keywords and primitives as accurately as possible:

  • Add all the missing non-Unicode character primitives
  • Fix many mix-ups between primitives ('good luck' vs 'lidded crock', 'chop-seal' vs 'stamps' among others)
  • Separate the 'hand' and 'fingers' primitives for clarity.
  • The primitives for 'flowers', 'city walls' and 'pinnacle' now reference the more common radicals instead of weird archaic ones
  • Fix many erroneous primitive references (such as 'water' being used instead of 'rice grains')
  • Fix many Heisig stories and comments. Also add italic and bold text when referencing keywords

Changes to the original kanji database (as of 7/2023) are merged from kanji-ext.tsv files, and the updated kanji.db is included. There are three separate batches of updates in this PR. A log of the changes for each batch can be found in the corresponding db_merge_log_X.md file in Markdown format. (The log and .tsv files have since been removed to keep the add-on directory clean, but they can be found in the individual commits below if needed.)
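The merge step can be sketched in Python with the standard `csv` and `sqlite3` modules. This is a minimal sketch under assumed details: the table name `characters`, the key column `character`, the rule that an empty TSV cell means "unchanged" and the Markdown log line format are all hypothetical, not taken from the actual merge tool.

```python
import csv
import io
import sqlite3


def merge_tsv_patch(conn, tsv_text, table="characters", key="character"):
    """Apply a kanji-ext.tsv style patch to `table` and return a Markdown change log.

    Assumed layout: the first TSV row is a header whose column names match the
    table's columns, `key` identifies the kanji, and an empty cell means "leave
    this field unchanged". Column names come from the header, so the patch file
    must be trusted.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    log_lines = []
    for row in reader:
        char = row.pop(key)
        for field, new_value in row.items():
            if field is None or not new_value:
                continue  # surplus column or empty cell: keep the existing value
            found = conn.execute(
                f"SELECT {field} FROM {table} WHERE {key} = ?", (char,)
            ).fetchone()
            if found is None:
                log_lines.append(f"- **{char}**: not in database, skipped")
                break
            if found[0] != new_value:
                conn.execute(
                    f"UPDATE {table} SET {field} = ? WHERE {key} = ?", (new_value, char)
                )
                log_lines.append(f"- **{char}** `{field}`: `{found[0]}` → `{new_value}`")
    conn.commit()
    return "\n".join(log_lines)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE characters (character TEXT PRIMARY KEY, primitives TEXT)")
    db.execute("INSERT INTO characters VALUES ('川', 'old')")
    print(merge_tsv_patch(db, "character\tprimitives\n川\tnew\n"))
```

Writing the log while applying the patch keeps the database and the human-readable Markdown record in step.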

mjuhanne added 11 commits July 15, 2023 13:49
…sentation. They are instead referenced with [primitive_name] tags in primitives and primitive_of lists.

The tags are then converted into image links when constructing visual representation.

Included are .svg files for the non-Unicode primitives gathered from https://github.com/cyphar/heisig-rtk-index repository.
…imitives. The file contains modifications to a few select existing kanji and adds references to these missing primitives.

Also included is a tool script that merges kanji-ext.tsv into kanji.db. It also recalculates the primitive_of references.
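Recalculating primitive_of amounts to inverting the primitives mapping. A minimal sketch, assuming the data has already been loaded into a plain `{kanji: [primitive, ...]}` dict; the layout and the function name are illustrative, not the tool's actual code. Tags such as `[futon]` are plain strings here, so non-Unicode primitives are handled the same way as ordinary characters.

```python
from collections import defaultdict


def recalculate_primitive_of(primitives):
    """Invert {kanji: [primitive, ...]} into {primitive: [kanji, ...]}.

    Kanji are visited in sorted order so the resulting lists are stable
    between runs, and each kanji appears at most once per primitive.
    """
    primitive_of = defaultdict(list)
    for kanji in sorted(primitives):
        for prim in primitives[kanji]:
            if kanji not in primitive_of[prim]:
                primitive_of[prim].append(kanji)
    return dict(primitive_of)


if __name__ == "__main__":
    data = {"換": ["扌", "奐"], "喚": ["口", "奐"]}
    print(recalculate_primitive_of(data)["奐"])
```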
- Fix creating cards that have non-Unicode primitives
- Modify Production and Recognition HTML files to use images for these primitives
- Copy primitive .svg files to media collection directory
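Copying the primitive images into the collection's media folder can be sketched with `pathlib` and `shutil`. This sketch assumes the media directory path is already known and passed in; how the add-on obtains it is not shown, and the function name is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_primitive_svgs(source_dir, media_dir):
    """Copy every .svg in source_dir into media_dir unless a file with that name exists.

    Returns the names of the files that were actually copied.
    """
    src = Path(source_dir)
    dst = Path(media_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for svg in sorted(src.glob("*.svg")):
        target = dst / svg.name
        if not target.exists():
            shutil.copy2(svg, target)
            copied.append(svg.name)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as media:
        Path(src, "futon.svg").write_text("<svg/>")
        print(copy_primitive_svgs(src, media))  # ['futon.svg']
        print(copy_primitive_svgs(src, media))  # [] because the file is already there
```

Skipping existing files avoids clobbering media, but it also means an updated .svg would not replace an older copy; a real tool might compare file contents or overwrite deliberately.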
…ts using the [primitive] tag. Tags also work correctly now in the primitives and primitive_of lists and popups.

Updated all the .svg files (removed the width and height attributes so CSS can adjust their size properly)
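Removing the fixed size from an SVG can be sketched with the standard `xml.etree.ElementTree` module. This is an illustrative sketch, not the script used for the commit; note that ElementTree drops comments and the XML declaration when it re-serializes the file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def strip_svg_size(svg_text):
    """Remove width/height from the root <svg> so CSS controls the rendered size.

    The viewBox attribute is kept: without it the drawing would not scale
    to the size set by CSS.
    """
    root = ET.fromstring(svg_text)
    for attr in ("width", "height"):
        root.attrib.pop(attr, None)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" '
           'viewBox="0 0 109 109"><path d="M10 10 L90 90"/></svg>')
    print(strip_svg_size(svg))
```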
… and primitives as accurately as possible:

- Add all the missing non-Unicode character primitives
- Fix many mix-ups between primitives ('good luck' vs 'lidded crock', 'chop-seal' vs 'stamps' among others)
- Separate 'hand' and 'fingers' primitives for clarity.
- Primitives for 'flowers', 'city walls' and 'pinnacle' now reference the more common radicals instead of weird archaic ones
- Fix many erroneous primitive references (such as 'water' being used instead of 'rice grains')
- Fix many Heisig stories and comments. Also add italic and bold text when referencing keywords

Changes to the original kanji database (as of 7/2023) are read from kanji-ext.tsv. The updated kanji.db is included. A log of the changes can be found in db_merge_log.md in Markdown format.
…ed correctly with white color when used in buttons
- Remove references to odd archaic primitives without a Heisig keyword
- Add many missing references to primitives and fix numerous wrong ones
- Keep a few alternative primitives separate from their main counterparts because they are so visually distinct ('cloak' vs. 'garment', 'scarf' vs. 'garment')
- Remove erroneous alternative primitive keywords.
- Add commentary to many Heisig stories and comments, adding references to similar primitives and kanjis to lessen confusion.
…ences, comments and links

Minor update to the db_merge tool. The changes can be found in kanji-ext3.tsv and, in more readable form, in db_merge_log_3.md
@calculuschild

See #175 where I have done some simple testing.

…Migaku Kanji database Excel sheet.

Updated the .tsv merge tool and added a new script to extract data from user-modified fields into a .tsv patch file
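Extracting user edits into a patch comes down to diffing each user-modified field against the original value. A minimal sketch, assuming both sides are available as `{kanji: {field_name: value}}` dicts and that an empty TSV cell means "unchanged"; the data layout and the function name are hypothetical.

```python
import csv
import io


def build_tsv_patch(originals, user_edits, fields):
    """Build a TSV patch containing only the user-modified fields.

    A cell is left empty when the user's value equals the original, and
    kanji without any changes are omitted entirely.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["character", *fields])
    for kanji in sorted(user_edits):
        edited = user_edits[kanji]
        base = originals.get(kanji, {})
        row = []
        for field in fields:
            new = edited.get(field, "")
            row.append(new if new != base.get(field, "") else "")
        if any(row):
            writer.writerow([kanji, *row])
    return out.getvalue()


if __name__ == "__main__":
    print(build_tsv_patch({"求": {"primitives": "a"}}, {"求": {"primitives": "b"}}, ["primitives"]))
```

With the "empty means unchanged" convention, a user deliberately clearing a field cannot be expressed; a real patch format would need an explicit marker for that case.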
@calculuschild

calculuschild commented Aug 9, 2023

Just downloaded the latest changes you made. A few notes I've been collecting on entries that still seem to need fixing:

  1. 邑 (city walls?) - Has no primitives or name listed, though they are mentioned in the stories.
  2. 求 (request) - Primitives seem to be missing according to RTK (should be "arrowhead, drop, rice"). But one story points out this should be "water" instead of "rice"
  3. 捕 (catch) - Also should use the "arrowhead" primitive instead of "arrow"
  4. 奐 (clear, bright) - Missing all information
  5. 巛 (flood) - Completely blank. Other cards like 拶 (imminent) now incorrectly point to the primitive 川 (stream) which doesn't match the stories

It's possible some of these are leftover garbage cards after rebuilding the deck that I just need to suspend, but reporting just in case.

@mjuhanne
Contributor Author

mjuhanne commented Aug 11, 2023

@calculuschild Thank you for testing!

> Just downloaded the latest changes you made. A few notes I've been collecting on entries that still seem to need fixing:
>
> 1. 邑 (city walls?) - Has no primitives or name listed, though they are mentioned in the stories.

Corrected

> 2. 求 (request) - Primitives seem to be missing according to RTK (should be "arrowhead, drop, rice"). But one story points out this should be "water" instead of "rice"

Corrected

> 3. 捕 (catch) - Also should use the "arrowhead" primitive instead of "arrow"

I think there's some confusion about what Heisig actually means by the 'arrowhead'. It's not listed as a separate primitive but is mentioned in the comments for the 'arrow', 'request' and 'dog tag' primitives. As I see it, the arrow is still there in those primitives, but its body is straightened out and merged with other primitives (as happens in many of the kanjis in his stories). That's why I haven't created yet another sub-primitive and keep referring to the 'arrow'.

> 4. 奐 (clear, bright) - Missing all information

That's a non-Heisig primitive (shared by 換 and 喚), but it will be listed as a secondary/alternate new primitive after the next batch of updates. Later on, learning with these advanced (non-Heisig) primitives will be possible as an optional feature.

Also, I'm adding a feature that lets the user modify the primitive list of each kanji (as well as the Heisig story and comments) to facilitate a crowdsourcing effort to go through all the kanjis that are not included in Heisig's books. I'm now in the middle of cross-checking the database against RTK3, but for the rarer kanji (kanji frequency 3000+ to ~12000) I obviously don't have time to do it myself. I'm not a paid worker of the Migaku project, just a volunteer doing this in my free time :)

> 5. 巛 (flood) - Completely blank. Other cards like 拶 (imminent) now incorrectly point to the primitive 川 (stream) which doesn't match the stories

巛 is now listed as an alternative to 川, and the database should now reflect all the changes.
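One way to model an alternative primitive, sketched under the assumption that alternatives are stored as a simple alternate-to-main mapping: a kanji such as 拶 keeps the 巛 form its stories refer to, while a lookup for 川 still finds it. The mapping, the sample primitive lists and the function are illustrative only, not the add-on's data model.

```python
# Hypothetical data: alternate form -> main primitive it is listed under.
ALTERNATIVES = {"巛": "川"}


def kanji_using(primitive, primitive_lists, alternatives=ALTERNATIVES):
    """Return kanji that use `primitive` directly or through one of its alternate forms."""
    forms = {primitive}
    forms.update(alt for alt, main in alternatives.items() if main == primitive)
    return sorted(k for k, prims in primitive_lists.items() if forms.intersection(prims))


if __name__ == "__main__":
    lists = {"拶": ["扌", "巛", "夕"], "順": ["川", "頁"]}  # illustrative primitive lists
    print(kanji_using("川", lists))
    print(kanji_using("巛", lists))
```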

> It's possible some of these are leftover garbage cards after rebuilding the deck that I just need to suspend, but reporting just in case.

Please note that not all fixes have been committed yet. The next batch should be ready in a few days, and hopefully the database for the 3000 most common kanjis will be fairly complete by then.

… RTK3 (kanjis 2000-3000). Minor fixes. Added 'I-ching', 'futon' and 'butchers meeting' non-Unicode primitives
…ds'. When characters are added manually, the scan process now searches for new sub-primitives even when the main character is already in the stack. Also show a 'Cancel' button in the KanjiConfirmDialog.
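Continuing the scan into sub-primitives even when the main character is already in the stack is a graph traversal that does not stop at known nodes. A minimal breadth-first sketch, assuming the primitive lists are a `{char: [sub-primitive, ...]}` dict and the "stack" is a set of characters already added; the names and the abstract example data are hypothetical.

```python
from collections import deque


def find_missing_primitives(chars, primitives, stack):
    """Return characters and their transitive sub-primitives that are not in `stack`.

    The traversal keeps descending into sub-primitives even when a character
    is already in the stack, so a newly added sub-primitive is still found.
    """
    missing = []
    seen = set()
    queue = deque(chars)
    while queue:
        char = queue.popleft()
        if char in seen:
            continue  # guards against cycles and shared sub-primitives
        seen.add(char)
        if char not in stack:
            missing.append(char)
        queue.extend(primitives.get(char, []))
    return missing


if __name__ == "__main__":
    prims = {"A": ["B", "C"], "C": ["D"]}  # abstract example data
    print(find_missing_primitives(["A"], prims, {"A", "B"}))
```

A scan that skipped "A" because it is already in the stack would never reach "C" or "D"; descending regardless is what lets the missing sub-primitives surface.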
@mjuhanne
Contributor Author

Closing for now. I'll make a new PR that uses the new refactored code (from the other PR) and clean this up a bit too.

mjuhanne closed this Aug 18, 2023