This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

some kanji get dropped when converting KANJIDIC2 #34

Open
freebiesoft opened this issue Aug 12, 2021 · 1 comment

Comments

@freebiesoft

At least one kanji, '剝', gets dropped when converting from kanjidic2.xml to the ZIP file. This kanji is clearly present in kanjidic2.xml, but it is missing from the converted ZIP file.

@freebiesoft changed the title from "some kanji get dropped when running on KANJIDIC2" to "some kanji get dropped when converting KANJIDIC2" on Aug 12, 2021
@rnpnr
Contributor

rnpnr commented Jan 10, 2023

I looked into this a bit. Some kanji do get dropped in the xml->yomidict conversion, but '剝' isn't one of them. Here is the entry:

[
    "",
    "ハク ホク",
    "へ.ぐ へず.る む.く む.ける は.がれる は.ぐ は.げる は.がす",
    "jouyou",
    [
        "come off",
        "peel",
        "fade",
        "discolor"
    ],
    {
        "grade": "8",
        "halpern_kkd": "2105",
        "jis213": "1-15-94",
        "moro": "2049",
        "skip": "1-8-2",
        "strokes": "10",
        "ucs": "525D"
    }
],

The kanji that do get skipped are dropped by the following checks in kanjidic.go:

func kanjidicExtractKanji(entry jmdict.KanjidicCharacter, language string) *dbKanji {
	if entry.ReadingMeaning == nil {
		return nil
	}
...
	if len(kanji.Meanings) == 0 {
		return nil
	}
...
}

which is probably fine for a J-E dictionary like this.
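
To make the effect of those two checks concrete, here is a minimal, self-contained sketch. The `character` and `readingMeaning` types and the `skipReason` helper are hypothetical stand-ins written for this example, not the actual structs or functions in kanjidic.go, and the "A" and "B" entries are placeholders rather than real KANJIDIC2 data. It assumes `Meanings` holds only the meanings for the selected language.

```go
package main

import "fmt"

// readingMeaning and character are hypothetical stand-ins for the
// structures used in kanjidic.go; the real field layout may differ.
type readingMeaning struct {
	Meanings []string // assumed: meanings for the selected language only
}

type character struct {
	Literal        string
	ReadingMeaning *readingMeaning
}

// skipReason mirrors the two checks quoted above. It returns a non-empty
// reason when an entry would be dropped, and "" when it would be kept.
func skipReason(c character) string {
	if c.ReadingMeaning == nil {
		return "no reading/meaning block"
	}
	if len(c.ReadingMeaning.Meanings) == 0 {
		return "no meanings in the target language"
	}
	return ""
}

func main() {
	entries := []character{
		// Has meanings, so it is kept (matches the entry shown above).
		{Literal: "剝", ReadingMeaning: &readingMeaning{Meanings: []string{"come off", "peel"}}},
		// Placeholder entries that would trip each check.
		{Literal: "A", ReadingMeaning: nil},
		{Literal: "B", ReadingMeaning: &readingMeaning{}},
	}
	for _, e := range entries {
		if reason := skipReason(e); reason != "" {
			fmt.Printf("skipped %s: %s\n", e.Literal, reason)
		}
	}
}
```

Running something like this over the parsed kanjidic2.xml entries would list exactly which characters are omitted and why, which could help narrow down a report of a missing kanji.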

Can you (or anyone else with the same issue) elaborate on where you detected this?
