This page collects information about the dictionaries available for Yomichan/Yomitan: the description and metadata each dictionary provides, and its entry count, which gives a rough idea of how extensive its coverage is.
To regenerate these statistics yourself, paste the contents of generateStats.js into your browser's console while on the Yomitan options page.
Note: The entry counts are naive. They are the total number of term entries in each Yomitan dictionary. As a result, dictionaries such as jpdb and Jitendex can be overcounted, because they have separate entries for many variants of a term, while other dictionaries might have only one entry for most terms.
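To illustrate why a naive count can overstate coverage, here is a minimal sketch. It is not the code in generateStats.js; the sample rows and both function names are hypothetical. It assumes rows in the Yomitan term bank layout, where index 0 is the term and index 6 is the sequence number that groups related entries, and that variants of the same word share a sequence number.

```javascript
// Hypothetical sample rows, assuming the Yomitan term bank layout:
// [term, reading, definitionTags, rules, score, glossary, sequence, termTags]
const sampleRows = [
  ["見る", "みる", "", "v1", 0, ["to see"], 1, ""],
  ["観る", "みる", "", "v1", 0, ["to see"], 1, ""],
  ["視る", "みる", "", "v1", 0, ["to see"], 1, ""],
  ["食べる", "たべる", "", "v1", 0, ["to eat"], 2, ""],
];

// Naive count: every row is one entry, so each written variant is counted.
function naiveEntryCount(rows) {
  return rows.length;
}

// Alternative count (an assumption, not what the tables below report):
// rows that share a sequence number are treated as one entry.
function groupedEntryCount(rows) {
  return new Set(rows.map((row) => row[6])).size;
}

console.log(naiveEntryCount(sampleRows)); // 4
console.log(groupedEntryCount(sampleRows)); // 2
```

In this sample, a dictionary that lists three written forms of みる reports four entries for what a single-entry-per-word dictionary would report as two, so compare counts across dictionaries with that in mind.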
Title | Entry Count | Information |
---|---|---|
Wikipedia | 853593 | Author: Thermospore Revision: frequency_v2 URL: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Japanese2015_10000 Description: v0.1: Only change is the shortened title + addition of metadata v1: Reserved version; never completed v2: Moved to a much larger dataset. List goes up to about 850k now... |
BCCWJ-LUW | 824594 | Author: toasted-nutbread Revision: 1 URL: https://github.com/toasted-nutbread/yomichan-bccwj-frequency-dictionary Attribution: Copyright National Institute for Japanese Language and Linguistics https://pj.ninjal.ac.jp/corpus_center/bccwj/en/freq-list.html Description: Long unit word frequencies from the Balanced Corpus of Contemporary Written Japanese (BCCWJ). |
JPDB | 515231 | Author: jpdb, Marv Revision: JPDB_by-frequency-global_2022-05-10T03:27:02.930Z URL: https://jpdb.io Description: Generated via userscript: https://github.com/MarvNC/jpdb-freq-list ㋕ is used to indicate a frequency for a hiragana reading. ❌ is used to indicate that a term does not appear in the JPDB corpus. |
Innocent Ranked | 280000 | Revision: frequency1 |
Novels | 269987 | Revision: yyyy |
jpDicts (206k) | 206621 | Revision: frequency_jpDicts Description: Frequency list made from Japanese dictionary (ハイブリッド新辞林 v2, 故事ことわざの辞典, 漢字源, 精選版 日本国語大辞典, 新明解四字熟語辞典, 学研 四字熟語辞典, 実用日本語表現辞典,, 旺文社国語辞典 第十一版 画像無し, 大辞林 第三版, デジタル大辞泉, 岩波国語辞典 第六版, 広辞苑 第六版) |
Youtube | 187053 | Revision: frequency3 |
青空文庫熟語 | 169623 | Author: vtrm, Marv Revision: aozoraBunko_2022-08-26T05:50:26.124Z URL: https://www.aozora.gr.jp Attribution: 青空文庫 Description: Rank-based jukugo frequencies made from Aozora Bunko Data from https://vtrm.net/japanese/kanji-jukugo-frequency/en Created with https://github.com/MarvNC/yomichan-dictionaries Caveats: Jukugo which are absent from the dictionary entries are not reported in the data since the software has no way of knowing whether it encountered a legitimate jukugo or merely a juxtaposition of several words (e.g. when two or more nouns are combined together to form a new noun, or when a jukugo is used as an adverb). Sometimes a compound word can be either a Sino-Japanese jukugo read in on’yomi, or a native Japanese word read in kun’yomi and sometimes accompanied with okurigana. For example, 蹌踉 can either be a taru-adjective or to-adverb of Chinese origin read そうろう, or the root of a Japanese verb whose dictionary form is 蹌踉めく and which is read よろめく. Keep in mind that the program I wrote doesn’t parse kana and doesn’t try to disambiguate kanji readings. Consequently, occurrences of 蹌踉 read そうろう and 蹌踉 read よろ aren’t distinguished and are grouped together in the statistics. So if you look at the data for kanji 蹌, the line corresponding to 蹌踉 refers to all occurrences of 蹌踉 in the corpus, whatever their respective readings is. Due to the parsing method used and to the imperfect nature of Chinese characters word segmentation algorithms, there is a small (negligible but non-zero) number of false positives and missed out words. |
CC100 | 160836 | Author: xydustc Revision: 1 Description: Sudachipy Mode B & fugashi parsed CC100 datasaet, filtered by dictionaries |
国語辞典 | 156452 | Revision: kokugojiten_freq |
Netflix | 129141 | Revision: netflix.frequency |
ウェブ NWJC | 106754 | Revision: NWJC_ver202202_2024-03-03T20:03:52.800154+00:00 URL: https://masayu-a.github.io/NWJC/ Description: Converted programmatically from the dataset. See repo at https://github.com/Maltesaa/CSJ_and_NWJC_yomitan_freq_dict. Fork of https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict |
Anime & J-drama | 99999 | Revision: anime.frequency |
VNs Freq | 85374 | Revision: frequency2019 |
Narou Freq | 49269 | Revision: frequency1 |
話し言葉 CSJ | 42542 | Revision: CSJ_ver201803_2024-03-03T20:04:01.044705+00:00 URL: https://clrd.ninjal.ac.jp/csj/en/index.html Description: Converted programmatically from the dataset. See repo at https://github.com/Maltesaa/CSJ_and_NWJC_yomitan_freq_dict. Fork of https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict |
VN Freq | 35058 | Revision: frequency1 |
Conversation Corpus | 29528 | Revision: CEJC_ver202209_2023-06-22T11:04:44.335323+00:00 URL: https://www2.ninjal.ac.jp/conversation/cejc/cejc-wc.html Description: Converted programmatically from the dataset. See repo at https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict |
Title | Entry Count | Information |
---|---|---|
Jitendex [2023-12-12] | 290488 | Author: stephenmk Revision: 3.1 URL: jitendex.org Attribution: © CC BY-SA 4.0 Stephen Kraus 2023 You are free to use, modify, and redistribute Jitendex files under the terms of the Creative Commons Attribution-ShareAlike License (V4.0) Jitendex includes material from several copyrighted sources in compliance with the terms and conditions of those projects. • JMdict (EDICT, etc.) dictionary data is provided by the Electronic Dictionaries Research Group. Visit edrdg.org for more information. • Example sentences (Japanese and English) are provided by Tatoeba (https://tatoeba.org/). This data is licensed CC BY 2.0 FR. • Positional information for the furigana displayed in headwords is provided by the JmdictFurigana project. This data is distributed under a Creative Commons Attribution-ShareAlike License. |
新和英 | 152202 | Revision: Shinwaei1 |
NEW斎藤和英大辞典 | 47504 | Revision: saitoje.2023.03.26.1 Description: Tip: use custom CSS to control how many example sentences are displayed [data-dictionary="NEW斎藤和英大辞典"] ul.gloss-sc-ul>li:nth-child(n+5) { display: none; } |
Title | Entry Count | Information |
---|---|---|
JA Wikipedia [2022-12-01] | 1279999 | Author: Wikipedians, DBPedia, Marv Revision: wikipedia_2023-12-17T22:34:35.417Z URL: https://ja.wikipedia.org/ Attribution: Wikipedia Description: Wikipedia short abstracts from the DBPedia dataset available at https://databus.dbpedia.org/dbpedia/text/short-abstracts. Recommended custom CSS: div.gloss-sc-div[data-sc-jawiki=red] { color: #e5007f; } Created with https://github.com/MarvNC/yomichan-dictionaries |
JMnedict | 666086 | Author: yomitan-import Revision: JMnedict.2023-12-17 URL: https://github.com/themoeway/yomitan-import Attribution: This publication has included material from the JMdict (EDICT, etc.) dictionary files in accordance with the licence provisions of the Electronic Dictionaries Research Group. See http://www.edrdg.org/ |
Pixiv [2024-02-17] | 591048 | Author: Pixiv contributors, Marv Revision: 1.0.0 URL: https://github.com/MarvNC/pixiv-yomitan Attribution: https://dic.pixiv.net Description: Article summaries from the Pixiv encyclopedia (ピクシブ百科事典), 546036 articles included. Pixiv dumps used to build this found at https://github.com/MarvNC/pixiv-dump. Built with https://github.com/MarvNC/yomichan-dict-builder. |
デジタル大辞泉 | 527290 | Author: ッツ Revision: daijisen_20210506;2021-07-27 URL: https://dictionary.goo.ne.jp Attribution: 監修:松村明 編集委員:池上秋彦、金田弘、杉崎一雄、鈴木丹士郎、中嶋尚、林巨樹、飛田良文 編集協力:田中牧郎、曽根脩 © Shogakukan Inc. https://daijisen.jp Description: 30万4千項目以上(2021年04月現在)を収録、言葉の集大成といえる大型国語辞典。年3回。定期更新を行い、最新の項目と日々修正される最新のデータを提供しています。 |
大辞林 第四版 | 334751 | Revision: daijirin2;2023-07-10 Attribution: © Sanseido Co., LTD. 2019 |
精選版 日本国語大辞典 | 286365 | Revision: seisenban3 |
広辞苑 第七版 | 257602 | Author: shoui520 & Thermospore Revision: 3 URL: https://github.com/Thermospore/koj72yomi Attribution: http://kojien.iwanami.co.jp/ Description: If you find any bugs or things to be improved, don't hesitate to post an issue on the github repo! --- rev.BETA1 --- ・Initial testing release --- rev.BETA2 --- ・Minor gaiji and PoS adjustments ・Added PoS for phrases (ex ○席末を汚す) ・Extracted readings for ALPH entries (ex 5W1H or DoS) ・Cleaned out some empty quotes in the structured content ・Stripped out loan source words (ex "class" being listed as kanji for クラス) --- rev.1 --- ・Initial public release ・Minor adjustment to loan source stripping ・Fixed some errors in the gaiji map --- rev.2 --- ・Add support for 圏点 (ex in くつ‐かぶり 【沓冠】) ・Some disambiguation info is now extracted (ex you can now see(音節)and(感動詞)in the entries for "あ") ・Use structured content for instead of \n ・Add margins to make hierarchical entries more readable, and adjust icons to suit (ex つう・ずる 【通ずる】) ・The alph icon is now supported (ex W-VHS) ・Removed the "oyko_link" section, as it was redundant (details/discussion: FooSoft/yomichan#1910 ) --- rev.3 --- ・All photos, diagrams, and mathematical figures are now supported! (see せいき‐きょくせん 【正規曲線】for an example of the math figs) ・Add tooltips to see the path + filename of gaiji/images ・Expand and improve gaiji mapping (thanks epistularum) ・Use relative font-size instead of absolute for sub/superscript (thanks stephenmk) --- Upcoming (maybe...) --- ・Detect kanji forms mentioned in the body of the entry (ex「波乱」とも書く in は‐らん 【波瀾】) ・Functional xrefs? ( FooSoft/yomichan#2089 ) ・Further expand gaiji map ・etc absurdly obscure/minor formatting improvements |
ハイブリッド新辞林 | 140879 | Revision: shinjirin |
新選国語辞典 第十版 | 127191 | Revision: shinsenkoku10;2023-08-07 URL: https://www.shogakukan.co.jp/books/09501409 Attribution: © Shogakukan Inc. 2022 Description: Add custom CSS for enhanced formatting li[data-dictionary^='新選国語辞典'] th, span[data-sc-shinsenkoku10='warichu'] { white-space: nowrap; } span[data-sc-shinsenkoku10='red'] { color: #e5007f; } |
新明解国語辞典 第八版 | 100593 | Revision: smk8;2023-07-09 Attribution: © Sanseido Co., LTD. 2020 |
三省堂国語辞典 第八版 | 93091 | Revision: sankoku8;2023-07-19 Attribution: © Sanseido Co., LTD. 2021 |
旺文社国語辞典 第十一版 | 92217 | Author: irhello, shoui Revision: OUKOKU11_1.6 URL: https://learnjapanese.moe Attribution: Ⓒ 2013 旺文社 株式会社 https://www.obunsha.co.jp/pr/oukoku/ Description: <================================ 的確さ、わかりやすさを追求! 王道の国語辞典をYomichanで検索! <================================ |
現代国語例解辞典 第五版 | 86213 | Author: DAnon Revision: genkokr5;2024-02-16 URL: https://www.shogakukan.co.jp/books/09501035 Attribution: © Shogakukan Inc. 2016 Description: 〈 書籍の内容 〉 さらに親しみやすく、さらにわかりやすく。 『現代国語例解辞典』は、1985年の初版刊行以来、「親しみやすく、わかりやすい」と高校生から社会人まで幅広い層から大きな支持を得てきました。とくに教育現場では「国語教諭からもっとも選ばれる辞典」として定評があり、生涯学習辞典としても愛され続けています。 第五版は、国語辞典としては初めて、国立国語研究所の日本語コーパスを全面的に活用して改訂。コーパスを活用することで、『現代』の日本語(『国語』)がどのように使われているかという実状を豊富な『例』を挙げながら『解』することが可能になった結果、さらに親しみやすく、さらにわかりやすく、そして、さらに使いやすく生まれ変わりました。 【改訂のポイント】 ■コーパスを活用した255のコラム(345語)を収録 ■見出し語や表記欄にもコーパスを活用 ■現代風の語釈と用例で、より身近に、より親しみやすく ■「結びつきの強い語(コロケーション)」欄で日本語に幅を ■「類語対比表」で日本語を豊かなものに ■日常生活に必須の7万1,000語を収録 ■本文デザインを一新、現代風にアレンジ ■コーパスについて理解を深めるための、専門家による「コーパス」解説 〈 編集者からのおすすめ情報 〉 国立国語研究所の日本語コーパスを活用した厳選コラムは、読むだけで楽しく、読むだけで幅広い知識が身につきます。【テーマ例】「空揚げ」と書くか「唐揚げ」と書くか?/「○○を組む」の○○に入る言葉で最も多いのは?/「被害を被る」「違和感を感じる」は間違いか?など255のコラムを収録しています。 |
岩波国語辞典 第八版 | 76625 | Revision: iwakoku8.2023.04.08.0 |
KO字源 | 73401 | Revision: KO字源 |
明鏡国語辞典 第二版 | 73068 | Revision: meikyo2.2023.07.22.0 Description: ▼ non-jōyō kanji ▽ jōyō kanji used with a non-standard reading |
全訳漢辞海 | 60839 | Revision: 全訳漢辞海 |
例解学習国語辞典 第十一版 | 56840 | Revision: RGKo11 2024/02/10 Attribution: © 小学館 |
実用日本語表現辞典 | 55379 | Revision: jitsuyou;2023-08-15 URL: http://www.practical-japanese.com/ Description: Added conjugation |
漢検漢字辞典 第二版 | 50974 | Author: dictionary anonymous Revision: kankenkj2;2024-01-15 Attribution: © The Japan Kanji Aptitude Testing Foundation |
漢字源 | 48085 | Revision: kanjigen1 |
weblio古語辞典 | 47958 | Revision: Meikyou1 |
国語辞典オンライン | 44970 | Revision: jitenon-kokugo;2023-05-13 URL: https://kokugo.jitenon.jp/ Attribution: © 2014-2023 国語辞典オンライン |
漢検 漢字辞典 | 40549 | Revision: ;_; |
新語時事用語辞典 | 18294 | Revision: shingojijiyougojiten;2023-08-14 URL: http://www.breaking-news-words.com/ Description: 新聞やテレビで話題に上った、新語および時事的なキーワードを解説しています。 Added conjugation |
使い方の分かる 類語例解辞典 | 17350 | Author: 小学館辞典編集部 Revision: tsukaikatanowakaru-2023-08-09 URL: https://dictionary.goo.ne.jp/thsrs/ Attribution: 使い方の分かる 類語例解辞典 Description: Scraped from dictionary.goo.ne.jp 2023-08-09 |
故事ことわざの辞典 | 15577 | Author: Thermospore Revision: kotowaza1 Description: Generated using the current version of Yomichan Import as of March 7th, 2021, then hex edited to remove excessive whitespace |
対義語辞典オンライン | 13965 | Revision: taigigo_240207 Attribution: taigigo.jitenon.jp |
類語辞典オンライン | 12151 | Revision: ruigo_240209 Attribution: ruigo.jitenon.jp |
故事・ことわざ・慣用句オンライン | 8513 | Revision: jitenon-kotowaza;2023-05-15 URL: https://kotowaza.jitenon.jp/ Attribution: © 2014-2023 故事・ことわざ・慣用句辞典オンライン |
漢字でGo! [2024-03-04] | 7866 | Author: Marv Revision: 2024-03-04 URL: https://github.com/MarvNC/kanjidego-yomitan-anki Attribution: https://formidi.github.io/KanzideGoFAQ/ https://w.atwiki.jp/kanjidego/ Description: From the Kanji de Go! unofficial wiki. Built with https://github.com/MarvNC/yomichan-dict-builder |
四字熟語辞典オンライン | 7782 | Revision: jitenon-yoji;2023-05-14 URL: https://yoji.jitenon.jp/ Attribution: © 2012-2023 四字熟語辞典オンライン |
学研 四字熟語辞典 | 5484 | Author: ッツ Revision: gakken_yojijukugo;2021-07-12 URL: https://dictionary.goo.ne.jp Attribution: 編集:学研 © Gakken https://hon.gakken.jp Description: 難解な四字熟語も理解できるように、また、手軽に調べられるように、四字熟語を広く捉え約2700項目を収録。文学作品の用例が豊富で、注記や類義語、対義語も充実。検定試験やクロスワードにも使えます。 |
日本語俗語辞書 | 4354 | Author: Kartoffel Revision: 1 Attribution: http://zokugo-dict.com/ Description: I'll only say anything in the presence of my advocate |
新明解四字熟語辞典 | 4194 | Author: ッツ Revision: shinmeikai_yojijukugo;2021-07-12 URL: https://dictionary.goo.ne.jp Attribution: 編集:三省堂 © SANSEIDO Co. https://dictionary.sanseido-publ.co.jp Description: 業界最大語数を誇る「新明解四字熟語辞典」より厳選。座右の銘や新年の抱負に使える四字熟語約2000項目を収録しています。就活のエントリーシートやスピーチなど、日常生活のさまざまな場面で役立ちます。 |
YOJI-JUKUGO | 4017 | Revision: YOJI-JUKUGO |
全国方言辞典 | 3738 | Author: goo Revision: zenkokuhougenjiten-2023-08-12 URL: https://dictionary.goo.ne.jp/dialect/ Attribution: 全国方言辞典 Description: Scraped from dictionary.goo.ne.jp 2023-08-12 |
語源由来辞典 | 2795 | Revision: Gogen |
福日木健二字熟語 | 2306 | Revision: 福日木健二字熟語 |
surasura 擬声語 | 1422 | Author: surasura, Marv Revision: surasura_2023-03-22T01:10:57.302Z URL: http://sura-sura.com/ Attribution: surasura Description: Onomatopoeia info from http://sura-sura.com/ Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries |
数え方辞典オンライン | 1312 | Revision: count_240213 Attribution: count.jitenon.jp |
漢字ペディア同訓異義 | 966 | Revision: kanjipedia-doukunigi;2023-08-28 URL: https://www.kanjipedia.jp/sakuin/doukunigi/ Attribution: © 公益財団法人 日本漢字能力検定協会 Description: Scraped from kanjipedia 2023-08-28 |
複合語起源 | 222 | Author: 名無し, 名無し, seanblue, Marv Revision: 複合語起源_2022-08-26T22:38:51.046Z URL: https://github.com/MarvNC/yomichan-dictionaries Description: Sources: https://jbbs.shitaraba.net/bbs/read.cgi/study/10958/1299762655 https://academy6.5ch.net/test/read.cgi/gengo/1228873581 https://community.wanikani.com/t/special-kanji-words-derived-from-other-words/35655 Created with https://github.com/MarvNC/yomichan-dictionaries |
Title | Entry Count | Information |
---|---|---|
毎日のんびり日本語教師 | 1479 | Author: nihongobongo Revision: nihongo_no_sensei_v_1.03 ;2022-04-30;embedded urls, p of speech indicators(N5-N0) URL: https://nihongonosensei.net/?page_id=10246 Attribution: nihongo_no_sensei Description: term bank 1 contains N1, bank 2 N2, etc...中国の大学で日本語教師をしています。日本語教育能力検定試験の解説、対策講座、中国での生活や授業、日本語の文法の説明をしています。 |
絵でわかる日本語 | 1248 | Author: nihongobongo Revision: edewakaru_v1.03; 2022-09-01 URL: https://github.com/aiko-tanaka/Grammar-Dictionaries/ Attribution: https://www.edewakaru.com/archives/cat_179055.html Description: 日本語文法・自動詞他動詞・口語形・間違えやすい日本語・擬音語・擬態語などを「絵」で説明します。日本語を勉強している人のためのブログです. |
どんなときどう使う 日本語表現文型辞典 | 1082 | Author: nihongobongo Revision: donna_v1.04;2022-04-30(completed arrow internal links) URL: https://itazuraneko.neocities.org/grammar/donnatoki.html Attribution: itazuraneko Description: A well regarded grammar reference covering grammar points through N5 to N1 of the 日本語能力試験. |
JLPT文法解説まとめ | 628 | Author: nihongobongo Revision: nihongo_kyoshi_v1.03; 2022-05-27; p.o.s. info URL: https://nihongokyoshi-net.com/jlpt-grammars/ Attribution: 日本語NET Description: このページでは、JLPTに登場する文型を紹介しています。JLPTのレベル毎に50音順で並べています。 how to use:https://github.com/aiko-tanaka/Grammar-Dictionaries/ |
日本語文法辞典(全集) | 535 | Author: nihongobongo Revision: DOJG_v1.01;2022-04-30;better formatting Attribution: DOJG Description: DOJG-allVols |
Title | Entry Count | Information |
---|---|---|
漢字林 | 46210 | Revision: 漢字林 |
漢字辞典オンライン | 27693 | Revision: jitenon-kanji;2023-08-17 URL: https://kanji.jitenon.jp/ Attribution: © 2014-2023 漢字辞典オンライン |
Wiktionary漢字 | 18122 | Author: Wiktionary, Wikimedia Foundation, Marv Revision: Wiktionary漢字 2022-09-11T06:04:07.166Z URL: https://ja.wiktionary.org/wiki/%E3%82%AB%E3%83%86%E3%82%B4%E3%83%AA:%E6%BC%A2%E5%AD%97 Attribution: JA Wiktionary Description: Kanji data from ja.wiktionary.org. Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries |
KANJIDIC | 10383 | Author: yomitan-import Revision: kanjidic2 URL: https://github.com/themoeway/yomitan-import Attribution: This publication has included material from the JMdict (EDICT, etc.) dictionary files in accordance with the licence provisions of the Electronic Dictionaries Research Group. See http://www.edrdg.org/ |
TheKanjiMap Kanji Radicals/Composition | 6911 | Author: thekanjimap, Marv Revision: thekanjimap_2023-02-04T03:44:35.926Z URL: https://thekanjimap.com Attribution: thekanjimap Description: Radical entries and kanji decomposition/compositions from thekanjimap.com. Created with https://github.com/MarvNC/yomichan-dictionaries |
JPDB Kanji | 6494 | Author: jpdb, Marv Revision: jpdb_kanji_2022-08-26T22:38:14.736Z URL: https://jpdb.io Attribution: jpdb Description: Kanji data from JPDB Created with https://github.com/MarvNC/yomichan-dictionaries |
漢字ペディア | 5635 | Revision: 漢字ペディア |
mozc Kanji Variants | 1317 | Author: Google, Marv Revision: mozc_2022-08-26T22:38:27.927Z URL: https://github.com/google/mozc Attribution: Google Description: Data about kanji variants from Google's Japanese IME, mozc. Created with https://github.com/MarvNC/yomichan-dictionaries |
jitai | 1174 | Author: epistularum, Marv Revision: jitai_2022-08-26T22:37:55.378Z URL: https://github.com/epistularum/jitai Description: Data about 新字体/旧字体 and 標準字体/許容字体 in comparison to each other. Created with https://github.com/MarvNC/yomichan-dictionaries |
Title | Entry Count | Information |
---|---|---|
Wikipedia Kanji | 20932 | Author: scriptin, Marv Revision: kanjiFrequency1 URL: https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8 Attribution: JA Wikipedia Description: Rank-based kanji frequency data from a May 2015 dump of Japanese Wikipedia. Data from https://github.com/scriptin/kanji-frequency Modified by https://github.com/MarvNC/yomichan-dictionaries |
青空文庫漢字 | 8488 | Author: vtrm, Marv Revision: aozoraBunko_2022-08-26T05:49:00.968Z URL: https://www.aozora.gr.jp/ Attribution: 青空文庫 Description: Rank-based kanji frequency data from the Aozora Bunko Data from https://vtrm.net/japanese/kanji-jukugo-frequency/en Created with https://github.com/MarvNC/yomichan-dictionaries |
JPDB Kanji Freq | 6494 | Author: jpdb, Marv Revision: jpdb_kanji_2022-08-26T22:38:10.913Z URL: https://jpdb.io Attribution: jpdb Description: Rank-based kanji frequency data from JPDB Created with https://github.com/MarvNC/yomichan-dictionaries |
Innocent Corpus Kanji | 6430 | Author: cb4960, Marv Revision: kanjiFrequency1 URL: https://web.archive.org/web/20190309073023/https://forum.koohii.com/thread-9459.html#pid168613 Attribution: Innocent Corpus Novels Description: Rank-based kanji frequency data from the innocent corpus Modified by https://github.com/MarvNC/yomichan-dictionaries |
Title | Entry Count | Information |
---|---|---|
大辞林第四版 | 152193 | Author: コツ Revision: pitch_1.0.1.1 URL: https://kotu.io Description: For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL. |
大辞泉 | 88089 | Author: コツ Revision: pitch_1.0.0.1 URL: https://kotu.io Description: For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL. |
三省堂国語辞典第八番 | 77630 | Author: コツ Revision: pitch_1.0.1.1 URL: https://kotu.io Description: For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL. |
新明解第八版 | 75978 | Author: コツ Revision: pitch_1.0.2.1 URL: https://kotu.io Description: For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL. |
NHK | 73100 | Author: コツ Revision: pitch_1.0.1.1 URL: https://kotu.io Description: For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL. |
Title | Entry Count | Information |
---|---|---|
Cifu Spoken | 51798 | Author: Regine Lai, Grégoire Winterstein, Marv Revision: 2023-12-21 URL: https://github.com/MarvNC/yomichan-dictionaries Attribution: Lai, Regine and Winterstein, Grégoire (2020) "Cifu: a Frequency Lexicon of Hong Kong Cantonese", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 3062--3070. Description: Cantonese Frequency List from Cifu: https://github.com/gwinterstein/Cifu Spoken data from HKCanCor (Luke and Wong, 2015), HKCAC (Leung and Law, 2001), CantoMap (Lai and Winterstein, 2019) Converted by Marv |
Cifu Written | 51798 | Author: Regine Lai, Grégoire Winterstein, Marv Revision: 2023-12-21 URL: https://github.com/MarvNC/yomichan-dictionaries Attribution: Lai, Regine and Winterstein, Grégoire (2020) "Cifu: a Frequency Lexicon of Hong Kong Cantonese", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 3062--3070. Description: Cantonese Frequency List from Cifu: https://github.com/gwinterstein/Cifu Written data from 3,841 chapters of amateur novels from the website https://www.shikoto.com/ Converted by Marv |
Words.hk Frequency | 41174 | Author: Marv Revision: 1.0 URL: https://github.com/MarvNC/wordshk-yomitan Attribution: Words.hk & contributers (https://words.hk) See license at https://words.hk/base/hoifong/ Description: Converted from the free Words.hk dictionary found at https://words.hk/. Converted using https://github.com/MarvNC/yomichan-dict-builder |
Title | Entry Count | Information |
---|---|---|
CE Wiktionary | 142685 | Revision: 20240412 |
Canto CEDICT | 105347 | Revision: 20240412 |
CantoDict | 86570 | Author: CantoDict contributors, Marv Revision: cantodict_2023-04-25T22:55:55.715Z URL: http://www.cantonese.sheik.co.uk/ Attribution: CantoDict contributors Description: CantoDict was a Cantonese-English dictionary created and maintained by public contributors. It was abandoned, but the data was archived thanks to awong-dev at https://github.com/awong-dev/cantodict-archive. Created with https://github.com/MarvNC/yomichan-dictionaries |
Words.hk 粵典 [2024-02-03] | 60129 | Author: Marv Revision: 1.0.0 URL: https://github.com/MarvNC/wordshk-yomitan Attribution: Words.hk & contributers (https://words.hk) See license at https://words.hk/base/hoifong/ Description: Converted from the free Words.hk dictionary found at https://words.hk/. This export contains 52457 entries. Converted using https://github.com/MarvNC/yomichan-dict-builder |
Words.hk C-E FS | 50064 | Revision: 20240412 |
Words.hk C-C FS | 50061 | Revision: 20240412 |
CC-Canto | 34335 | Revision: 20240412 |
Title | Entry Count | Information |
---|---|---|
Words.hk 粵典 漢字 [2024-02-10] | 6638 | Author: Marv Revision: 1.0.0 URL: https://github.com/MarvNC/wordshk-yomitan Attribution: Words.hk & contributers (https://words.hk) See license at https://words.hk/base/hoifong/ Description: Converted from the free Words.hk dictionary found at https://words.hk/. Converted using https://github.com/MarvNC/yomichan-dict-builder |
Title | Entry Count | Information |
---|---|---|
BLCUmixed | 98089 | Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_mixed_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: Beijing Language and Culture University Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15 billion character corpus compiled by the Beijing Language and Culture university. |
BLCUsci | 92779 | Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_sci_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: Beijing Language and Culture University Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15 billion character corpus compiled by the Beijing Language and Culture university. |
BLCUcoll | 91797 | Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_coll_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: Beijing Language and Culture University Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15 billion character corpus compiled by the Beijing Language and Culture university. |
BLCUnews | 91690 | Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_news_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: Beijing Language and Culture University Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15 billion character corpus compiled by the Beijing Language and Culture university. |
BLCUlit | 90580 | Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_lit_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: Beijing Language and Culture University Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15 billion character corpus compiled by the Beijing Language and Culture university. |
SUBTLEX-CH | 62096 | Author: University of Ghent compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan Revision: zhfreq_subs_2023-06-20 URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/ Attribution: University of Ghent Description: This frequency list is taken from BearXiong's post on chinese-forums.com. The subtitles category is based on SUBTLEX, a frequency list compiled by the university of Ghent, Belgium. |
Sinica | 11549 | Author: Sinica, Michel Revision: Sinica_2024-04-16 URL: https://elearning.ling.sinica.edu.tw/eng_jindai.html Attribution: Institute of Linguistics, Academia Sinica Description: This frequency list is taken from the sinica.edu.tw website and gives the frequencies of traditional characters in a Taiwanese corpus. Note that the website is from 2005, the encoding was obsolete and some characters were not rendered. It was not possible to retrieve all entries for the rarest words either, as only 300 of those occurring only once could be shown on the webpage. As a result, the full list has little more than 10 000 unique entries. |
HSK | 11107 | Author: Chinese Ministry of Education, Andy Burke, Michel Revision: HSK_2023-06-20 URL: https://github.com/andycburke/HSK-3.0-Word-List Attribution: Chinese Ministry of Education Description: This frequency list is based on the new HSK word list (HSK 3.0, 2021). It is taken from Andy Burke's github post, which is itself taken from the Chinese Ministry of Education's original pdf. Levels 7 to 9 are not delineated |
Title | Entry Count | Information |
---|---|---|
Wenlin ABC | 207289 | Author: Wenlin Institute, rduwjjnh Revision: Wenlin_2024-04-14 URL: https://wenlin.co/wow/Project:Ci Attribution: Wenlin Institute Description: Published in July 2003 and revised through 2005, the Wenlin ABC Chinese-English Comprehensive Dictionary was produced by the Wenlin Institude in cooperation with the ABC Chinese Dictionary Series Project at the University of Hawaii. It contains over 196,000 entries. |
CC-CEDICT [2023-12-20] | 198702 | Author: MDBG, CC-CEDICT, Marv Revision: 2023-12-20 URL: https://github.com/MarvNC/cc-cedict-yomitan Attribution: https://cc-cedict.org/wiki/ Thanks go out to everyone who submitted new words or corrections. Special thanks go out to the CC-CEDICT editor team, who spend many hours doing research to maintain a high quality level: goldyn_chyld - Matic Kavcic richwarm - Richard Warmington vermillion - Julien Baley ycandau - Yves Candau feilipu and the editors who wish to remain anonymous Special thanks to: Craig Brelsford, for his extensive list of bird names Erik Peterson, for his work as the editor of CEDICT Paul Andrew Denisowski, the original creator of CEDICT Description: CC-CEDICT is a continuation of the CEDICT project started by Paul Denisowski in 1997 with the aim to provide a complete downloadable Chinese to English dictionary with pronunciation in pinyin for the Chinese characters. This dictionary for Yomitan was converted from the data available at https://www.mdbg.net/chinese/dictionary?page=cc-cedict using https://github.com/MarvNC/cc-cedict-yomitan and https://github.com/MarvNC/yomichan-dict-builder. |
CEDICT | 193931 | Revision: cc_cedict_14_3_2021 |
中日大辞典 第二版 | 146381 | Revision: chuunichi1 |
Oxford | 74663 | Revision: Oxford_2024-04-17 Attribution: Oxford University Press Description: This is one of the Oxford English-Chinese dictionaries. |
白水社中国語辞典 | 63858 | Revision: 白水社中国語辞典_1 |
DrEye | 30973 | Revision: DrEye_2024-04-13 Attribution: Dr.Eye Description: Original name: 譯典通英漢雙向字典. This seems to be one of the Dr.Eye Chinese-English dictionaries, a series of commercial dictionaries available in Taiwan. It comes with example sentences and their translations. |
500idioms | 869 | Author: The original authors created the content; Ooodman scraped the data on his github; Michel converted it to yomitan format Revision: 500idioms_2024-04-13 URL: https://doi.org/10.4324/9780203839140 Attribution: Jiao, L., Kubler, C. C., & Zhang, W. Description: A glossary of 500 chengyu with two example sentences for each idiom |
Title | Entry Count | Information |
---|---|---|
ZH Wikipedia [2022-12-01] | 1249877 | Author: Wikipedians, DBPedia, Marv Revision: wikipedia_2023-12-20T01:18:42.692Z URL: https://github.com/MarvNC/wikipedia-yomitan Attribution: https://zh.wikipedia.org/ Description: Wikipedia short abstracts from the DBPedia dataset available at https://databus.dbpedia.org/dbpedia/text/short-abstracts. Recommended custom CSS: div.gloss-sc-div[data-sc-wikipedia=term-specifier] { color: #e5007f; } |
漢語大詞典 | 550544 | Revision: 漢語大詞典_1 |
MoEdict | 266956 | Revision: MoEdict-2024_04-19 Attribution: Tang Feng Description: 萌典 (mengdian) is a digital Chinese dictionary developed by Taiwanese free software programmer Tang Feng. It is one of the projects of Taiwan's open source community g0v Zero Hour Government. As a digital Chinese dictionary, Mengdian not only contains 160,000 entries in Mandarin, but also contains 20,000 entries in Taiwanese Hokkien and 14,000 entries in Taiwanese Hakka, and provides comparisons with English, French and German. The website author Tang Feng released it into the public domain under the Creative Commons CC0 license. (Information taken from Wikipedia; the name MoEdict is an abbreviation for 'Ministry of Education's dictionary'.) |
兩岸詞典 | 163091 | Revision: 兩岸詞典_1 |
辭源 | 91538 | Revision: 辭源 |
牛津英汉汉英词典 | 74663 | Revision: lix 2 |
XiandaiGuifan | 72841 | Revision: XiandaiGuifan_2024-04-17 Attribution: 外语教学与研究出版社 (Foreign Language Teaching and Research Press) Description: Xiandai Hanyu Guifan Cidian (现代汉语规范词典) is a dictionary of Standard Chinese created as part of a proposal in the Eighth Five-year Plan of China. It is similar to Xiandai Hanyu Cidian, but with notable divergences. The third edition has entries for 12,000 characters and 72,000 words, with over 80,000 example usages. (Wikipedia) |
Xiandai7 | 70861 | Revision: Xiandai_2024-04-20 Attribution: 商务印书馆有限公司 (Commercial Press); 外语教学与研究出版社 (Foreign Language Teaching and Research Press) Description: Xiandai Hanyu Cidian (现代汉语词典), also known as the Contemporary Chinese Dictionary, is an important one-volume dictionary of Standard Mandarin Chinese published by the Commercial Press, now in its 7th (2016) edition. It was the first People's Republic of China dictionary to be arranged according to Hanyu Pinyin, the phonetic standard for Standard Mandarin Chinese, with explanatory notes in simplified Chinese. It contains over 70,000 entries. (Information taken from Wikipedia) |
Wunan | 59935 | Revision: Wunan_2024-04-13 Attribution: Wu-Nan Book Inc. Description: The Dictionary of Mandarin (五南国语活用辞典) is a dictionary published by the Wunan Book Publishing Company in Taiwan for readers with an intermediate level of language proficiency (Wikipedia). Please note that the pinyin was derived from the characters, as lix's file only had zhuyin. |
譯典通英漢雙向字典 | 30973 | Revision: lix 2 |
Title | Entry Count | Information |
---|---|---|
ZH Wiktionary Hanzi | 97641 | Author: Wiktionary, Wikimedia Foundation, Marv Revision: ZH_Wikt_Hanzi2023-03-05T21:01:41.039Z URL: https://zh.wiktionary.org/wiki/Category:%E6%BC%A2%E5%AD%97%E5%AD%97%E5%85%83 Attribution: ZH Wiktionary Description: Hanzi data scraped from zh.wiktionary.org Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries |
康熙字典 | 46836 | Revision: 康熙字典 |
CC-CEDICT Hanzi [2023-12-20] | 17740 | Author: MDBG, CC-CEDICT, Marv Revision: 2023-12-20 URL: https://github.com/MarvNC/cc-cedict-yomitan Attribution: https://cc-cedict.org/wiki/ Thanks go out to everyone who submitted new words or corrections. Special thanks go out to the CC-CEDICT editor team, who spend many hours doing research to maintain a high quality level: goldyn_chyld - Matic Kavcic richwarm - Richard Warmington vermillion - Julien Baley ycandau - Yves Candau feilipu and the editors who wish to remain anonymous Special thanks to: Craig Brelsford, for his extensive list of bird names Erik Peterson, for his work as the editor of CEDICT Paul Andrew Denisowski, the original creator of CEDICT Description: CC-CEDICT is a continuation of the CEDICT project started by Paul Denisowski in 1997 with the aim to provide a complete downloadable Chinese to English dictionary with pronunciation in pinyin for the Chinese characters. This dictionary for Yomitan was converted from the data available at https://www.mdbg.net/chinese/dictionary?page=cc-cedict using https://github.com/MarvNC/cc-cedict-yomitan and https://github.com/MarvNC/yomichan-dict-builder. |
EDHCC | 6371 | Author: The original authors created the content; lxs602 made it available in dictionary format; Michel converted it for Yomitan Revision: EDHCC_2024-04-15 Attribution: Lawrence J. Howell; Hikaru Morimoto Description: The Etymological Dictionary of Han Chinese Characters contains approximately 6000 entries explaining the connections between glyph and original meanings in Old Chinese. By Lawrence J. Howell, with Hikaru Morimoto. Compiled into MDX dictionary format by lxs602 (https://github.com/lxs602/Chinese-Mandarin-Dictionaries). Converted to Yomitan format by Michel. |