This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

Various 広辞苑 bugs #28

Open
Thermospore opened this issue Mar 12, 2021 · 2 comments

Comments

@Thermospore

Thermospore commented Mar 12, 2021

  1. over a thousand entries with �� or as the headword
  2. some headwords need this boxed A thing removed
    image
  3. same bug as number 3 in issue #27 (Various 故事ことわざの辞典 bugs)
    image
  4. there are a lot of broken-looking entries with a ○ at the start of the headword
@Thermospore
Author

I'm new to the EPWING format. Guessing number 1 is caused by those charming image fonts 🙂 I'm willing to help map them out. Looks like 広辞苑 has a shit ton though. Maybe bulk OCR, then manually confirm one by one?
image
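
For what it's worth, the "bulk OCR, then manually confirm" idea could look roughly like the sketch below. Everything in it is hypothetical: the numeric codes, the OCR guesses, and the `review` helper are made up for illustration and are not taken from the importer. The OCR step itself is left out; the sketch only shows how confirmed mappings could be kept apart from guesses that still need a human look.

```python
def review(ocr_guesses, confirmed):
    """Split OCR guesses into accepted mappings and codes still awaiting review.

    ocr_guesses: dict of glyph code -> character guessed by OCR (hypothetical input)
    confirmed:   dict of glyph code -> character a person has verified by eye
    """
    accepted = {}
    pending = []
    for code, guess in sorted(ocr_guesses.items()):
        if code in confirmed:
            # The human decision always wins over the OCR guess.
            accepted[code] = confirmed[code]
        else:
            pending.append((code, guess))
    return accepted, pending


# Invented example data: two glyph codes, one of which has been checked by hand.
guesses = {0xA121: "鷗", 0xA122: "A"}
checked = {0xA121: "鴎"}
accepted, pending = review(guesses, checked)
```

Only `accepted` would ever go into the real mapping table; `pending` is the to-do list for manual confirmation.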

@FooSoft
Owner

FooSoft commented Mar 12, 2021

Ah yes, that would be the image fonts. The problem is they don't necessarily have to correspond to things you would find in fonts (most are normal characters, but there are random exceptions for symbols). The process of mapping them out often includes finding reasonable substitutions for glyphs that don't exist. Help mapping the missing ones would be much appreciated!
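
To make the mapping idea concrete, here is a minimal sketch of a substitution table, assuming each image-font glyph is identified by a numeric code. The codes, the table contents, and the function names are invented for illustration; this is not the importer's actual data format or API. It shows the two cases described above: a glyph with a direct Unicode equivalent, and a glyph that needs a reasonable stand-in. Unmapped codes fall back to U+FFFD, which is the kind of output that shows up as � in headwords.

```python
# Hypothetical image-font glyph codes mapped to Unicode text.
# Codes and substitutions are invented for illustration only.
GAIJI_MAP = {
    0xA121: "①",    # glyph with a direct Unicode equivalent
    0xA122: "[A]",  # boxed letter with no exact match: a reasonable stand-in
}

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def resolve_gaiji(code, table=GAIJI_MAP):
    """Return the mapped text for a glyph code, or U+FFFD if it is unmapped."""
    return table.get(code, REPLACEMENT)


def unmapped_codes(codes, table=GAIJI_MAP):
    """Return the sorted, de-duplicated codes that still need a mapping."""
    return sorted({c for c in codes if c not in table})
```

Running something like `unmapped_codes` over every glyph code seen in a dictionary would give the list of glyphs that still need to be mapped by hand.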
