Parsing ITU OB source issues #25

Open
ronaldtse opened this issue Jul 17, 2019 · 3 comments

@ronaldtse
Contributor

There are 3 kinds of ITU OB issues (so far as we know):

  1. OB issues that are in DOCX -- these should be automatically parsed into YAML files. (manual intervention: low)
  2. OB issues that are in DOC -- these need to be parsed out using Apache Tika. (manual intervention: medium)
  3. OB issues that are images or otherwise not machine-readable -- probably better entered manually. We can provide the itu-ob-editor so that data entry personnel can enter them.
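
The three cases above could be triaged with something like the following. This is only a rough sketch: it assumes the source format can be told from the file extension, and the `Route` enum, the `triage` function and the sample file names are hypothetical, not part of any existing ituob code. It only decides which path a file takes; the actual DOCX parsing and Tika extraction are left out.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    """Hypothetical handling routes, one per kind of OB issue source."""
    DOCX_AUTO = "parse DOCX into YAML automatically"
    DOC_TIKA = "extract with Apache Tika, then review"
    MANUAL = "enter by hand in itu-ob-editor"


def triage(filename: str) -> Route:
    """Pick a route from the file extension alone (an assumption)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".docx":
        return Route.DOCX_AUTO
    if suffix == ".doc":
        return Route.DOC_TIKA
    # Scans, images and anything else fall back to manual entry.
    return Route.MANUAL


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    for name in ["issue-a.docx", "issue-b.doc", "issue-c.pdf"]:
        print(name, "->", triage(name).value)
```

Going by extension alone would misroute a scanned PDF that happens to have a text layer, or a DOC that is really just embedded images, so a real version would probably need to look at the content as well.
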
@strogonoff
Contributor

strogonoff commented Jul 21, 2019

Mostly looks right, though you may have forgotten to mention issues before 567, which cannot be found on ITU.int at all. See #20. We are dealing with item (1) in milestone https://github.com/ituob/itu-ob-data/milestone/2.

@ronaldtse
Contributor Author

Correct — we’ll get those old issues offline. They’re probably not digital 😉

@strogonoff
Contributor

strogonoff commented Jul 21, 2019 via email
