Double-spaced PDFs have trouble block-chunking #12

Currently, consecutive lines are grouped into the same block when the vertical distance between them is less than some function of the average character height. In tests this works well for single-spaced documents. In double-spaced documents, however, the gap between lines is large enough that each line ends up in its own block. There is an alternate method that builds a histogram of the gap sizes and looks for gaps that are significantly larger than average, but it doesn't seem to work well for some single-spaced documents. Block chunking could be improved for both cases.
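To make the two failure modes concrete, here is a minimal sketch of both heuristics on synthetic line positions. The issue doesn't give the exact formulas, so everything here is an assumption: the names (`Line`, `group_by_char_height`, `group_by_gap_histogram`, `make_lines`), the threshold `factor * average character height` with `factor=0.5`, and treating the most common histogram bin as the "typical" gap with a `ratio=1.5` cutoff. The `short_paras` case shows one plausible way a histogram approach can break on single-spaced text (when paragraph gaps outnumber line gaps); it is not necessarily the failure the issue is describing.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Line:
    top: float          # y of the line's top edge (y grows down the page)
    bottom: float       # y of the line's bottom edge
    char_height: float  # average glyph height on this line


def gaps(lines):
    """Vertical whitespace between each pair of consecutive lines."""
    return [b.top - a.bottom for a, b in zip(lines, lines[1:])]


def split_at(lines, breaks):
    """Start a new block before line i+1 wherever breaks[i] is True."""
    if not lines:
        return []
    blocks = [[lines[0]]]
    for line, brk in zip(lines[1:], breaks):
        if brk:
            blocks.append([line])
        else:
            blocks[-1].append(line)
    return blocks


def group_by_char_height(lines, factor=0.5):
    """Hypothetical height-based rule: break when gap > factor * avg char height."""
    if not lines:
        return []
    avg_h = sum(l.char_height for l in lines) / len(lines)
    return split_at(lines, [g > factor * avg_h for g in gaps(lines)])


def group_by_gap_histogram(lines, ratio=1.5, bin_width=1.0):
    """Hypothetical histogram rule: the most common gap bin is the 'typical'
    line gap; break only where a gap exceeds ratio times that."""
    gs = gaps(lines)
    if not gs:
        return split_at(lines, [])
    counts = Counter(math.floor(g / bin_width) for g in gs)
    mode_bin = counts.most_common(1)[0][0]
    typical = (mode_bin + 0.5) * bin_width  # centre of the most common bin
    return split_at(lines, [g > ratio * typical for g in gs])


def make_lines(gap_list, height=10.0):
    """Build a synthetic column of lines separated by the given gaps."""
    lines, y = [], 0.0
    for g in [0.0] + gap_list:
        y += g
        lines.append(Line(top=y, bottom=y + height, char_height=height))
        y += height
    return lines


def sizes(blocks):
    return [len(b) for b in blocks]


single = make_lines([2, 2, 12, 2, 2])          # two 3-line paragraphs, single-spaced
double = make_lines([12, 12, 24, 12, 12])      # the same text, double-spaced
short_paras = make_lines([12, 12, 2, 12, 12])  # single-spaced, mostly 1-line paragraphs

print(sizes(group_by_char_height(single)))         # [3, 3]
print(sizes(group_by_char_height(double)))         # [1, 1, 1, 1, 1, 1]: every line alone
print(sizes(group_by_gap_histogram(double)))       # [3, 3]
print(sizes(group_by_gap_histogram(short_paras)))  # [6]: paragraph breaks lost
```

The height-based threshold is fixed in absolute terms, so doubling the line spacing pushes every gap over it. The histogram rule adapts to the document's own spacing, but it relies on ordinary line gaps being the most common gap, which does not hold for every layout.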
