Currently, lines are grouped into blocks when the vertical distance between consecutive lines is less than some function of the average character height. In tests, this works well for single-spaced documents, but in double-spaced documents each line ends up in its own block. An alternate method builds a histogram of the gap sizes and looks for gaps that are significantly larger than average, but it does not seem to work well for some single-spaced documents. There is room for improvement here.
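To make the two approaches concrete, here is a minimal Python sketch. It is not the actual implementation. It assumes each line is a (top, bottom) pair with y increasing down the page and the lines already sorted top to bottom. The function names, the gap_factor and ratio parameters, and their default values are all hypothetical. The second function is a simplified stand-in for the histogram method: it compares each gap to the mean gap instead of building a histogram.

```python
from statistics import mean


def _split_at_gaps(lines, max_gap):
    """Start a new block whenever the gap to the previous line is >= max_gap."""
    if not lines:
        return []
    blocks = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        gap = cur[0] - prev[1]  # top of this line minus bottom of the previous one
        if gap < max_gap:
            blocks[-1].append(cur)
        else:
            blocks.append([cur])
    return blocks


def group_by_char_height(lines, avg_char_height, gap_factor=1.0):
    """Fixed threshold: a multiple of the average character height.

    gap_factor is a hypothetical tuning knob, standing in for the
    "some function of the average character height" described above.
    """
    return _split_at_gaps(lines, gap_factor * avg_char_height)


def group_by_relative_gap(lines, ratio=1.5):
    """Relative threshold: split where a gap is well above the mean gap.

    Simplified stand-in for the histogram method; ratio is an assumed value.
    """
    if len(lines) < 2:
        return [list(lines)] if lines else []
    gaps = [cur[0] - prev[1] for prev, cur in zip(lines, lines[1:])]
    return _split_at_gaps(lines, ratio * mean(gaps))


# Single-spaced: 10-unit lines, 2-unit gaps, one 12-unit paragraph break.
single = [(0, 10), (12, 22), (24, 34), (46, 56), (58, 68)]
# Double-spaced: 10-unit lines, 12-unit gaps, one 24-unit paragraph break.
double = [(0, 10), (22, 32), (44, 54), (78, 88), (100, 110)]

single_blocks = group_by_char_height(single, avg_char_height=10)
double_blocks_fixed = group_by_char_height(double, avg_char_height=10)
double_blocks_relative = group_by_relative_gap(double)
```

With these made-up coordinates, the fixed threshold splits the single-spaced sample at its paragraph break but puts every double-spaced line in its own block, because the ordinary line gap already exceeds the character height. The relative threshold splits the double-spaced sample only at the larger gap. The sketch does not reproduce the single-spaced failures of the histogram method.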