Enhancements
Enhance quote standardization tests with additional Unicode scenarios
Relax the table-segregation rule in chunking. Previously, a Table element was always segregated into its own pre-chunk, so the Table appeared alone in a chunk or was split into multiple TableChunk elements, but was never combined with Text-subtype elements. Table elements can now be combined with other elements in the same chunk when space allows.
Compute chunk length based solely on element.text. Previously, .metadata.text_as_html was also considered, and since it is always longer than the text (due to HTML tag overhead), it was the effective length criterion. Text-as-HTML is now removed from the length calculation, so text length is the sole criterion for sizing a chunk.
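The two chunking changes above can be illustrated with a simplified greedy packer. This is a minimal sketch, not the library's implementation: the `Element` class, the `pre_chunk` and `element_length` functions, the `segregate_tables` flag, and the two-character separator are all hypothetical. It sizes chunks by plain text only and, unless `segregate_tables=True` (mimicking the old rule), lets a table share a chunk with surrounding prose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a document element, not the library's class."""
    text: str
    is_table: bool = False
    text_as_html: Optional[str] = None  # carried along but never measured


def element_length(el: Element) -> int:
    # Only the plain text counts toward chunk size; text_as_html is ignored.
    return len(el.text)


def pre_chunk(
    elements: List[Element],
    max_characters: int,
    segregate_tables: bool = False,
) -> List[List[Element]]:
    """Greedily pack elements into groups whose combined text fits max_characters.

    segregate_tables=True mimics the old rule (each table alone in its group);
    False lets tables share a group with other elements when space allows.
    An element longer than max_characters ends up alone; splitting is out of scope.
    """
    separator_len = 2  # assumes chunk texts are joined with "\n\n"
    groups: List[List[Element]] = []
    current: List[Element] = []
    current_len = 0

    for el in elements:
        if segregate_tables and el.is_table:
            # Old behavior: close the open group and give the table its own.
            if current:
                groups.append(current)
            groups.append([el])
            current, current_len = [], 0
            continue
        added = element_length(el) + (separator_len if current else 0)
        if current and current_len + added > max_characters:
            # Not enough room: close the open group and start a new one.
            groups.append(current)
            current, current_len = [], 0
            added = element_length(el)
        current.append(el)
        current_len += added

    if current:
        groups.append(current)
    return groups


demo = [
    Element("Intro paragraph."),  # 16 characters
    Element(
        "a b\n1 2",  # 7 characters of plain text
        is_table=True,
        text_as_html="<table><tr><td>a</td><td>b</td></tr><tr><td>1</td><td>2</td></tr></table>",
    ),
    Element("Closing note."),  # 13 characters
]

print(len(pre_chunk(demo, 50)))  # 1: the table shares a chunk with the prose
print(len(pre_chunk(demo, 50, segregate_tables=True)))  # 3: old rule isolates the table
```

Measured by text alone, the table counts as 7 characters and fits with the prose under a 50-character limit; had its much longer HTML form been counted, it would not have fit.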
Features
Fixes
Fix IPv4 regex to correctly match octets of up to three digits.
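A pattern of the shape this fix describes, with each of the four octets allowed one to three digits, can be sketched with Python's `re` module. The pattern and the `find_ipv4` helper are illustrative assumptions, not the library's actual regex; note that it limits digit count only and does not reject out-of-range octets such as 999.

```python
import re
from typing import List

# Illustrative pattern (not the library's): four dot-separated octets of
# 1 to 3 digits each, bounded by word boundaries. Digit count is checked,
# numeric range (0-255) is not.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4(text: str) -> List[str]:
    """Return every IPv4-like substring in text (hypothetical helper)."""
    return IPV4_PATTERN.findall(text)


print(find_ipv4("Received from 10.0.0.1 and 192.168.100.254 via 1.2.3"))
# ['10.0.0.1', '192.168.100.254']
```

The word boundaries keep a run like `1234.1.1.1` from matching on a partial octet, while a three-part string such as `1.2.3` is not matched at all.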