Unicode word segmentation for command-line tool #244

Open
PanderMusubi opened this issue May 21, 2020 · 1 comment

@PanderMusubi
Contributor

Please offer Unicode word segmentation in the command-line tool, as Nuspell does. It only requires Boost Locale, which is already needed when building with the Nuspell provider.

@rrthomas
Owner

rrthomas commented May 21, 2020

[Copied text from @PanderMusubi]

The Nuspell command-line tool segments text into words using Unicode
segmentation; see
https://github.com/nuspell/nuspell/blob/master/src/nuspell/main.cxx#L283

The result is much better than simple whitespace segmentation. You
can see the difference with this test:
https://github.com/nuspell/misc-nuspell/tree/master/segmentation
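
To illustrate why whitespace splitting falls short, here is a minimal Python sketch. It is only a rough approximation that uses a regular expression over word characters. It is not the Unicode (UAX #29) algorithm and not what Nuspell does via Boost Locale. The function names and the sample sentence are made up for illustration.

```python
import re

# Runs of word characters, optionally joined by an apostrophe.
# Rough approximation only: NOT the UAX #29 word-boundary rules.
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")


def whitespace_segment(text):
    """Split on whitespace only; punctuation stays glued to the words."""
    return text.split()


def approx_word_segment(text):
    """Return runs of word characters, dropping surrounding punctuation."""
    return WORD_RE.findall(text)


# Hypothetical sample sentence, chosen to show punctuation and non-ASCII text.
sample = 'He said: "naïve users don\'t read manuals."'

print(whitespace_segment(sample))
print(approx_word_segment(sample))
```

With whitespace splitting, tokens such as `said:` and `manuals."` keep their punctuation and would be handed to the spell checker as-is. The word-character approach yields `said` and `manuals` instead. A real implementation would rely on a proper Unicode boundary analysis rather than this regular expression.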
