Word cloud splits words on accent #74

AlexGibson12 · 2020-12-30T17:25:33Z

If I include, for instance, the word fettsäuren, the word cloud generates two words: "fetts" and "uren". I saw someone elsewhere, talking about a separate word cloud library, suggest that it could have something to do with the regex used to split words. Could that be the cause here?
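A minimal reproduction of the suspected cause, assuming the splitting code uses an ASCII-only pattern such as `\w+` (this is an assumption; the thread does not show the actual regex). In JavaScript, `\w` only covers `[A-Za-z0-9_]`, so "ä" is treated as a separator:

```javascript
// Hypothetical splitting pattern: \w+ matches only ASCII letters, digits and "_".
const text = "fettsäuren";

// "ä" is not matched by \w, so the word is cut in two around it.
const asciiWords = text.match(/\w+/g);

console.log(asciiWords); // → ["fetts", "uren"]
```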

helios1138 · 2021-02-01T15:40:13Z

@AlexGibson12 this library doesn't handle tokenization; it just renders the words you give it. To support Unicode characters in words, I use /[\p{L}']+/gu in my word-splitting code.
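A minimal sketch of a tokenizer built around the pattern quoted above, run before handing words to the word cloud. The helper names `tokenize` and `countWords`, the NFC normalization step and the lowercasing are illustrative assumptions, not part of this library or the commenter's code. Normalizing matters because "ä" can also be written as "a" followed by the combining mark U+0308, which `\p{L}` alone does not match.

```javascript
// Hypothetical helper: split text into words using the Unicode-aware pattern.
// \p{L} (with the "u" flag) matches any letter in any script; "'" keeps
// contractions such as "don't" together. Digits are not matched.
function tokenize(text) {
  // NFC composes "a" + U+0308 into the single letter "ä" before matching.
  const words = text.normalize("NFC").match(/[\p{L}']+/gu) ?? [];
  return words.map((w) => w.toLowerCase());
}

// Hypothetical helper: count word frequencies, the usual input for a word cloud.
function countWords(text) {
  const counts = new Map();
  for (const word of tokenize(text)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

console.log(tokenize("Fettsäuren und Öle")); // → ["fettsäuren", "und", "öle"]
console.log(countWords("Fettsäuren, fettsäuren!").get("fettsäuren")); // → 2
```

Unicode property escapes need an ES2018-capable engine; if mixed-script text or languages without spaces matter, a dedicated segmenter is a more robust choice than a single regex.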
