Why are some of the testdata files broken? The `Ã` character is one typical sign of a broken file, but there are others. For example, line 80 of the [Welsh](https://github.com/pemistahl/lingua-rs/blob/6d68f6ba9d2b9a8b1cf3feabd6e2855e57ca8f75/language-models/cy/testdata/sentences.txt#L80) test data is garbled too, even though Welsh doesn't use the `Ã` character at all.
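For illustration only, here is a minimal Rust sketch of why `Ã` is such a common marker. It assumes the breakage comes from UTF-8 text being decoded as Latin-1 (ISO-8859-1) somewhere along the way. That is a frequent cause of this kind of garbage, but it is not confirmed for these files. The helper `utf8_misread_as_latin1` is hypothetical and not part of lingua-rs.

```rust
/// Hypothetical helper: takes the UTF-8 bytes of `s` and reinterprets each
/// byte as a Latin-1 (ISO-8859-1) character, i.e. the code point with the
/// same numeric value. This simulates text saved as UTF-8 but read as Latin-1.
fn utf8_misread_as_latin1(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

fn main() {
    // "é" is U+00E9, encoded in UTF-8 as 0xC3 0xA9, which Latin-1 reads as "Ã©".
    println!("{}", utf8_misread_as_latin1("café")); // cafÃ©
    // Welsh "ŵ" is U+0175, encoded as 0xC5 0xB5, which Latin-1 reads as "Åµ".
    println!("{}", utf8_misread_as_latin1("ŵ")); // Åµ
}
```

Every character in the range U+00C0 to U+00FF starts with the byte 0xC3 in UTF-8, which is why `Ã` appears so often. Characters outside that range, such as Welsh `ŵ`, get a different lead byte, so mis-decoded text does not have to contain `Ã` at all.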