Fix/extend re replacement seq (#948)
This PR builds on #763 and further extends the `re_replacement_seq` regex. The new [NorwAI models](https://huggingface.co/NorwAI) use a tokenizer that contains the token `�.`, which triggers the same error described in issue #762. This PR extends the fix from #763 to handle this case, adds a unit test that covers various tokenizers, and adds a comment explaining why the regex needs its prefix and suffix.
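As a rough illustration of the idea, the sketch below shows how a pattern with an optional prefix and suffix around the replacement character (U+FFFD) can recognise a token such as `�.`. The pattern, the `▁` prefix, and the names `REPLACEMENT_SEQ_SKETCH` and `is_replacement_seq` are assumptions made for this example; they are not copied from the repository, and the actual `re_replacement_seq` may differ.

```python
import re

# Hypothetical pattern (an assumption, not the repository's regex):
#   "▁*"       optional prefix some tokenizers place before the token
#   "\ufffd+"  one or more Unicode replacement characters
#   "\.*"      optional suffix of dots, as in the token "\ufffd."
REPLACEMENT_SEQ_SKETCH = re.compile(r"▁*\ufffd+\.*")


def is_replacement_seq(token: str) -> bool:
    """Return True if the whole token is a replacement sequence in this sketch."""
    return REPLACEMENT_SEQ_SKETCH.fullmatch(token) is not None


if __name__ == "__main__":
    for tok in ["\ufffd", "\ufffd.", "▁\ufffd", "a\ufffd", "."]:
        print(repr(tok), is_replacement_seq(tok))
```

Without the `\.*` suffix, a token like `�.` would not be matched as a replacement sequence, which is the kind of case this PR addresses.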