Fix a typo
AngledLuffa committed Jan 10, 2025
1 parent 946102a commit 6988eb1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion stanza/utils/datasets/prepare_tokenizer_treebank.py
@@ -149,7 +149,7 @@ def augment_arabic_padt(sents, ratio=0.05):
Reason seems to be that there are almost no examples of "text ." in the dataset.
This function augments the Arabic-PADT dataset with a few such examples.
-TODO: it may very well be that a lot of tokeners have this problem.
+TODO: it may very well be that a lot of tokenizers have this problem.
Also, there are a few examples in UD2.7 which are apparently
headlines where there is a ' . ' in the middle of the text.
