A modified version of the Jieba segmenter
- Equipped with emoticon detection
- Trained on the Sinica Corpus
- Uses a Brill tagger for part-of-speech (POS) tagging
Emoticons are not split into sequences of meaningless punctuation marks.
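One way to keep emoticons intact is to match them before segmentation and pass only the text between matches to the segmenter. The sketch below illustrates that idea. It is an assumption, not this project's detector: the small `EMOTICON_RE` pattern, the function `segment_with_emoticons`, and the placeholder `char_segment` are all hypothetical.

```python
import re
from typing import Callable, List

# Hypothetical, deliberately small emoticon pattern for illustration only;
# the project's actual emoticon inventory is not described in this README.
EMOTICON_RE = re.compile(
    r"\(\^[_o-]?\^\)"   # (^_^) (^o^) (^^)
    r"|\^[_o-]?\^"      # ^_^ ^o^ ^^
    r"|[:;=]-?[)(DPp]"  # :) ;-) =D :P
    r"|T_T"
)


def segment_with_emoticons(text: str, segment: Callable[[str], List[str]]) -> List[str]:
    """Emit each emoticon as one token; segment the text between emoticons with `segment`."""
    tokens: List[str] = []
    pos = 0
    for m in EMOTICON_RE.finditer(text):
        if m.start() > pos:
            tokens.extend(segment(text[pos:m.start()]))
        tokens.append(m.group())  # emoticon kept whole, never split into punctuation
        pos = m.end()
    if pos < len(text):
        tokens.extend(segment(text[pos:]))
    return tokens


def char_segment(s: str) -> List[str]:
    # Hypothetical placeholder segmenter: one token per character.
    return list(s)


print(segment_with_emoticons("好開心^_^", char_segment))
# ['好', '開', '心', '^_^']
```

With the standard jieba package installed, `jieba.lcut` could be passed in place of `char_segment`, since it takes a string and returns a list of words.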
Segmentation is more accurate on Traditional Chinese text (F1-score = 0.91).
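The README does not say how the F1-score was computed. A common convention for word segmentation, assumed here, compares the character-offset spans of predicted and gold words: precision is correct spans over predicted spans, recall is correct spans over gold spans, and F1 is their harmonic mean. The function names below are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to character-offset spans, e.g. ['ab', 'c'] -> {(0, 2), (2, 3)}."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f1(gold: List[str], pred: List[str]) -> float:
    """Span-matching F1 between a gold and a predicted segmentation of the same text."""
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)  # a word counts as correct only if both boundaries match
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["我", "喜歡", "自然語言"]
pred = ["我", "喜歡", "自然", "語言"]
print(round(segmentation_f1(gold, pred), 3))  # 0.571
```

Here 2 of 4 predicted words are correct (precision 0.5) and 2 of 3 gold words are recovered (recall 2/3), giving F1 = 4/7.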
The tagger is trained on the Sinica Treebank, which improves POS-tagging accuracy.
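A Brill tagger is a transformation-based tagger. It first assigns each word its most frequent tag from the training data, then applies an ordered list of learned rewrite rules such as "change tag A to B when the previous tag is C". The sketch below shows only the tagging phase under stated assumptions: the toy training pairs, the placeholder tags (not the Sinica tagset), the single rule, and the names `most_frequent_tags` and `brill_tag` are all hypothetical. Rule learning, which greedily picks the transformation that most reduces errors on the training data, is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy training data; tags are placeholders, not the Sinica tagset.
TRAIN = [("我", "PRON"), ("研究", "VERB"), ("語言", "NOUN"),
         ("這", "DET"), ("研究", "NOUN"), ("很", "ADV"), ("好", "ADJ"),
         ("他們", "PRON"), ("研究", "VERB"), ("歷史", "NOUN")]


def most_frequent_tags(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Initial-state lexicon: each word's most frequent tag in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


# Hypothetical rule format (from_tag, to_tag, prev_tag):
# rewrite from_tag as to_tag when the preceding word is tagged prev_tag.
Rule = Tuple[str, str, str]
RULES: List[Rule] = [("VERB", "NOUN", "DET")]
LEXICON = most_frequent_tags(TRAIN)


def brill_tag(words: List[str], lexicon: Dict[str, str],
              rules: List[Rule], default: str = "NOUN") -> List[str]:
    tags = [lexicon.get(w, default) for w in words]  # unknown words get `default`
    for from_tag, to_tag, prev_tag in rules:  # rules applied in learned order
        new = tags[:]
        for i in range(1, len(tags)):
            # Conditions are checked against tags from before this rule's pass.
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                new[i] = to_tag
        tags = new
    return tags


print(brill_tag(["這", "研究", "很", "好"], LEXICON, RULES))
# ['DET', 'NOUN', 'ADV', 'ADJ']
```

"研究" is initially tagged VERB (its majority tag), and the rule corrects it to NOUN after a determiner. A larger, better-annotated training set such as a treebank yields a more reliable lexicon and better rules, which is the intuition behind the accuracy claim above.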