TokenizeAnything

A re-implementation of redpony/cdec's tokenize-anything.pl script in Python.

samples/ contains text pulled from Wikipedia in a variety of languages.

tok/ holds the same data, as tokenized by the original tokenize-anything.pl.
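
Because tok/ holds the original script's output for the same inputs, the two directories can serve as a regression check for the Python port. The following is a minimal sketch, not part of this repository: `find_mismatches` is a hypothetical helper, the `tokenize` argument stands in for whatever function the port exposes (assumed here to map one line of text to one tokenized line), and it assumes each file in samples/ has a counterpart with the same name in tok/.

```python
from pathlib import Path
from typing import Callable, List


def find_mismatches(
    tokenize: Callable[[str], str],
    samples_dir: Path,
    tok_dir: Path,
) -> List[str]:
    """Return names of sample files whose tokenized lines differ from the reference."""
    mismatches = []
    for sample_path in sorted(samples_dir.iterdir()):
        if not sample_path.is_file():
            continue
        # Assumption: the reference file in tok/ shares the sample's file name.
        reference_path = tok_dir / sample_path.name
        if not reference_path.is_file():
            continue  # no reference output available for this sample
        source_lines = sample_path.read_text(encoding="utf-8").splitlines()
        expected_lines = reference_path.read_text(encoding="utf-8").splitlines()
        actual_lines = [tokenize(line) for line in source_lines]
        if actual_lines != expected_lines:
            mismatches.append(sample_path.name)
    return mismatches
```

Calling `find_mismatches(my_tokenize, Path("samples"), Path("tok"))` with the port's own function in place of the hypothetical `my_tokenize` would list the files where the Python output diverges from the Perl reference. An empty list means every compared file matched line for line.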