- WIDET data:
/data
- Additional train instances:
/data/additional_train
- Additional train instances that we mainly employed (<wug#123>):
/data/additional_train/tag
- Direct evidence:
de.txt
- Lexically indirect evidence:
lexie.txt
- Syntactically indirect evidence:
synie.txt
- Additional train instances adding morphology into a tag (<wug#321>s):
/data/additional_train/tag_w_morph
- Additional train instances adding morphology into a tag (wug):
/data/additional_train/wug
- Evaluation instances:
- Evaluation instances that we mainly employed:
/data/eval/tag
- Tokenizer with <wug#\n> added:
/data/tokenizer/wikipedia_vocab9k
- Pretrain data used in this work:
/data/pretrain
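
The layout above can be checked from Python before running anything else. The following is a minimal sketch, not part of the released code: the dictionary keys, the function names `widet_paths` and `missing_paths`, and the `root` argument (the directory that contains `data`) are hypothetical, and it assumes the three evidence files sit directly inside `/data/additional_train/tag`.

```python
from pathlib import Path

# File names listed in the layout above for the three evidence types.
EVIDENCE_FILES = {
    "direct": "de.txt",
    "lexically_indirect": "lexie.txt",
    "syntactically_indirect": "synie.txt",
}


def widet_paths(root):
    """Map short, hypothetical keys to the locations named in the layout."""
    data = Path(root) / "data"
    tag_dir = data / "additional_train" / "tag"
    paths = {
        "additional_train_tag": tag_dir,
        "additional_train_tag_w_morph": data / "additional_train" / "tag_w_morph",
        "additional_train_wug": data / "additional_train" / "wug",
        "eval_tag": data / "eval" / "tag",
        "tokenizer": data / "tokenizer" / "wikipedia_vocab9k",
        "pretrain": data / "pretrain",
    }
    # Assumption: the evidence files sit directly inside additional_train/tag.
    for key, filename in EVIDENCE_FILES.items():
        paths[f"evidence_{key}"] = tag_dir / filename
    return paths


def missing_paths(root):
    """Return the sorted keys whose paths do not exist under root."""
    return sorted(k for k, p in widet_paths(root).items() if not p.exists())
```

Calling `missing_paths(".")` from the repository root would return an empty list if every listed location is present, and otherwise the keys of the entries that were not found.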
WIDET is distributed under a CC-BY license.