This repository describes the RIGA team's submission to MultiCoNER II.
- Create a new environment

  ```bash
  python -m venv venv
  ```
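  Before installing dependencies, activate the environment. This is standard `venv` usage rather than anything specific to this repository (the command below is for a POSIX shell; on Windows run `venv\Scripts\activate` instead):

  ```bash
  # activate the virtual environment created above (Linux/macOS)
  source venv/bin/activate
  ```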
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Now your environment is ready. The next step is to get the data from the MultiCoNER download page and put it in the `data` directory.
- Convert the data using the `parse_conll.py` script:

  ```bash
  python parse_conll.py --source_path {path to the dataset in CoNLL format}
  ```
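  For example, assuming the English training split was extracted into `data/` (the file name below is hypothetical; substitute the actual path from the archive you downloaded):

  ```bash
  # hypothetical example path; point --source_path at your extracted CoNLL file
  python parse_conll.py --source_path data/en_train.conll
  ```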
- Start gathering context using the `get_context.py` script. You'll need to specify your own API key and the dataset split to use; the `TODO` comments in the file will point you to the right places.
- In the previous step, each context is collected separately to make navigation easier and to avoid querying the same sentences multiple times if an error occurs. At this point you need to merge all of them into a single file; use the `merge_context.py` script for this purpose. You'll also need to change the dataset split in order to merge contexts for each of the train/dev/test datasets.
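  A minimal sketch of this step, assuming the split is selected by editing the script itself (as with `get_context.py`, check the file for the exact place to change it):

  ```bash
  # run once per dataset split (train/dev/test),
  # updating the split setting in merge_context.py between runs
  python merge_context.py
  ```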
- The last step is NER model fine-tuning. You can run

  ```bash
  python train.py --help
  ```

  to get the full list of arguments. During the competition we mainly used either `distilbert-base-uncased` (66M parameters) or `xlm-roberta-large` (558M parameters).
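  As an illustration only, a fine-tuning run might look like the command below; the flag names are hypothetical placeholders, so check `python train.py --help` for the actual argument names before running:

  ```bash
  # hypothetical flags for illustration; consult `python train.py --help`
  # for the real argument names supported by this script
  python train.py --model_name_or_path xlm-roberta-large --output_dir models/xlm-roberta-large
  ```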