GitHub - paulhan93/nlp: natural language processing

Mapping text to MIDI tokens

The notebook tokenization_examples.ipynb gives examples of how we map tokenized text to the corresponding MIDI tokens.
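The notebook is the reference for the actual mapping. As a hedged illustration of the general idea only, the sketch below gives every text token id a unique compound (CP-style) MIDI token by mixed-radix decomposition, which makes the mapping invertible. It is not necessarily the procedure used in the notebook. The four field names and the sizes in `FIELD_SIZES` are hypothetical placeholders rather than MidiBERT's real vocabulary sizes, and `id_to_cp` / `cp_to_id` are illustrative helpers, not functions from this repository.

```python
from typing import Sequence, Tuple

# Hypothetical sizes for the four fields of a compound (CP) MIDI token.
# These are placeholders, NOT the vocabulary sizes used by MIDI-BERT.
FIELD_SIZES: Tuple[int, ...] = (4, 18, 88, 66)  # (bar, position, pitch, duration)


def capacity(sizes: Sequence[int] = FIELD_SIZES) -> int:
    """Number of distinct CP tuples the fields can encode."""
    total = 1
    for size in sizes:
        total *= size
    return total


def id_to_cp(token_id: int, sizes: Sequence[int] = FIELD_SIZES) -> Tuple[int, ...]:
    """Map a text token id to a unique CP tuple (mixed-radix digits, most significant first)."""
    if not 0 <= token_id < capacity(sizes):
        raise ValueError(f"token id {token_id} does not fit in fields {tuple(sizes)}")
    digits = []
    # Peel off the least significant digit first, then reverse at the end.
    for size in reversed(sizes):
        token_id, digit = divmod(token_id, size)
        digits.append(digit)
    return tuple(reversed(digits))


def cp_to_id(cp: Sequence[int], sizes: Sequence[int] = FIELD_SIZES) -> int:
    """Inverse of id_to_cp: recombine the mixed-radix digits into a token id."""
    token_id = 0
    for digit, size in zip(cp, sizes):
        token_id = token_id * size + digit
    return token_id


if __name__ == "__main__":
    print(id_to_cp(101))  # (0, 0, 1, 35)
    print(cp_to_id((0, 0, 1, 35)))  # 101
```

With these placeholder sizes the code space holds 418,176 distinct tuples, which comfortably exceeds a typical BERT WordPiece vocabulary, so every text token gets its own MIDI token and the mapping can be reversed exactly.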

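To fine-tune MidiBERT-Piano, first clone the MIDI-BERT repository, install its requirements, and change into the MidiBERT/CP directory:

git clone https://github.com/wazenmai/MIDI-BERT.git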
cd MIDI-BERT
pip install -r requirements.txt
cd MidiBERT/CP

Then copy midi_glue.py and finetune_trainer.py from our repository into that directory (MidiBERT/CP).
Download the pre-trained model checkpoint from the MidiBERT-Piano repository.
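Before launching a run, it can help to confirm that the copied scripts and the checkpoint are actually in the working directory. The sketch below is a hypothetical helper (`missing_files` is not part of either repository). It assumes the checkpoint sits next to the scripts under the name pretrain_model.ckpt, as the relative `--ckpt` path in the example command suggests; adjust `REQUIRED_FILES` if your layout differs.

```python
from pathlib import Path
from typing import Iterable, List

# Files the fine-tuning command expects in MidiBERT/CP. The checkpoint name is
# an assumption taken from the --ckpt value in the example command.
REQUIRED_FILES = ("midi_glue.py", "finetune_trainer.py", "pretrain_model.ckpt")


def missing_files(directory: str = ".", required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        raise SystemExit(f"Missing from {Path.cwd()}: {', '.join(missing)}")
    print("All required files are present.")
```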
Now you can fine-tune MidiBERT-Piano on GLUE tasks:

python3 midi_glue.py --task='mrpc' --epochs=3 --ckpt='pretrain_model.ckpt' --lr=2e-5
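To try several learning rates on the same task, the command can be generated programmatically. The sketch below uses only the four flags that appear in the example command; `midi_glue_command` and `sweep` are hypothetical helpers, the extra learning-rate values are illustrative, and it assumes midi_glue.py parses `--lr` as a float (Python writes 2e-5 as "2e-05"). With `dry_run=True` it only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def midi_glue_command(task: str = "mrpc", epochs: int = 3,
                      ckpt: str = "pretrain_model.ckpt", lr: float = 2e-5) -> List[str]:
    """Build the example midi_glue.py invocation as an argument list."""
    return ["python3", "midi_glue.py", f"--task={task}", f"--epochs={epochs}",
            f"--ckpt={ckpt}", f"--lr={lr}"]


def sweep(lrs: Sequence[float], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute one fine-tuning run per learning rate."""
    commands = [midi_glue_command(lr=lr) for lr in lrs]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return commands


if __name__ == "__main__":
    sweep([1e-5, 2e-5, 3e-5])
```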

We used the run_glue.py script provided by Hugging Face to fine-tune GroNLP/bert-base-dutch-cased on the GLUE tasks.
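For reference, a run_glue.py invocation for this model can be assembled as below. The flag names are standard arguments of the Hugging Face run_glue.py example script; the hyperparameter values and the output directory layout are illustrative assumptions, and the exact settings we used are in the run_finetune_on_*.sh scripts. `run_glue_command` is a hypothetical helper.

```python
import shlex
from typing import List


def run_glue_command(task: str, model: str = "GroNLP/bert-base-dutch-cased",
                     output_dir: str = "out", lr: float = 2e-5, epochs: int = 3,
                     max_seq_length: int = 128, batch_size: int = 32) -> List[str]:
    """Build a run_glue.py argument list (hyperparameter defaults are illustrative)."""
    return ["python3", "run_glue.py",
            "--model_name_or_path", model,
            "--task_name", task,
            "--do_train", "--do_eval",
            "--max_seq_length", str(max_seq_length),
            "--per_device_train_batch_size", str(batch_size),
            "--learning_rate", str(lr),
            "--num_train_epochs", str(epochs),
            "--output_dir", f"{output_dir}/{task}"]  # one output folder per task


if __name__ == "__main__":
    print(shlex.join(run_glue_command("mrpc")))
```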
The WordPiece tokenizer we used for the English vocabulary is saved in our repository as a serialized JSON file, tokenizer.json, and its vocabulary file is vocab.txt.
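A BERT-style vocab.txt lists one token per line, and a token's id is its line index. The sketch below loads such a vocabulary and applies WordPiece's greedy longest-match-first rule to a single word, marking word-internal pieces with the "##" prefix. It is a simplified illustration: it skips the normalization and whitespace/punctuation pre-tokenization a full BERT tokenizer performs. For real use, loading tokenizer.json with the Hugging Face `tokenizers` library (`Tokenizer.from_file`) reproduces the saved tokenizer exactly. `parse_vocab` and `wordpiece` are illustrative names.

```python
from typing import Dict, Iterable, List


def parse_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to its id, which is its (0-based) line number."""
    return {line.rstrip("\r\n"): idx for idx, line in enumerate(lines)}


def wordpiece(word: str, vocab: Dict[str, int], unk: str = "[UNK]",
              max_chars: int = 100) -> List[str]:
    """Split one word into the longest matching vocabulary pieces, left to right."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    with open("vocab.txt", encoding="utf-8") as f:
        vocab = parse_vocab(f)
    print([vocab.get(p, vocab.get("[UNK]")) for p in wordpiece("tokenization", vocab)])
```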
The fine-tuning runs for our experiments are defined in run_finetune_on_bert-base-cased.sh, run_finetune_on_bert-base-cased_random_weights.sh, and run_finetune_on_bert-base-dutch-cased.sh.
