Skip to content

Latest commit

 

History

History
11 lines (8 loc) · 503 Bytes

README.md

File metadata and controls

11 lines (8 loc) · 503 Bytes

Train data

For our experiments we created a huggingface dataset, using the datasets, where the data is structured like this:

src,trg
[SENTENCE1],[TRANSLATION_OF_SENTENCE1]

If you wish to use HuggingFace datasets too, they should have the same columns, and you need to set the flag "--hf_dataset_directory" with the directory of the dataset.

If you instead wish to use a csv file with those same columns, you set the flag "--train_data_file".