Russian-Belarusian neural translator
This data is part of my bachelor thesis on neural translation for the Russian-Belarusian language pair.
The repo consists of:
- 429k aligned sentence pairs (under Data/AlignedData), split into 10 batches
- chunks to align (under Data/ChunksToAlign)
- Data/TabbedCorpusMiddleSent.txt, a sample of 65966 sentences of at most 80 characters each, handy for training a model on only a sample of the data
- neural network code
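As a starting point for working with the sample file, here is a minimal sketch of a loader. It rests on assumptions that the README does not confirm: that Data/TabbedCorpusMiddleSent.txt is UTF-8 text with one sentence pair per line and the two sentences separated by a single tab, as the file name suggests. The column order (which language comes first) is not stated either. The function name `load_pairs` and its `max_len` parameter are hypothetical and are not part of the repo's code.

```python
def load_pairs(path, max_len=80):
    """Read tab-separated sentence pairs from a text file.

    Assumes one pair per line, two columns separated by a tab.
    Lines that do not have exactly two columns, or whose sentences
    are empty or longer than max_len characters, are skipped.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue  # malformed line: not exactly two columns
            first, second = (p.strip() for p in parts)
            if not first or not second:
                continue  # one side of the pair is empty
            if len(first) <= max_len and len(second) <= max_len:
                pairs.append((first, second))
    return pairs
```

Calling `load_pairs("Data/TabbedCorpusMiddleSent.txt")` would then return a list of 2-tuples. Check a few entries by hand to see which column holds Russian and which Belarusian before training on them.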
? The main source of the data (web-pages,..)
? How the data was collected
This is an open-source project, and the data can be used freely. Any reviews are more than welcome.
Author: Tsimafei Prakapenka