The models were generated by the Training model for Patchwise Analysis of Music Document – HPC job, using the images provided in the training_data folder. For processing a music document with staff lines and text, only three models are necessary: one for music symbols, one for staff lines, and one for background (which will include the text).
- We used the default settings of this training job:
  - maximum number of samples per label = 15,000
  - epochs = 20
  - early stop = 15
  - patch height = 256
  - patch width = 256
  - batch size = 16
- We used 150 GB of memory. You should try to use the least amount of memory that allows the job to finish. Normally, you won't need more than 150 GB of memory to process a maximum of 15k samples per layer for 3 layers. We still need to experiment to find the minimum amount of memory needed.
- The training was completed in less than 10 hours.
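As a rough sanity check on these numbers, the sketch below records the default settings and does two pieces of arithmetic. Both rest on assumptions this document does not state: that each layer's model sees up to `max_samples_per_label` patches per epoch, and that patches are held in memory as 3-channel float32 arrays. `TrainingSettings`, `batches_per_epoch`, and `raw_patch_bytes` are hypothetical names, not part of the training job.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Default settings of the patchwise training job, as listed above."""
    max_samples_per_label: int = 15_000
    epochs: int = 20
    early_stop: int = 15  # assumed to be an early-stopping patience, in epochs
    patch_height: int = 256
    patch_width: int = 256
    batch_size: int = 16


def batches_per_epoch(num_patches: int, batch_size: int) -> int:
    """Batches needed to see every patch once (the last batch may be partial)."""
    return math.ceil(num_patches / batch_size)


def raw_patch_bytes(settings: TrainingSettings, num_labels: int,
                    channels: int = 3, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold the input patches in memory.

    Assumes float32 (4-byte) RGB patches; ignores labels, model weights,
    framework overhead and any intermediate copies.
    """
    values_per_patch = settings.patch_height * settings.patch_width * channels
    num_patches = settings.max_samples_per_label * num_labels
    return num_patches * values_per_patch * bytes_per_value


if __name__ == "__main__":
    s = TrainingSettings()
    print(batches_per_epoch(s.max_samples_per_label, s.batch_size))  # 938
    print(raw_patch_bytes(s, num_labels=3) / 1e9)  # about 35.4 (GB)
```

Under these assumptions the input patches for 3 layers take roughly 35 GB, well under the 150 GB that was allocated. The rest would go to things the estimate ignores (model, labels, framework buffers), so this figure is only an order-of-magnitude guide and does not replace the experiments mentioned above.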
As indicated in the End-to-End OMR Documentation - Hints section:
> The document-analysis classifier and trainer jobs are sensitive to the size of images. For music with staves, the distance between staff lines in pixels (staff size height) tends to be a predictor of how well it will perform. For instance, with the original CDN-Hsmu M2149.L4 images, the staff size height is 64 px. Values around this point may result in better classification and training results, but note that this is not an optimized measure.
Normally, for better classification results, the images used as training data are resized so that their staff size height is closer to 64 px. For the CH-E 611 (Einsiedeln) manuscript, no resizing was needed since the images were small enough.
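The resizing hint amounts to scaling each image by 64 divided by its measured staff size height. A minimal sketch, assuming the staff size height has already been measured by hand and that ImageMagick 7 (`magick`) does the resize (ImageMagick 6 uses `convert` instead); the function names and file names are hypothetical.

```python
TARGET_STAFF_HEIGHT_PX = 64  # staff size height suggested in the Hints section


def resize_percent(measured_staff_height_px: float,
                   target_px: float = TARGET_STAFF_HEIGHT_PX) -> float:
    """Scale, in percent, that brings the measured staff size height to the target."""
    if measured_staff_height_px <= 0:
        raise ValueError("staff size height must be positive")
    return 100.0 * target_px / measured_staff_height_px


def imagemagick_resize_args(src: str, dst: str,
                            measured_staff_height_px: float) -> list[str]:
    """Build (but do not run) an ImageMagick resize command."""
    pct = resize_percent(measured_staff_height_px)
    return ["magick", src, "-resize", f"{pct:.4g}%", dst]


if __name__ == "__main__":
    # Hypothetical folio whose staff size height measures 128 px.
    print(imagemagick_resize_args("folio.jpg", "folio_resized.jpg", 128))
    # ['magick', 'folio.jpg', '-resize', '50%', 'folio_resized.jpg']
```

The returned list can be passed to `subprocess.run(args, check=True)` to perform the resize. Building the command separately keeps the scale computation easy to check before any file is touched.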
The folios randomly selected for the training data are 32r and 263v. These two folios contained enough information to get 15k samples per layer for all three layers (staff lines, neumes, and background), because there are around 15 staves per page and each staff has a high density of neumes. Therefore, no more folios were needed for training (compare this to the 3 folios needed for Salzinnes and the 9 folios needed for MS 73, which has very few symbols and staves per page).
The two images were combined into one large file. The same was done with the set of layers generated in Pixel. The combined layers can be found in the training_data folder. The combined image can be regenerated by retrieving the original images from the IIIF manifest and combining them into one file (in ascending order: 32r, then 263v) using ImageMagick, as indicated in the Image Layering section of the tutorial.
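Getting the "ascending order" right is easy to miss when file names are sorted as plain strings, since `"263v"` sorts before `"32r"` lexicographically. The sketch below sorts folio labels by folio number, then recto before verso, and builds an ImageMagick command that stacks them. The file names, the `.jpg` extension, and the use of `-append` (vertical stacking; `+append` stacks horizontally) are assumptions; the Image Layering tutorial remains the reference for the actual command.

```python
import re

# Folio labels such as "32r" (recto) or "263v" (verso).
FOLIO_RE = re.compile(r"^(\d+)([rv])$")


def folio_key(label: str) -> tuple[int, int]:
    """Sort key: folio number first, then recto (0) before verso (1)."""
    match = FOLIO_RE.match(label)
    if match is None:
        raise ValueError(f"not a folio label: {label!r}")
    number, side = match.groups()
    return int(number), (0 if side == "r" else 1)


def combine_args(folios: list[str], dst: str, ext: str = ".jpg") -> list[str]:
    """Build (but do not run) a `magick ... -append` command in ascending folio order."""
    ordered = sorted(folios, key=folio_key)
    return ["magick", *(label + ext for label in ordered), "-append", dst]


if __name__ == "__main__":
    print(sorted(["263v", "32r"]))  # ['263v', '32r'] (plain string order is wrong)
    print(combine_args(["263v", "32r"], "combined.jpg"))
    # ['magick', '32r.jpg', '263v.jpg', '-append', 'combined.jpg']
```

Applying the same ordering to the layer files keeps each combined layer aligned with the combined image.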