
DCML Corpora

What is this?

This is a meta-repository bundling all corpora published and curated by the Digital and Cognitive Musicology Lab (DCML) in Lausanne. Twelve corpora are publicly available so far; more are to follow.

ABC
The Annotated Beethoven Corpus containing L. v. Beethoven's string quartets.
beethoven_piano_sonatas
Ludwig van Beethoven - Piano Sonatas [DOI][ZIP]
corelli
Arcangelo Corelli's trio sonatas opp. 1, 3 and 4.
chopin_mazurkas
Frédéric Chopin - Mazurkas [DOI][ZIP]
debussy_suite_bergamasque
Claude Debussy - Suite Bergamasque [DOI][ZIP]
dvorak_silhouettes
Antonín Dvořák - Silhouettes [DOI][ZIP]
grieg_lyrical_pieces
Edvard Grieg - Lyric Pieces [DOI][ZIP]
liszt_pelerinage
Franz Liszt - Années de Pèlerinage [DOI][ZIP]
medtner_tales
Nikolai Medtner - Tales [DOI][ZIP]
mozart_piano_sonatas
All piano sonatas by W.A. Mozart.
schumann_kinderszenen
Robert Schumann - Kinderszenen [DOI][ZIP]
tchaikovsky_seasons
Pyotr Tchaikovsky - The Seasons [DOI][ZIP]

What do the corpora include?

At the heart of every subcorpus is a folder called MS3 containing annotated music scores in the MuseScore file format .mscx. To view the scores, download the data to your computer and open the files with MuseScore 3. For example, the beginning of the file ABC/MS3/n08op59-2_01.mscx looks like this:

[Figure: Beginning of Beethoven's 8th String Quartet op. 59/2]

[Figure: Beginning of Beethoven's 8th String Quartet op. 59/2 with harmony labels]
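To process the scores programmatically rather than in MuseScore, the MS3 folders can be located with a few lines of Python. This is a minimal sketch, assuming a local clone with all submodules checked out so that each corpus sits in its own top-level folder (e.g. ABC/MS3/). The function name find_scores is hypothetical.

```python
from pathlib import Path


def find_scores(root):
    """Map each subcorpus name to the sorted .mscx files in its MS3 folder.

    Assumes the layout <root>/<corpus>/MS3/*.mscx; returns {} if root is missing.
    """
    root_path = Path(root)
    scores = {}
    if not root_path.is_dir():
        return scores
    # Every direct subfolder of root that contains an MS3 folder counts as a corpus.
    for ms3_dir in sorted(root_path.glob("*/MS3")):
        if ms3_dir.is_dir():
            scores[ms3_dir.parent.name] = sorted(ms3_dir.glob("*.mscx"))
    return scores
```

Usage: `find_scores("dcml_corpora")["ABC"]` would list the string quartet movements, where "dcml_corpora" stands for wherever the repository was cloned.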

In addition to the annotated scores in the MS3 folder, the following folders contain the same information in a tabular format:

  • notes: TSV files representing one note per row
  • measures: TSV files representing one measure per row
  • harmonies: TSV files representing one harmony label per row
  • chords: TSV files where each row represents a set of notes with the same onset and duration, appearing in the same notational layer. Columns contain dynamics, articulation signs, staff texts, figured bass, etc.
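To make the relation between the notes and chords tables concrete, the following sketch groups note rows that share position, duration, and layer. The column names mc, mc_onset, duration, staff, voice, and midi are assumptions for illustration, as is the function group_notes; the ms3 documentation lists the actual columns.

```python
from collections import defaultdict


def group_notes(notes):
    """Group note rows (dicts of strings) sharing measure, onset, duration, and layer.

    Hypothetical column names: mc (measure count), mc_onset, duration,
    staff, voice (the notational layer), and midi (pitch number).
    Returns {(mc, mc_onset, duration, staff, voice): [midi, ...]}.
    """
    groups = defaultdict(list)
    for row in notes:
        key = (row["mc"], row["mc_onset"], row["duration"], row["staff"], row["voice"])
        groups[key].append(int(row["midi"]))
    return dict(groups)
```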

The TSV (tab-separated values) files can be opened with any modern data-processing tool, programming language, or spreadsheet application, for example LibreOffice Calc. They were created with the MuseScore parser ms3, which can also extract other information from MuseScore files, such as articulation, lyrics, or rests. Its documentation describes the columns contained in the TSV files listed above.
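For example, a TSV file can be read with Python's standard library alone. This minimal sketch (the helper read_tsv is hypothetical) returns one dict per row, keyed by the header row. With pandas installed, `pandas.read_csv(path, sep="\t")` is a common alternative.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by its header row.

    All values are returned as strings; empty cells become "".
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```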

Harmony Labels

The harmonic analysis in the above example follows the DCML harmonic annotation standard. The labels were entered into the scores by professional music theorists.

Downloading the data

Download as Frictionless datapackage

Since the second half of 2023, all releases of DCML corpora have been accompanied by Frictionless datapackages. The datapackage contains the following files:

  • dcml_corpora.zip, a ZIP file containing one TSV file per facet, each of which concatenates the TSV files in the corresponding folders of all corpora:
      ◦ dcml_corpora.chords.tsv
      ◦ dcml_corpora.expanded.tsv
      ◦ dcml_corpora.measures.tsv
      ◦ dcml_corpora.metadata.tsv (concatenates the single metadata file of each corpus)
      ◦ dcml_corpora.notes.tsv
  • dcml_corpora.datapackage.json, the package descriptor.
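The facet files can also be read straight from the ZIP without unpacking it. This is a minimal sketch, assuming the TSV files sit at the top level of dcml_corpora.zip; list_tsv_members can be used to check the actual member names first. Both helper names are hypothetical.

```python
import csv
import io
import zipfile


def list_tsv_members(zip_path):
    """Return the names of all .tsv members in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if name.endswith(".tsv")]


def read_facet_from_zip(zip_path, member):
    """Read one TSV member of the archive into a list of dicts keyed by its header."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # zf.open yields bytes; wrap it so csv receives decoded text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))
```

Usage: `read_facet_from_zip("dcml_corpora.zip", "dcml_corpora.notes.tsv")`.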

If you have installed the frictionless framework and downloaded both files, you can validate the package against its descriptor with:

```bash
frictionless validate dcml_corpora.datapackage.json
```
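Independently of the frictionless tool, the descriptor is plain JSON. Following the Frictionless Data Package layout, it declares its resources in a top-level "resources" list. This sketch (the helper list_resources is hypothetical) shows what the descriptor declares; it only reads the file and does not validate anything.

```python
import json


def list_resources(descriptor_path):
    """Return (name, path) pairs for each resource declared in a Data Package descriptor.

    Missing "name" or "path" keys are returned as None.
    """
    with open(descriptor_path, encoding="utf-8") as f:
        descriptor = json.load(f)
    return [(res.get("name"), res.get("path")) for res in descriptor.get("resources", [])]
```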

Clone this repo

This repository contains the individual corpora as Git submodules. To clone it together with all of them, use:

```bash
# -j8 fetches up to 8 submodules in parallel
git clone --recurse-submodules -j8 https://github.com/DCMLab/dcml_corpora.git
```

Downloading manually

If you are unfamiliar with Git, you can download the corpora individually as ZIP files: click the respective folder above (e.g. ABC @ <commit>), then click the green Code button and choose Download ZIP.