We annotated several thousand audio recordings with SDC data about, for example, the performer and the performance place.
We present an example of how to plan the addition of SDC statements to a discrete category of files that all share a similar metadata structure, as is often the case with files from a GLAM upload.
The Swedish Performing Arts Agency (Musikverket) is a government agency promoting and preserving the musical heritage of our country. Wikimedia Sverige has been working with the agency – more specifically, its library and archives – for several years, by providing training about the Wikimedia projects, organizing edit-a-thons, providing Wikimedians in Residence and uploading collections of digitized photographs and audio files. We decided to include some of these audio files in our SDC project as they are an interesting example of a file type that is often overlooked on Wikimedia Commons. Most of the files that users upload and enrich with structured data are images. We were interested in how structured information about digitized music could be modeled and what benefits SDC could bring to this type of material.
The material included in this project consisted of about 3,500 WAV files from Musikverket's collections that had been provided by the agency and uploaded to Commons as part of our 2020–2021 partnership. The recordings document folk music from various parts of Sweden and some other Nordic countries (Finland, the Faroe Islands) and were originally collected by Matts Arnberg at Swedish Radio in the mid-20th century. They are therefore of great cultural value, spanning many different performers and localities, and Musikverket kindly released them under CC0 to make sharing them as easy as possible.
We enriched the audio recordings from the Swedish Folk Music Archive with information from the metadata files provided by Musikverket. When we first uploaded the files to Wikimedia Commons, these metadata files were used to populate the file information templates. Now they could also be used to add information such as the performer, the performance date and the performance place as SDC statements. The following illustration shows where the music pieces were recorded, a map that would have been much harder to create before the places were available as structured data. Now the places can be retrieved with a SPARQL query.
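A query of this kind could look roughly like the minimal sketch below, wrapped in Python so that the request URL can be built and inspected. It is not the query behind the map: the Wikimedia Commons Query Service endpoint URL, the use of location of creation (P1071) for the recording place, and the federated lookup of coordinates (P625) on the Wikidata Query Service are all assumptions. The query is also not restricted to the Musikverket files, and the service expects a logged-in session, which this sketch does not handle.

```python
from urllib.parse import urlencode

# Assumed SPARQL endpoint of the Wikimedia Commons Query Service (WCQS).
WCQS_ENDPOINT = "https://commons-query.wikimedia.org/sparql"


def build_places_query(place_property: str = "P1071") -> str:
    """Return a SPARQL query listing files with a performer, their place and its coordinates.

    place_property is an assumption: P1071 (location of creation) stands in for
    whichever property actually holds the performance place.
    """
    # "#defaultView:Map" asks the query UI to render the results as a map.
    # The SERVICE block fetches the place's coordinates (P625) from Wikidata.
    return f"""#defaultView:Map
SELECT ?file ?place ?coord WHERE {{
  ?file wdt:P175 ?performer ;
        wdt:{place_property} ?place .
  SERVICE <https://query.wikidata.org/sparql> {{
    ?place wdt:P625 ?coord .
  }}
}}"""


def build_request_url(query: str) -> str:
    """URL-encode the query and request JSON results (assumed to work as on WDQS)."""
    return f"{WCQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(build_request_url(build_places_query()))
```

Keeping the place property as a parameter makes it easy to swap in whatever property the statements were actually added with, without touching the rest of the query.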
The locations where the audio pieces from the Swedish Folk Music Archive were recorded. Query.
The following table illustrates how the contents of the file description page can be parsed and converted into SDC statements. We found that when working with a large batch of files, such as, in this case, files from a GLAM partnership project, it helps to prepare such an overview beforehand. It gives you the chance to research the available properties and to decide how to handle any data that is unclear or hard to parse. For example, in this case, some but not all of the performers were already linked to Wikidata items (performer = {{Q|Q53108512}}). Where they were not (performer = Hulda Krook), we decided to add the names as strings, using author name string (P2093) qualifiers on the performer (P175) statements.
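The performer rule above can be sketched in Python as a function that turns the value of the performer field into a statement in the Wikibase JSON format. This is a hypothetical helper, not the code we ran: the field-matching regex and the function names are illustrative, and representing a performer without a Wikidata item as an "unknown value" (somevalue) P175 statement carrying the P2093 qualifier is an assumption about how the string qualifier was attached.

```python
import re
from typing import Any

# Hypothetical pattern for a "performer = ..." line in the file description wikitext.
PERFORMER_FIELD = re.compile(r"^\|?\s*performer\s*=\s*(.+?)\s*$", re.MULTILINE)
# A performer already linked to Wikidata, written as e.g. "{{Q|Q53108512}}".
Q_TEMPLATE = re.compile(r"\{\{\s*Q\s*\|\s*(Q\d+)\s*\}\}")


def performer_values(wikitext: str) -> list[str]:
    """Return the raw values of all performer fields in the wikitext."""
    return PERFORMER_FIELD.findall(wikitext)


def performer_statement(value: str) -> dict[str, Any]:
    """Convert one performer value into a P175 statement (Wikibase JSON)."""
    value = value.strip()
    match = Q_TEMPLATE.fullmatch(value)
    if match:
        # Linked performer: the statement points directly at the Wikidata item.
        qid = match.group(1)
        return {
            "mainsnak": {
                "snaktype": "value",
                "property": "P175",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": qid, "numeric-id": int(qid[1:])},
                },
            },
            "type": "statement",
            "rank": "normal",
        }
    # Unlinked performer (assumed modelling): an "unknown value" performer
    # whose name is kept as an author name string (P2093) qualifier.
    return {
        "mainsnak": {"snaktype": "somevalue", "property": "P175"},
        "type": "statement",
        "rank": "normal",
        "qualifiers": {
            "P2093": [
                {
                    "snaktype": "value",
                    "property": "P2093",
                    "datavalue": {"type": "string", "value": value},
                }
            ]
        },
    }


if __name__ == "__main__":
    text = "| performer = {{Q|Q53108512}}\n| performer = Hulda Krook\n"
    for raw in performer_values(text):
        print(performer_statement(raw)["mainsnak"]["snaktype"])
```

Running the example prints `value` for the linked performer and `somevalue` for Hulda Krook. Splitting field extraction from statement building mirrors the planning table: the mapping for each field can be checked on its own before anything is written to Commons.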