Commons:Koninklijke Bibliotheek/SDoC
THIS PAGE NEEDS UPDATING!!
Our SDoC efforts
Activities, projects, research etc. by the KB related to the Structured Data on Wikimedia Commons effort.
- All images related to the KB are collected in Category:Koninklijke Bibliotheek, Netherlands.
- Images from our media donations are grouped into Category:Media contributed by Koninklijke Bibliotheek (flat overview) and Category:Collections from Koninklijke Bibliotheek (file tree with subcategories)
Adding P180 Depicts tags to our media files
In the summer of 2019 we started some first small-scale experiments on Stedenboek de Wit and our catchpenny prints, adding P180 'Depicts' fields to selected images in those categories. In other words, we started exploring the added value of Wikidata-based semantic tagging of things that can be seen (are depicted) in those images. These Wikidata Depicts tags enable content-based image searching (searching for what can be seen in the image) in Commons.
You can see more about these first experiments in these presentations:
- Academic heritage symposium, 26 September 2019, Utrecht, slide 60 onwards.
- KNVI annual conference, 14 November 2019, Amsterdam, slide 92 onwards.
- Dutch Digital Heritage Week, 26 November 2019, Leiden, slide 19 onwards.
- KB internal knowledge sharing event, 23 January 2020, The Hague, slide 38 onwards.
In early 2020, at the start of the COVID pandemic and lockdown, these first experiments were scaled up to include more KB objects, such as Atlas Ortelius 1571, Atlas van der Hagen, Visboeck Coenen, Armorial de Beyeren or Admirandorum quadruplex spectaculum. For this we enlisted the help of KB employees, for whom we created step-by-step instructions (in Dutch) on how to add P180 tags to KB images. One employee did a huge job, adding more than 34,000 Depicts tags to thousands of KB images between March 2020 and July 2022.
Tools we use(d) for P180 tagging:
- Direct tagging in the Commons image interface (tab 'Structured data')
- ISA tool, with dedicated campaigns for Stedenboek de Wit and our catchpenny prints
- Bulk tagging via sdc_tool.js. This user script lets you quickly add statements for Structured Data on Commons (SDC) to (selected) files on galleries, category pages, and search results.
- Bulk tagging via the AC/DC tool, a Wikimedia Commons gadget to add a collection of structured data statements (such as depicts (P180)) to a set of files (such as a category).
- OpenRefine 3.6+, a powerful and flexible tool to add structured data to Wikimedia Commons files in batch.
- Self-built tools and scripts, see below
Adding other structured data fields to our media files
In addition to the P180 Depicts tags, we are also continuously adding other structured data to our media files. The most relevant/important ones are:
- collection (P195) = KB National Library of the Netherlands (Q1526131) -- See this query for images with this statement.
- copyright status (P6216) = public domain (Q19652) -- See this query for images with this statement.
- source of file (P7482) -- See this query for images with this statement.
- digital representation of (P6243) -- See this query for images with this statement.
Furthermore, typical properties we are adding include:
- creator (P170) -- Query
- main subject (P921) -- Query
- copyright license (P275) -- Query
- media type (P1163) -- Query
A handy way to add collection (P195) = KB National Library of the Netherlands (Q1526131) in bulk to the files in a category is to use the QS-box in the Petscan interface: https://petscan.wmflabs.org
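As an illustration of such a bulk edit, the sketch below generates tab-separated QuickStatements (V1) commands that add the collection statement to a list of files. This is not the exact input of the Petscan QS-box; it assumes QuickStatements accepts MediaInfo ids (M-ids, "M" plus the page id of the file) as the subject, and the M-ids used here are hypothetical placeholders.

```python
def quickstatements_lines(media_ids, property_id="P195", qid="Q1526131"):
    """Build one tab-separated QuickStatements V1 command per file.

    media_ids: MediaInfo ids such as "M12345" (hypothetical placeholders here).
    """
    lines = []
    for mid in media_ids:
        if not (mid.startswith("M") and mid[1:].isdigit()):
            raise ValueError(f"not a MediaInfo id: {mid!r}")
        lines.append("\t".join([mid, property_id, qid]))
    return lines


# Example with placeholder M-ids, not real KB files.
for line in quickstatements_lines(["M12345", "M67890"]):
    print(line)
```

The printed lines can be pasted into QuickStatements; checking a small batch by hand first is advisable.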
Image positions / bounding boxes
We did some small-scale experiments with adding bounding boxes to KB images (for example Hours of Philip of Burgundy - KW76F - folio 283r - Heaven - All Saints - Jean Le Tavernier.jpg) using the Wikidata Image Positions tool.
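As far as we understand, such a bounding box is stored as a qualifier, relative position within image (P2677), on a depicts statement, written as an IIIF-style region string of percentages (pct:x,y,w,h). A minimal sketch of converting a pixel box into that notation follows; the function name and the rounding to one decimal are our own choices, not those of the tool.

```python
def bbox_to_pct_region(x, y, width, height, image_width, image_height):
    """Convert a pixel bounding box into an IIIF-style 'pct:x,y,w,h' string.

    x, y are the top-left corner of the box; all values are in pixels.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    values = (
        100 * x / image_width,       # left edge as % of image width
        100 * y / image_height,      # top edge as % of image height
        100 * width / image_width,   # box width as % of image width
        100 * height / image_height, # box height as % of image height
    )
    return "pct:" + ",".join(f"{v:.1f}" for v in values)


# A 200x100 pixel box at (100, 50) in a 1000x500 pixel image.
print(bbox_to_pct_region(100, 50, 200, 100, 1000, 500))
```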
Monitoring the SDoC in our media files
Commons native search
Example queries: Images in Stedenboek de Wit depicting 'De Burcht' in Leiden or cities on the Zuiderzee
Hay's structured search
- KB collection images: Hay's structured search + Commons native search
- Images with things depicted: Hay's structured search + Commons native search
Commons search API
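The same structured-data searches can be run through the MediaWiki search API on Commons, using the haswbstatement search keyword. The sketch below is one possible way to do this with the Python standard library; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_search_params(property_id, qid, limit=10):
    """Parameters for a file search on a structured data statement."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{property_id}={qid}",
        "srnamespace": "6",  # namespace 6 = File:
        "srlimit": str(limit),
        "format": "json",
    }


def search_files(property_id, qid, limit=10):
    """Run the search and return the titles of the matching files."""
    params = build_search_params(property_id, qid, limit)
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with your own contact details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "SDoC-example/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]


if __name__ == "__main__":
    # Files with collection (P195) = KB National Library of the Netherlands.
    for title in search_files("P195", "Q1526131", limit=5):
        print(title)
```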
SPARQL queries
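A sketch of a monitoring query for the Wikimedia Commons Query Service (WCQS): it counts how often each entity is depicted in files that carry collection (P195) = KB National Library of the Netherlands (Q1526131). It assumes that the wd: and wdt: prefixes are predefined by the service; the Python wrapper only builds the query text, which can then be pasted into the query service.

```python
def depicts_count_query(collection_qid="Q1526131", limit=20):
    """Return a SPARQL query counting depicted entities per collection."""
    return f"""
SELECT ?depicted (COUNT(?file) AS ?files) WHERE {{
  ?file wdt:P195 wd:{collection_qid} ;
        wdt:P180 ?depicted .
}}
GROUP BY ?depicted
ORDER BY DESC(?files)
LIMIT {limit}
""".strip()


print(depicts_count_query())
```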
Manuals for the public
In March 2020 we created this manual in Dutch for crowdsourcing this task among KB employees and the general public. The manual explains step by step how to make images from the KB collection more discoverable, visible and reusable by indicating (tagging) which things (entities) can be seen in those images. This is done by connecting Wikidata items to those things.
- Taggathons: small-scale events where participants can add Wikidata structured tags to KB images on Wikimedia Commons using e.g. the ISA tool. 'Taggathon' is derived from 'editathon', 'hackathon', etc. This manual in Dutch can be handed out prior to the event.
Example API calls and Python scripts for machine access
Retrieving P180-Depicted entities in structured file data using SPARQL and the Commons API
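A minimal sketch of the API route, assuming the wbgetentities action of the Commons API is used with the MediaInfo id of the file ("M" plus its page id). The helper names and the sample response are hypothetical; the sample only mimics the shape of a real response.

```python
def build_entity_params(page_id):
    """Parameters for requesting the structured data of one file."""
    return {"action": "wbgetentities", "ids": f"M{page_id}", "format": "json"}


def extract_depicts(response, mid):
    """Return the Qids of all depicts (P180) values in an API response."""
    entity = response.get("entities", {}).get(mid, {})
    statements = entity.get("statements", {})
    # Files without statements come back with an empty list, not a dict.
    if not isinstance(statements, dict):
        return []
    qids = []
    for statement in statements.get("P180", []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue:  # 'somevalue' / 'novalue' snaks have no datavalue
            qids.append(datavalue["value"]["id"])
    return qids


# Hand-written sample in the shape of a wbgetentities response.
sample = {
    "entities": {
        "M1": {
            "statements": {
                "P180": [
                    {"mainsnak": {"snaktype": "value",
                                  "datavalue": {"value": {"id": "Q284865"}}}},
                    {"mainsnak": {"snaktype": "somevalue"}},
                ]
            }
        }
    }
}
print(extract_depicts(sample, "M1"))
```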
Developing tools
Add structured data to files on Commons from an Excel sheet
We wrote a Python script that can write Property-Qid pairs from an Excel sheet to the Structured Data of files on Commons.
For instance, it can add putto (Q284865) to the depicts (P180) property of the File:Atlas Schoemaker-UTRECHT-DEEL1-3120-Utrecht, Utrecht.jpeg from the Excel file P180Inputfile.xlsx
Although mainly intended to add P180 values in bulk, this script can also add Wikidata Qids to properties other than P180 in the structured data.
For further info and configuration, see https://github.com/KBNLwikimedia/SDoC/tree/main/writeSDoCfromExcel and https://commons.wikimedia.org/wiki/Commons:WriteSDoCfromExcel
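To give an idea of what writing one Property-Qid pair involves, here is a sketch that builds the parameters for the wbcreateclaim action of the Commons API. It is not the code of the script linked above; the function name, the M-id and the token are hypothetical placeholders, and a real edit needs a logged-in session and a valid CSRF token.

```python
import json


def build_claim_params(mid, property_id, qid, csrf_token):
    """Parameters for adding one item-valued statement to a file."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata Qid: {qid!r}")
    value = {"entity-type": "item", "numeric-id": int(qid[1:])}
    return {
        "action": "wbcreateclaim",
        "entity": mid,               # MediaInfo id of the file
        "property": property_id,     # e.g. depicts (P180)
        "snaktype": "value",
        "value": json.dumps(value),  # the API expects a JSON-encoded value
        "token": csrf_token,
        "format": "json",
    }


# Rows as they might be read from a spreadsheet: (M-id, property, Qid).
rows = [("M12345", "P180", "Q284865")]  # placeholder M-id
for mid, prop, qid in rows:
    print(build_claim_params(mid, prop, qid, csrf_token="PLACEHOLDER"))
```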
Import structured data to Wikimedia Commons using pywikibot
- https://github.com/KBNLwikimedia/dict2sdc
- On PAWS: https://hub.paws.wmcloud.org/user/OlafJanssen/tree/SDoC_ScriptVeradeKok_voorPAWS
OpenRefine
Using OpenRefine to add SDoC
- NL-Japan image donation - Broadsides prints
Contributing to OpenRefine software interface development
Workshops
Add workshop on OpenRefine from WikiConNL: uploading images and adding SDoC via OpenRefine 3.7+
Work to do
Add lacking statements to KB images
Most relevant/important ones are:
- Add instance of (P31) -- Query --> all images should get instance of (P31) = digital image (Q1250322). All PDFs should get instance of (P31) = electronic document (Q694975) with qualifier "file format = PDF" (to be confirmed). For the Beatrijs MP3 files, the value of instance of (P31) is still to be determined.
- Add depicts (P180) -- See this query for images lacking this. (example file)
- Add collection (P195) = KB National Library of the Netherlands (Q1526131) -- See this query for images lacking this. (example file)
- Add copyright status (P6216) -- See this query for images lacking this. (example file). Most often the value will be public domain (Q19652). See also this file (CC0 license)
- Add source of file (P7482) -- See this query for images lacking sourcing. (example file)
- Add digital representation of (P6243) -- See this query for images lacking this. (example file)
Other missing statements include:
- author (P50) -- Query
- language of work or name (P407) -- Query
- copyright holder (P3931) -- Query
- inception (P571) -- Query
- publication date (P577) -- Query
- full work available at URL (P953) -- Query
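Queries for lacking statements can follow one pattern: select files in the KB collection and filter out those that already have the property. The sketch below builds such a query for the Wikimedia Commons Query Service, assuming the wd: and wdt: prefixes are predefined there. Because it selects files through collection (P195), it cannot find files that lack P195 itself; those need another starting point, such as a category.

```python
def lacking_statement_query(missing_property, collection_qid="Q1526131", limit=100):
    """Return a SPARQL query for collection files lacking a property."""
    return f"""
SELECT ?file WHERE {{
  ?file wdt:P195 wd:{collection_qid} .
  FILTER NOT EXISTS {{ ?file wdt:{missing_property} ?value . }}
}}
LIMIT {limit}
""".strip()


# KB collection files without any depicts (P180) statement.
print(lacking_statement_query("P180"))
```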
Correct existing false statements
- Correct erroneous copyright claims to PD images. See this query and example file (falsely copyrighted) or this file (falsely CC-BY-SA)