Commons:Batch uploading/Codex Aureus
Digitized version of the Stockholm Codex Aureus from around 750. Files released by the National Library of Sweden as CC0.
- Source to upload from: https://data.kb.se/datasets/2016/04/codex_aureus/
- Do the media URLs follow a pattern? Yes
- Does the site have an API? Scrapable file server
- Did you contact the site owner? Yes.
- Describe the works to be uploaded in detail (audio files, images by …):
Digitized images of the Codex Aureus book in high resolution. Filenames include "v" (verso) and "r" (recto) for pagination indication. Filenames have sequence numbers.
- Which license tag(s) should be applied?
CC0.
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
The National Library of Sweden has an org template used in other uploads. {{Kungliga biblioteket image}}
PeterKz (talk) 11:55, 4 June 2016 (UTC)
Opinions
Could you recommend the file naming scheme and how the image page text should look? As this is a single codex, there is not much to change from one image to another, apart from page numbers. As data.kb.se is on the GWT whitelist, this should be a reasonably quick upload. --Fæ (talk) 12:07, 4 June 2016 (UTC)
- I don't know what a proper scheme should be. Is there a standard for paginated media? Some other items seem to just use a sequence number appended to the title of the work.
- Here is a different work that has a template that maybe could be reused? https://commons.wikimedia.org/wiki/Template:R%C3%A5lambska_dr%C3%A4ktboken-KB
- Description proposal: Stockholm Codex Aureus, digitized by the National Library of Sweden. Signum A 135, Libris ID: 17848380
- Year of publication: ca 750
- --PeterKz (talk) 14:42, 4 June 2016 (UTC)
- I'll put together an XML file suitable for GWT. The details could always be mass changed later if there's a better scheme. --Fæ (talk) 16:03, 4 June 2016 (UTC)
- Thank you! Is there a standardized way to create upload specifications that works across all batch upload tools? If there is, we could write a small script to generate it from other datasets at data.kb.se. --PeterKz (talk) 20:36, 4 June 2016 (UTC)
- There is a glitch with GWT right now, meaning that global categories are failing to be linked, making the tool fall over. I'm working around it, but it's not ideal as it means fiddling about post-upload.
- The category should be populated soon.
- A generalized way of uploading any documents could be worked out (i.e. pumping out XML files, then going via GWT and using a specific ingestion template), but it would be targeted at NLS' website. If this were a large enough project, I may be interested, though WMSE might want to take it first. --Fæ (talk) 21:50, 4 June 2016 (UTC)
- Thank you! If the format for the file list + basic metadata is stable, it could probably be added easily to data.kb.se. That would make uploads to Commons faster. Where can I find more details about the XML file list import format? --PeterKz (talk) 06:54, 5 June 2016 (UTC)
- Noticed that some of the preview files seem to be broken. Is there a bug in the image processing in Commons when generating them? --PeterKz (talk) 06:56, 5 June 2016 (UTC)
- Well, the process I used this time was not automatic. I cut and pasted the file list into a text editor and trimmed it down to just links, then adapted that to become the XML file (see below and the examples in mw:Help:Extension:GWToolset). The XML generation could be semi-automated if large numbers of books were to be imported, provided the image link formats are consistent enough that page numbers or unique IDs can be extracted.
- The TIFFs may not preview well when you look at the category for the first time; refresh the view. They are available, but the browser may time out. --Fæ (talk) 07:44, 5 June 2016 (UTC)
Example XML record
This is a trimmed example XML file for p382. The <record> element is repeated to cover all 383 files in one large XML file.
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:dc="https://purl.org/dc/elements/1.1/">
  <records>
    <record>
      <source>https://data.kb.se/datasets/2016/04/codex_aureus/003827939%2C6900001%2Cw%2C382%2C187v.tif</source>
      <page>382</page>
      <title>Codex Aureus (A 135) p382</title>
      <filename>Codex Aureus (A 135) p382</filename>
      <unknown>Unknown</unknown>
      <description>Stockholm Codex Aureus, digitized by the National Library of Sweden. Signum A 135, Libris ID: 17848380</description>
      <date>{{circa|750}}</date>
    </record>
  </records>
</metadata>
The fields are mapped visually to {{information}} in COM:GWT, but the mapping can also be imported from GWToolset:Metadata_Mappings/Fæ/Codex_Aureus.json.
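The semi-automated XML generation mentioned in the discussion could look roughly like the Python sketch below. It is not the process actually used for this upload, which was done by hand. It assumes that the page sequence number is the fourth comma-separated field of the URL-decoded file name (as in the p382 link), and the function names `page_number`, `build_metadata` and `to_xml` are made up for illustration. Element names and fixed values are copied from the example record.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Fixed values copied from the example record on this page.
DESCRIPTION = ("Stockholm Codex Aureus, digitized by the National Library of Sweden. "
               "Signum A 135, Libris ID: 17848380")
DATE = "{{circa|750}}"


def page_number(url):
    """Return the sequence number encoded in a file URL.

    Assumption: the decoded file name looks like
    '003827939,6900001,w,382,187v.tif' and the 4th field is the page number.
    """
    name = unquote(url.rsplit("/", 1)[-1])
    fields = name.split(",")
    if len(fields) < 5 or not fields[3].isdigit():
        raise ValueError(f"unexpected file name format: {name!r}")
    return int(fields[3])


def build_metadata(urls):
    """Build a <metadata> tree with one <record> per source URL."""
    root = ET.Element("metadata", {"xmlns:dc": "https://purl.org/dc/elements/1.1/"})
    records = ET.SubElement(root, "records")
    for url in urls:
        page = page_number(url)
        title = f"Codex Aureus (A 135) p{page}"
        record = ET.SubElement(records, "record")
        for tag, text in (
            ("source", url),
            ("page", str(page)),
            ("title", title),
            ("filename", title),
            ("unknown", "Unknown"),
            ("description", DESCRIPTION),
            ("date", DATE),
        ):
            ET.SubElement(record, tag).text = text
    return root


def to_xml(urls):
    """Serialize the records as an XML string with a UTF-8 declaration."""
    root = build_metadata(urls)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = ("https://data.kb.se/datasets/2016/04/codex_aureus/"
               "003827939%2C6900001%2Cw%2C382%2C187v.tif")
    print(to_xml([example]))
```

Feeding the trimmed list of links to `to_xml` would give one large file in the same shape as the example record. Other datasets at data.kb.se may name their files differently, so the field position in `page_number` would need checking for each dataset.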
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Fæ | Delays caused by a GWT linking bug that required categorization workarounds. Upload complete; global categories added using VFC. Kungliga biblioteket image template added post-upload. Page numbers zero-padded after upload using a custom file move. Done | GWT | Codex Aureus |
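The zero padding of page numbers noted in the progress column can be illustrated with a short Python sketch. The actual file move script is not shown on this page, so the function name `zero_pad_title`, the padding width of 3 (enough for 383 files) and the handling of a file extension are all assumptions. The sketch only computes the new name and does not perform any move on Commons.

```python
import re


def zero_pad_title(title, width=3):
    """Pad the trailing 'p<number>' of a title to `width` digits.

    An optional file extension such as '.tif' is preserved.
    Titles without a trailing page number are returned unchanged.
    """
    return re.sub(
        r"p(\d+)(\.\w+)?$",
        lambda m: f"p{m.group(1).zfill(width)}{m.group(2) or ''}",
        title,
    )


if __name__ == "__main__":
    for old in ("Codex Aureus (A 135) p7.tif", "Codex Aureus (A 135) p382.tif"):
        print(old, "->", zero_pad_title(old))
```

Padding makes the titles sort in page order in the category listing, since plain string sorting would otherwise place p10 before p2.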