Commons:Machine-readable data
On Wikimedia Commons, much of the metadata (including license and author) is not machine-readable. The API module iiprop=extmetadata can be used to retrieve some values, but because the information is entered as free text on the file description page itself, the results are not perfect. The ongoing Structured data on Commons project aims to move the metadata into fully structured data and will eventually supersede the machine-readable data described on this page.
In the meantime, and to ease a future transition towards more structured data, Wikimedia Commons uses a set of standard templates that have been made machine-readable through HTML elements. Some scripts already make use of this. Note that this data is also available on any wiki using Wikimedia Commons, where it can be read from the HTML of the File: page just like other local data.
Machine-readable data
Machine-readable data set by infobox templates
Several standard infobox templates tag different elements of the template with different markers so that the information can be parsed. The following styles of tags are used:
- Microformat tags follow industry standards and can be parsed by already existing tools.
- <td> id attributes (identifiers) are custom markers that allow more complete tagging, but they have to be read by custom tools. Most universal infoboxes have a two-column structure: column 1 holds the name of the field and column 2 holds its value.
  - Traditionally, <td> id attributes were used to tag the name cell in the first column of a row. To get the data, you need the contents of the following <td> cell in the second column.
  - The {{Creator}} and {{Institution}} templates have a more complicated structure, so the cells holding the actual data are tagged directly; most of these IDs end in _value.
Template | Template parameter | Description | <td> id attribute | Microformat | Comment |
---|---|---|---|---|---|
{{Information}} | description | description of the file | fileinfotpl_desc | hProduct.description | Often contains multiple languages annotated with {{Lang}}. |
{{Information}} | date | date the original work was created | fileinfotpl_date | hCalendar vevent.dtstart | Sometimes additionally, or only, contains the publication date. These two dates have different meanings for copyright; when used, {{Date context}} can indicate the difference. The microformat is added by the {{Date}} template. |
{{Information}} | source | source of the file | fileinfotpl_src | | Often contains entire tables; there is no good way to deal with these source templates yet. Source templates often reference catalogue IDs, but these are not machine-readable either. |
{{Information}} | author | author of the file | fileinfotpl_aut | | Used inconsistently for the author, the creator and/or the copyright holder. Often contains the {{Creator}} template described below. |
{{Information}} | permission | license for the file | fileinfotpl_perm | | |
{{Information}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Artwork}} | description | description of the artwork | fileinfotpl_desc | hProduct.description | |
{{Artwork}} | date | date the artwork was created | fileinfotpl_date | hCalendar vevent.dtstart | Microformat added by the {{Date}} template. |
{{Artwork}} | source | source of the file | fileinfotpl_src | | |
{{Artwork}} | artist | creator of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
{{Artwork}} | author | author of the artwork | fileinfotpl_aut | "hProduct.fn value" | |
{{Artwork}} | permission | license for the file and the artwork | fileinfotpl_perm | | |
{{Artwork}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Artwork}} | title | title of the artwork | fileinfotpl_art_title | hProduct.fn | |
{{Artwork}} | object type | type of the artwork | fileinfotpl_art_object_type | | |
{{Artwork}} | medium | technique or medium of the artwork | fileinfotpl_art_medium | | |
{{Artwork}} | dimensions | size of the artwork | fileinfotpl_art_dimensions | | |
{{Artwork}} | gallery | institution holding the artwork | fileinfotpl_art_gallery | | |
{{Artwork}} | location | location of the artwork within the institution | fileinfotpl_art_location | hProduct.locality | |
{{Artwork}} | accession number | accession number of the artwork | fileinfotpl_art_id | hProduct.identifier | |
{{Artwork}} | object history | history of the artwork | fileinfotpl_art_object_history | | |
{{Artwork}} | exhibition history | exhibition history of the artwork | fileinfotpl_art_exhibition_history | | |
{{Artwork}} | credit line | credit line of the artwork | fileinfotpl_art_credit_line | | |
{{Artwork}} | inscriptions | inscriptions on the artwork | fileinfotpl_art_inscriptions | | |
{{Artwork}} | notes | notes about the artwork | fileinfotpl_art_notes | | |
{{Artwork}} | references | references about the artwork | fileinfotpl_art_references | | |
{{Book}} | Author | author of the book | fileinfotpl_author | | |
{{Book}} | Editor | editor of the book | fileinfotpl_book_editor | | |
{{Book}} | Translator | translator of the book | fileinfotpl_book_translator | | |
{{Book}} | Illustrator | illustrator of the book | fileinfotpl_book_illustrator | | |
{{Book}} | Title | title of the book | fileinfotpl_book_title | | |
{{Book}} | Subtitle | subtitle of the book | fileinfotpl_book_subtitle | | |
{{Book}} | Series title | title of the series the book belongs to | fileinfotpl_book_series-title | | |
{{Book}} | Authority file | authority control data | fileinfotpl_book_authority | | |
{{Book}} | Publisher | publisher of the book | fileinfotpl_book_publisher | | |
{{Book}} | Printer | printer of the book | fileinfotpl_book_printer | | |
{{Book}} | Year of publication | date or year the book was published | fileinfotpl_date | | |
{{Book}} | Place of publication | place where the book was published | fileinfotpl_book_place-of-publication | | |
{{Book}} | Language | language of the book | fileinfotpl_book_language | | |
{{Book}} | Description | description of the book | fileinfotpl_desc | | |
{{Creator}} | Name | name of the creator | creator | vCard.fn | |
{{Creator}} | Alternative names | other name(s) of the creator | fileinfotpl_creator_alt-name_value | vCard.nickname | |
{{Creator}} | Description | nationality(-ies) and occupation(s) of the creator | fileinfotpl_creator_desc_value | vCard.note | |
{{Creator}} | Date of death | date of death of the creator | fileinfotpl_creator_deathdate_value | | |
{{Creator}} | Date of birth | date of birth of the creator | fileinfotpl_creator_birthdate_value | vCard.bday | |
{{Creator}} | Location of birth/death | location of death of the creator | fileinfotpl_creator_deathloc_value | | |
{{Creator}} | Location of birth | location of birth of the creator | fileinfotpl_creator_birthloc_value | | |
{{Creator}} | Work period | work period of the creator | fileinfotpl_creator_work-period_value | | |
{{Creator}} | Work location | work location of the creator | fileinfotpl_creator_work-location_value | | |
{{Creator}} | Image | portrait or photograph depicting the creator | fileinfotpl_creator_image | | |
{{Creator}} | Authority file | authority control data related to the creator | fileinfotpl_creator_authority_value | | |
{{FileContentsByBot}} | (various) | depends; see {{FileContentsByBot}} | (various) | hproduct-by-bot | Large data set, still growing; see {{FileContentsByBot}}. |
{{Photograph}} | title | title of the photograph | fileinfotpl_art_title | hProduct.fn | |
{{Photograph}} | description | description of the photograph | fileinfotpl_desc | hProduct.description | |
{{Photograph}} | original description | original archival description of the photograph | fileinfotpl_desc | hProduct.description | |
{{Photograph}} | date | date of creation of the original artwork | fileinfotpl_date | hCalendar vevent.dtstart | Microformat added by the {{Date}} template. |
{{Photograph}} | medium | technique or medium of the photograph | fileinfotpl_art_medium | | |
{{Photograph}} | dimensions | height and width of the photograph | fileinfotpl_art_dimensions | | |
{{Photograph}} | artist | creator of the photograph | fileinfotpl_aut | "hProduct.fn value" | |
{{Photograph}} | institution | institution holding the photograph | fileinfotpl_art_gallery | | |
{{Photograph}} | location | location of the photograph within the institution | fileinfotpl_art_location | hProduct.locality | |
{{Photograph}} | source | source of the file | fileinfotpl_src | | |
{{Photograph}} | permission | license for the file and the photograph | fileinfotpl_perm | | |
{{Photograph}} | other versions | other versions of the file | fileinfotpl_ver | | |
{{Photograph}} | accession number | accession number of the photograph | | hProduct.identifier | |
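The two <td> id patterns described in this section can be read with Python's standard html.parser module. This is a minimal sketch, assuming you already have the rendered HTML of a File: page as a string and that value cells contain no nested tables; TdIdFieldParser and extract_td_fields are hypothetical names, not part of any existing Commons script. It treats an id ending in _value as a value cell and any other fileinfotpl_* id as a label whose value sits in the next <td>. IDs that fit neither pattern (such as creator or fileinfotpl_creator_image) would need special handling.

```python
from html.parser import HTMLParser


class TdIdFieldParser(HTMLParser):
    """Collect infobox values tagged with <td id="..."> (sketch).

    * label cell: <td id="fileinfotpl_desc">Description</td><td>value</td>
      -> the value is the text of the *next* <td>;
    * value cell: <td id="fileinfotpl_creator_..._value">value</td>
      -> the value is the cell's own text.
    Assumes value cells contain no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}            # id -> whitespace-normalized text
        self._in_td = False
        self._text = []
        self._cell_id = ""          # id of the <td> currently being read
        self._pending_label = None  # label id waiting for its value cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._text = []
            self._cell_id = dict(attrs).get("id") or ""

    def handle_data(self, data):
        if self._in_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self._in_td:
            return
        self._in_td = False
        text = " ".join("".join(self._text).split())
        if self._pending_label is not None:
            # This cell holds the value for the label cell just before it.
            self.fields[self._pending_label] = text
            self._pending_label = None
        elif self._cell_id.endswith("_value"):
            self.fields[self._cell_id] = text
        elif self._cell_id.startswith("fileinfotpl_"):
            self._pending_label = self._cell_id


def extract_td_fields(html):
    parser = TdIdFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, extract_td_fields('<tr><td id="fileinfotpl_date">Date</td><td>1901</td></tr>') returns {'fileinfotpl_date': '1901'}.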
Alternative format for CommonsMetadata
Because the table-and-id format proved very hard to add to templates that are not laid out like the Commons {{Information}} template, CommonsMetadata allows an alternative format, similar to that of license templates: the whole information template has to be enclosed in an element with the class fileinfotpl, and the element containing each specific piece of information needs a fileinfotpl_* class (the same names as above, but as a class, not an id).
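With the class-based format, a parser has to track which element carries a fileinfotpl_* class and collect its text until that element closes. The sketch below (hypothetical names, standard-library html.parser only) does this with a stack of open elements and records only fields found inside a fileinfotpl container. Because classes, unlike IDs, may repeat on a page, every field maps to a list of values. It assumes reasonably well-formed HTML; void elements such as <br> are skipped because they never get an end tag.

```python
from html.parser import HTMLParser

# Elements that never receive an end tag in HTML.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ClassFieldParser(HTMLParser):
    """Collect text of fileinfotpl_* elements inside a .fileinfotpl container."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # class name -> list of values (fields may repeat)
        self._stack = []   # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # "inside" is true for the container itself and everything nested in it.
        inside = "fileinfotpl" in classes or any(f["inside"] for f in self._stack)
        field = next((c for c in classes if c.startswith("fileinfotpl_")), None)
        self._stack.append({"tag": tag, "inside": inside,
                            "field": field if inside else None, "text": []})

    def handle_data(self, data):
        # Text counts towards every open field element (fields may be nested).
        for frame in self._stack:
            if frame["field"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop up to the matching start tag.
        if not any(f["tag"] == tag for f in self._stack):
            return
        while self._stack:
            frame = self._stack.pop()
            if frame["field"]:
                value = " ".join("".join(frame["text"]).split())
                self.fields.setdefault(frame["field"], []).append(value)
            if frame["tag"] == tag:
                break


def extract_class_fields(html):
    parser = ClassFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```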
Machine-readable data set by license templates
Introduced in October 2010, using classes of the form <span class="licensetpl_XXX">:
- licensetpl: An element identifying a license. It wraps the entire license code and should contain a SINGLE license, not a multi-license.
- licensetpl_short: Short name of the license: “Public domain”, “CC BY-SA 3.0”, “CC by 2.0 fr”, etc.
- licensetpl_long: Long name of the license: “Public domain”, “Creative Commons Attribution-Share Alike 3.0”, etc.
- licensetpl_attr_req: Whether attribution is required: “true” or “false”.
- licensetpl_attr: The requested attribution, as free text.
- licensetpl_link_req: Whether a link to the license is required for this license: “true” or “false”.
- licensetpl_link: The link to the license deed, e.g. “www.creativecommons.org/licenses/by-sa/XXX/YYY”.
- licensetpl_nonfree: “true” if this is a non-free license (not used on Commons, only on wikis with an Exemption Doctrine Policy (EDP)).
Multiple licensetpl blocks for the same work may be wrapped in a block with the class licensetpl_wrapper.
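The license classes can be collected in a similar way. This sketch (hypothetical names, standard-library html.parser only) returns one dictionary per licensetpl element, keyed by the class suffix (short, long, link, attr_req, and so on); a licensetpl_wrapper element is treated purely as a container. Values are returned as raw text, so the “true”/“false” flags still have to be compared as strings.

```python
from html.parser import HTMLParser

# Elements that never receive an end tag in HTML.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class LicenseParser(HTMLParser):
    """One dict per class="licensetpl" element, keyed by licensetpl_* suffix."""

    def __init__(self):
        super().__init__()
        self.licenses = []
        # Open elements: (tag, field suffix or None, text parts, license index or None)
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Inherit the enclosing license block, or start a new one.
        lic = self._stack[-1][3] if self._stack else None
        if "licensetpl" in classes:
            self.licenses.append({})
            lic = len(self.licenses) - 1
        field = next((c[len("licensetpl_"):] for c in classes
                      if c.startswith("licensetpl_") and c != "licensetpl_wrapper"), None)
        self._stack.append((tag, field, [], lic))

    def handle_data(self, data):
        for _, field, parts, _ in self._stack:
            if field:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(t == tag for t, _, _, _ in self._stack):
            return
        while self._stack:
            t, field, parts, lic = self._stack.pop()
            if field and lic is not None:
                self.licenses[lic][field] = " ".join("".join(parts).split())
            if t == tag:
                break


def extract_licenses(html):
    parser = LicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses
```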
Templates setting this information
- Templates setting licensetpl include {{PD-Layout}}, {{Cc-by-sa-3.0-migrated}}, {{Cc-by-layout}}, {{Cc-by-sa-layout}}, {{Cc-zero}}, {{FAL}}, {{GFDL}}, {{GFDL-1.2}}, {{GPL}} and {{LGPL}}.
Machine-readable data set by style formatting templates
Style formatting templates, meant to provide uniform styles to different families of non-license templates, carry machine readable data identifying these families.
Template | Purpose | Class name |
---|---|---|
{{Restriction-Layout}} | used by Restriction tags | restrictiontemplate |
{{FoP-Layout}} | used by freedom of panorama tags | foptemplate |
{{Partnership-Layout}} | used by Partnership templates | partnershiptemplate |
{{Source-Layout}} | used by generic Source templates | sourcetemplate |
{{Created with}} | used by Created with ... templates | createdwithtemplate |
Machine-readable data set by non-copyright restriction templates
Templates regarding non-copyright legal restrictions carry these classes to identify specific types of restrictions.
Template(s) | Purpose | Class name |
---|---|---|
{{Trademarked}} | Trademarked images | restriction-trademarked |
{{Copydesign}} | Copyrighted designs | restriction-design |
{{Communist symbol}} | Communist symbols | restriction-communist |
{{Italy-MiBAC-disclaimer}}, {{Soprintendenza}} | Italian cultural heritage | restriction-ita-mibac |
{{Australian Commonwealth reserve}} | Australian reserves | restriction-aus-reserve |
{{Personality rights}}, {{Romania personality rights}} | Personality rights | restriction-personality |
{{2257}} | Child Protection and Obscenity Enforcement Act warning (United States) | restriction-2257 |
{{Costume}} | Costumes | restriction-costume |
{{Fan art}} | Fan art | restriction-fan-art |
{{Currency}} | Currency | restriction-currency |
{{IHL Symbol}} | Symbols restricted by International Humanitarian Law | restriction-ihl |
{{Nazi symbol}} | Nazi or fascist symbols | restriction-nazi |
{{Insignia}} | Official insignia | restriction-insignia |
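Both the family classes of the style formatting templates and the restriction-* classes are plain class tokens, so a simple presence check is often enough. The sketch below works under that assumption: it collects every class token in the HTML, so it reports which families and restrictions appear somewhere on the page, not which element they belong to. FAMILY_CLASSES, ClassCollector and classify are hypothetical names; the class strings are the ones listed in the two tables above.

```python
from html.parser import HTMLParser

# Class names from the style-formatting table, mapped to readable labels.
FAMILY_CLASSES = {
    "restrictiontemplate": "Restriction tag",
    "foptemplate": "Freedom of panorama tag",
    "partnershiptemplate": "Partnership template",
    "sourcetemplate": "Source template",
    "createdwithtemplate": "Created with ... template",
}


class ClassCollector(HTMLParser):
    """Collect every class token used anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        self.classes.update((dict(attrs).get("class") or "").split())


def classify(html):
    """Return (template families, restriction types) present in the HTML."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    families = sorted(FAMILY_CLASSES[c] for c in collector.classes if c in FAMILY_CLASSES)
    restrictions = sorted(c[len("restriction-"):] for c in collector.classes
                          if c.startswith("restriction-"))
    return families, restrictions
```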
Machine-readable data set by specific templates
Some other templates set machine-readable data as well. Here is a non-exhaustive list:
- {{Personality rights}}
<span class="commons-template-name" style="display:none" id="commons-template-personality-rights">Personality rights</span>
- {{Credit line}}
<td id="fileinfotpl_credit" class="fileinfo-paramfield fileinfotpl_credit" style=""></td>
Machine-readable data set by location templates
{{Location}} and similar templates add machine-readable geocodes in the following format: <span class="geo">12.34;24.68</span> (latitude and longitude as decimal numbers, separated by a semicolon). The coordinates use the WGS84 system (the same as GPS and most online maps). See Commons:Geocoding for more details.
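A minimal sketch for reading these geocodes with Python's html.parser, assuming the span contains only the plain lat;lon text (no nested <span>). Values that are not in lat;lon form, or fall outside the valid ranges (latitude ±90, longitude ±180), are skipped. parse_geo, GeoSpanParser and extract_coords are hypothetical names.

```python
from html.parser import HTMLParser


def parse_geo(text):
    """Parse 'lat;lon' (decimal degrees) into a (lat, lon) tuple, or None."""
    parts = text.strip().split(";")
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


class GeoSpanParser(HTMLParser):
    """Collect coordinates from <span class="geo">lat;lon</span> elements."""

    def __init__(self):
        super().__init__()
        self.coords = []
        self._buf = None  # text buffer while inside a geo span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "geo" in classes:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._buf is not None:
            coord = parse_geo("".join(self._buf))
            if coord is not None:
                self.coords.append(coord)
            self._buf = None


def extract_coords(html):
    parser = GeoSpanParser()
    parser.feed(html)
    parser.close()
    return parser.coords
```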
Usage
MediaWiki API
The API query prop=imageinfo&iiprop=extmetadata returns some useful parameters such as Credit, Artist, LicenseUrl and Copyrighted; it is used by Media Viewer, for example.
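A sketch of building this query and reading the returned values, using only urllib.parse from the standard library (no network call is made here). action, prop, iiprop, iiextmetadatafilter, titles, format and formatversion are standard MediaWiki API parameters; extmetadata_url and extract_values are hypothetical helpers, and the parser assumes the formatversion=2 response shape, where query.pages is a list. Some values, such as Artist, may contain HTML rather than plain text.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def extmetadata_url(file_title, fields=("Artist", "Credit", "LicenseUrl", "Copyrighted")):
    """Build an imageinfo/extmetadata query URL for one File: page."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "iiextmetadatafilter": "|".join(fields),  # only return these keys
        "titles": file_title,
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params)


def extract_values(response):
    """Map each extmetadata key to its 'value' for the first page, or {}."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or "imageinfo" not in pages[0]:
        return {}  # e.g. the file does not exist
    meta = pages[0]["imageinfo"][0].get("extmetadata", {})
    return {name: entry.get("value") for name, entry in meta.items()}
```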
Scripts using machine-readable data
- MediaWiki:Gadget-Stockphoto.js
- MediaWiki:GallerySlideshow.js
- MediaWiki:Gadget-AddInformation.js
- MediaWiki:FileContentsByBot.js
External tools
See also
- Category:Templates generating microformats
- Commons:WikiProject Microformats
- Category:Files with lack of machine-readability
- Former experimental projects: Commons:API, Commons:Commons API
Defining new machine-readable data
- Do NOT use HTML IDs; use classes. An ID can only be used once per page, and most of these fields can occur multiple times on a page. Consider, for instance, descriptions of derivative works, which can include information about both the original and the derivative.
- When possible, wrap the actual data, not a field header. Tagging the header was historically done for all our Information templates, but it is much harder to support in the long run.
- Wrap data, not the way the data is formatted.
- Expect formatting to be lost when converting to data. Visual styling is not part of the information.
- Don't wrap multiple units of information inside one field. A publication date and a creation date are both dates, but they are different data fields. Likewise, "CC BY-SA-4.0-3.0-2.5" is not a license name; it stands for three licenses, each named CC BY-SA-##.
- Make sure that the data value has one unit, or outputs one consistent unit.
Problems
The following are not yet machine-readable:
- Derivative works
- Works included in works. See also Category:FoP_templates
- Licenses of derivative works and of works included in works are a mess.
- Author vs. Copyright holder
- usernames vs 'real names'
- Catalogue IDs etc
- VRTS permissions
- Publication date vs creation date
- Donating institutions of materials
- Anything that is NOT using the above structures is not recognizable at all and will require manual cleanup at some point.
- Heirs: {{Heirs-license}}
- Multilicensed CC works, that use {{Cc-by-3.0,2.5,2.0,1.0}}, {{Cc-by-sa-2.5,2.0,1.0}}, {{Cc-by-sa-4.0,3.0,2.5,2.0,1.0}} or {{Cc-by-all}}.
- Non-licensed works: {{Copyrighted free use}}, {{Attribution}} (problem: how to describe this grant of rights successfully?)
- Improvised File description templates like User:Tevaprapas/Information
- Templates denoting the copyright of parts of the work: {{Copyright information}}