Commons:FAIRCommons
How FAIR is Wikimedia Commons?
This question was brought up in the Wikimedia Commons Community Telegram Group, and it is indeed a relevant one. The FAIR Principles are a widely accepted quality standard for data: originally developed for research data, they have been adopted for cultural heritage data as well.
This project aims to start and document a discussion on the FAIRness of Wikimedia Commons: how FAIR is (the data on) Commons, and what must be done to make it even FAIRer?
How is data modelled and presented in Commons?
Each object (image, video, audio etc.) has its own page containing
1. information on the object (metadata)
2. a preview as well as links to (often multiple versions of) the object
Metadata is given as unstructured or semistructured (wikitext, links to Wikipedia or Wikidata) text and as structured data.
FAIR assessment
Findable
The object pages containing metadata in human-readable form have a persistent URL.
The objects themselves are assigned an M-ID (corresponding to the MediaWiki page ID) that can be found at the Concept URI link on the left-hand side and is also contained in the HTML code (e.g. <head> ... <link rel="alternate" href="https://commons.wikimedia.org/wiki/Special:EntityData/M45284.json" type="application/json"> ... </head>).
It appears (based also, e.g., on [2]) that:
- https://commons.wikimedia.org/entity/M-ID can be understood as the URI designating the entity.
- https://commons.wikimedia.org/wiki/Special:EntityData/M-ID stands for the metadata describing the entity.
Metadata is accessible via content negotiation.
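The two URL patterns and the content negotiation step can be sketched in Python as follows. This is a minimal illustration, not a reference client: the function names and the User-Agent string are invented for the example, and the default media type application/json is an assumption taken from the <link rel="alternate"> element quoted above. Which other media types the server honours is not stated here.

```python
import urllib.request

ENTITY_BASE = "https://commons.wikimedia.org/entity/"
ENTITY_DATA_BASE = "https://commons.wikimedia.org/wiki/Special:EntityData/"

# Hypothetical identification string; replace with your own tool name and contact.
USER_AGENT = "FAIRCommonsExample/0.1 (https://example.org/contact)"


def entity_uri(m_id):
    """Return the URI that designates the entity itself."""
    return ENTITY_BASE + m_id


def metadata_request(m_id, media_type="application/json"):
    """Build a request for the metadata, asking for media_type via the Accept header."""
    return urllib.request.Request(
        ENTITY_DATA_BASE + m_id,
        headers={"Accept": media_type, "User-Agent": USER_AGENT},
    )


def fetch_metadata(m_id, media_type="application/json"):
    """Send the request (network access needed) and return the raw response body."""
    with urllib.request.urlopen(metadata_request(m_id, media_type), timeout=30) as resp:
        return resp.read()
```

Calling fetch_metadata("M45284") would then return whichever representation the server selects for the requested media type.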
The file itself has an upload URL that is not directly exposed in machine-readable form: it is contained, e.g., in the JSON-LD representation of the object, and it can also be retrieved via the API.
A machine-actionable path from the M-ID to the file could be:
curl https://commons.wikimedia.org/wiki/Special:EntityData/<M-ID>.jsonld | jq -r '.["@graph"][] | select(.contentUrl) | .contentUrl'
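For readers who prefer a script over a shell pipeline, the jq filter above can be mirrored in Python. The sketch assumes the same document shape the filter implies (a top-level "@graph" list whose nodes may carry a contentUrl key); the function name is invented for the example, and the response layout has not been verified beyond what the filter expresses.

```python
import json


def content_urls(jsonld_doc):
    """Return every contentUrl found on the nodes of the top-level @graph list."""
    urls = []
    for node in jsonld_doc.get("@graph", []):
        if not isinstance(node, dict):
            continue
        value = node.get("contentUrl")
        # jq's select() drops only null and false, so do the same here.
        if value is not None and value is not False:
            urls.append(value)
    return urls


def content_urls_from_text(jsonld_text):
    """Convenience wrapper for a response body received as text."""
    return content_urls(json.loads(jsonld_text))
```

The input would be the body returned for https://commons.wikimedia.org/wiki/Special:EntityData/<M-ID>.jsonld.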
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
The metadata contains the file name and the object IDs, so this criterion is met.
F4. (Meta)data are registered or indexed in a searchable resource
Metadata and data are indexed in the Commons search and in general web search engines such as Google, and they are findable via search tools focusing on openly licensed resources.
Accessible
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
Cf. F1.
A1.1 The protocol is open, free, and universally implementable
Data is available via HTTP.
The objects can be accessed via the API, e.g. https://api.wikimedia.org/core/v1/commons/file/File:%C3%89douard_Manet_-_Le_D%C3%A9jeuner_sur_l'herbe.jpg, and metadata is retrievable via the MediaWiki API (e.g. [3]).
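Because the M-ID corresponds to the MediaWiki page ID, a MediaWiki API query can in principle be built from the M-ID alone. The following sketch only constructs the query URL; the function name is invented for the example, and the choice of iiprop values (url and extmetadata) is an assumption about what a caller would typically want.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def imageinfo_query_url(m_id):
    """Build an action=query URL that asks for file URL and extended metadata."""
    if not re.fullmatch(r"M\d+", m_id):
        raise ValueError("expected an M-ID such as 'M45284', got %r" % (m_id,))
    params = {
        "action": "query",
        "pageids": m_id[1:],          # the page ID is the M-ID without the 'M'
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",  # '|' separates multiple values
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

For M45284 this yields a URL with pageids=45284, which can be fetched with any HTTP client.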
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
HTTP would allow for that, but accessing data programmatically only requires a non-empty User-Agent header; no authentication is needed.
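A minimal sketch of such an unauthenticated request, using only the Python standard library, might look like this. The User-Agent value and the helper names are placeholders invented for the example; a real client should identify its own tool and a contact address.

```python
import urllib.request

# Hypothetical identification string; replace with your own tool name and contact.
USER_AGENT = "FAIRCommonsExample/0.1 (https://example.org/contact)"


def build_request(url):
    """Attach a descriptive User-Agent; no credentials or tokens are added."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=30.0):
    """Perform the GET request (network access needed) and return the body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```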
A2. Metadata are accessible, even when the data are no longer available
How does Commons deal with deleted data?
Interoperable
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Metadata is predominantly strings and wikitext wrapped in a JSON file. This is not particularly interoperable.
Provided there is an M-ID, a machine-readable representation of structured metadata is available at https://commons.wikimedia.org/wiki/Special:EntityData/<M-ID>.json
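To give an impression of what can be read from that representation, the sketch below collects, per property, the Wikidata items that the structured statements point to. It rests on assumptions that should be checked against a real response: that the document has a top-level "entities" map keyed by M-ID, and that statements sit under a "statements" key (with "claims" accepted as a fallback) in the usual Wikibase mainsnak/datavalue layout. The function name is invented for the example.

```python
def linked_items(entity_data, m_id):
    """Map each property ID to the item IDs referenced by its statements."""
    entity = entity_data.get("entities", {}).get(m_id, {})
    statements = entity.get("statements") or entity.get("claims") or {}
    if not isinstance(statements, dict):
        return {}  # e.g. an empty list when no structured data exists
    result = {}
    for prop, stmt_list in statements.items():
        ids = []
        for stmt in stmt_list:
            datavalue = stmt.get("mainsnak", {}).get("datavalue", {})
            if datavalue.get("type") == "wikibase-entityid":
                ids.append(datavalue["value"]["id"])
        if ids:
            result[prop] = ids
    return result
```

The returned item IDs are the links into Wikidata through which the assessments under I2 and I3 could be examined further.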
I2. (Meta)data use vocabularies that follow FAIR principles
Unstructured metadata on Commons, even if occasionally linked to Wikipedia pages, probably does not meet this criterion. Structured data on Commons, however, arguably does, although it is not entirely clear to what extent Wikidata is FAIR. Wikidata allows for an indirect connection with domain-specific vocabularies.
I3. (Meta)data include qualified references to other (meta)data
This is probably true for structured data on Commons.
Reusable
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
The license of the data object is specified in a non-machine-readable way in the metadata (e.g. https://commons.wikimedia.org/w/api.php?action=query&titles=File:%C3%89douard_Manet_-_Le_D%C3%A9jeuner_sur_l%27herbe.jpg&prop=imageinfo&iiprop=extmetadata).
A machine-readable license is contained in the JSON-LD representation: https://commons.wikimedia.org/wiki/Special:EntityData/<M-ID>.jsonld
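As an illustration of what a client has to do with the extmetadata response, the following sketch pulls the license fields out of a parsed API result. It assumes the response was requested with format=json and that the LicenseShortName and LicenseUrl entries are present in extmetadata; both may be missing for individual files, in which case None is returned for that field. The function name is invented for the example.

```python
def license_info(api_response):
    """Return title, license short name and license URL for each file in the response."""
    pages = api_response.get("query", {}).get("pages", {})
    # pages is a dict keyed by page ID by default; formatversion=2 returns a list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    results = []
    for page in page_list:
        for info in page.get("imageinfo", []):
            ext = info.get("extmetadata", {})
            results.append({
                "title": page.get("title"),
                "short_name": ext.get("LicenseShortName", {}).get("value"),
                "url": ext.get("LicenseUrl", {}).get("value"),
            })
    return results
```

The values are free-text strings, which is why this route is less reliable for machines than a license expressed as a URI in the JSON-LD.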
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
Since Wikimedia Commons addresses a highly diverse and general audience, this criterion is probably not applicable.
See also
- Janssen, O. (2024). 10 reasons why I moved my publications from SlideShare to Zenodo, and keep them on Wikimedia Commons. Zenodo, https://doi.org/10.5281/zenodo.13365077. Also available on Wikimedia Commons