Commons talk:Structured data/Get involved/Feedback requests/Ontology
Contents
- 1 Discussion
- 2 Discussion for metadata that is specific or important for GLAMs
- 3 Semi-structured and unstructured data
- 4 EXIF metadata?
- 5 Tag namespace? or better "Category all images" namespace
- 6 Wikibase database physically close to mediawiki database and other ideas
- 7 Tools
- 8 Notability of Commons creators
- 9 Presentation templates, and their information
- 10 File resolution -- accessible from queries ?
- 11 Licensing
- 12 Bulk uploads with very limited metadata
- The following discussion is archived. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Thank you all for the comments and questions. There seems to be general understanding of, and agreement on, the early framework of what should go where. Many questions have been raised and still remain about implementation, and those details will be sorted out piece by piece as development begins. I look forward to seeing you all at our next discussion. Keegan (WMF) (talk) 21:49, 1 March 2018 (UTC)
The team wants to hear what you think about this high-level framework. There are many parts to it that will need additional discussion to actually implement, and those conversations will happen as development gets into those details. For now, the point is the general direction that is being considered. This will be open formally for two weeks, closing on 1 March.
What do you think? Keegan (WMF) (talk) 21:51, 15 February 2018 (UTC)[reply]
- So... it might be because it's very late in my timezone, but I'm not really sure I understand the questions. Therefore, I'll just make some generic comments, mainly from a bot user PoV.
- Categories: I was kind of hoping we could get rid of them and replace them with something multilingual. The rule about English names for the categories has always been a thorn in the accessibility of Commons for non-English speakers.
- MW vs. WB@Commons vs. Wikidata:
- While technically it makes a lot of sense to have pointers from one database to the other(s), it makes it hard to process an entry through the API, as one has to make 3 different requests to get all the data, with each response having another "language" (e.g. different data types etc.). I would like to be able to retrieve the whole page "contents" (whatever that means and no matter where it is all stored) with a single API call to Commons.
- Also, the data types, property names etc. of WB@Commons should have an equivalent at Wikidata (e.g. the "author" property should be the same at Wikidata and Commons, in order to make it easy for bots to move from Wikidata to Commons). Technically I don't think this is very difficult, but it might turn out hard to coordinate the 2 communities.--Strainu (talk) 23:23, 15 February 2018 (UTC)[reply]
- @Strainu: On the categories point: we won't be getting rid of categories any time soon, but we will make use of Wikidata-style "depicts" statements, which are multilingual and we hope will supplant categories to a large degree. RIsler (WMF) (talk) 21:49, 16 February 2018 (UTC)[reply]
- Thanks for the response, RIsler (WMF). Would it be feasible (even if not in this project) to somehow include Wikibase in the categories as well? That would allow them to have multilingual names.--Strainu (talk) 22:03, 16 February 2018 (UTC)[reply]
- Hi Strainu. Our team and the Wikidata team have considered this for months and concluded this idea isn't going to work as part of this project. Perhaps future endeavors can address using categories in that way, but our work with depicts statements will hopefully make that unnecessary by providing a better alternative. RIsler (WMF) (talk) 22:56, 16 February 2018 (UTC)[reply]
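To illustrate the single-call wish raised earlier in this thread, here is a minimal sketch of merging three per-source records into one response. It is not an existing API: the function name, the field names, and the precedence order (wikitext over Wikibase@Commons over Wikidata) are all assumptions made for illustration only.

```python
from typing import Any, Dict


def merge_page_contents(
    wikitext: Dict[str, Any],
    commons_wikibase: Dict[str, Any],
    wikidata: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Combine three per-source records into one dict.

    Each field keeps a note of the source it came from. Later sources in
    the loop override earlier ones (an assumed precedence, not a decided one).
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for source, record in (
        ("wikidata", wikidata),
        ("wikibase_commons", commons_wikibase),
        ("wikitext", wikitext),
    ):
        for field, value in record.items():
            merged[field] = {"value": value, "source": source}
    return merged


# Hypothetical records for one file page.
page = merge_page_contents(
    wikitext={"description": "Coyote in Yosemite"},
    commons_wikibase={"author": "Example uploader"},
    wikidata={"depicts_label": "coyote", "author": "overridden locally"},
)
```

A bot would then read `page` once instead of issuing three requests and reconciling three response formats itself.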
- Comment I think creating items for non-notable content on Wikidata that are just like regular items is a mistake. Wikidata has already got enough on its plate to try to organise what's already within the WD:Notability guideline. Dumping a vast undifferentiated number of more items there is going to be a nightmare.
- Aside from that, I don't think the line "Properties (length, height, weight, material, etc)" has been thought through, as an example of what should be on Wikidata. If I've got a scanned, cropped image of a book illustration (of which there must be well over a million on Commons), I don't think it makes sense to make a Wikidata item for each illustration, still less for each physical copy of the illustration. Yet each may have different dimensions. An item for the book as a whole on Wikidata, maybe. (Though I'm dubious about an item for each copy.) But having to make a distinct item for every single illustration? I don't think that makes sense; and I don't think uploaders would do it. So there needs to be an option to supply some of the information locally, in the CommonsData wikibase -- including length and height, date and page number, etc. Jheald (talk) 23:25, 15 February 2018 (UTC)
- Hi, Jheald. To clarify, when we mentioned Properties in Wikidata, we don't mean creating a new Wikidata item and adding the properties there. We meant *referring* to Wikidata properties from the file's entry on Wikibase@Commons. For example, we want to avoid Wikibase@Commons having to define a Width property because Wikidata already has a perfectly good one (P2049). With Wikibase Federation (a feature already completed) Wikibase@Commons and Wikidata can work together in a way that avoids duplication, so when you're filling out the statements for your cropped image scan, even though you're using the UI on Commons, behind the scenes you're referring to Wikidata stuff too. We're in the very early phases of illustrating how this would work, but that's the gist of it. RIsler (WMF) (talk) 01:14, 17 February 2018 (UTC)[reply]
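As a rough sketch of what "referring to" a Wikidata property from a file's entry could look like, the snippet below builds a statement that points at Wikidata's width property (P2049) instead of defining a local one. The data layout, the helper names, the file name, and the use of an entity URI as the reference are assumptions for illustration; this is not the actual Wikibase Federation format.

```python
WIKIDATA_ENTITY_BASE = "http://www.wikidata.org/entity/"


def federated_property(pid: str) -> str:
    """Return a URI pointing at a Wikidata property rather than a local copy."""
    if not (pid.startswith("P") and pid[1:].isdigit()):
        raise ValueError(f"not a property ID: {pid!r}")
    return WIKIDATA_ENTITY_BASE + pid


def make_statement(pid: str, amount: float, unit: str) -> dict:
    """Build one statement whose property lives on Wikidata (hypothetical layout)."""
    return {
        "property": federated_property(pid),
        "value": {"amount": amount, "unit": unit},
    }


# Hypothetical Wikibase@Commons entry for a cropped scan; no new Wikidata item
# is created, only the property definition is shared.
file_entry = {
    "file": "File:Example cropped scan.jpg",
    "statements": [make_statement("P2049", 44.5, "centimetre")],
}
```

The point of the sketch is only that the value is stored with the file while the property definition stays on Wikidata.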
- Just to add, I wrote that I don't think uploaders will create a new item for each illustration -- but actually, wider than that, I wonder whether manual uploaders will create new Wikidata items for anything. The experience with referencing on Wikidata suggests to me that they simply won't bother -- just as people manually editing Wikidata mostly don't add references, because they're such a chore to create; even worse, if you have to create new items to refer to. As a result, referenced statements in Wikidata are mostly limited to content created in systematic bot sweeps. I suspect the same may well turn out to be the case for Commons. Jheald (talk) 23:46, 15 February 2018 (UTC)[reply]
- Comment, I would really want to support all of this data also being stored at Wikidata, but wouldn't this flood Wikidata? Would the volunteers have to organise the vast amount of instances created by Wikimedia Commons, or could bots handle it? Currently Wikidata is very "Commons-unfriendly", and creating an entry for a Wikimedia Commons category requires one to manually add "commonswiki" to "Other sites". --Donald Trung (Talk 💬) (WikiProject Numismatics 💴) (Articles 📚) 01:55, 16 February 2018 (UTC)
- Comment I have a hard time following the proposal. For example, the term "MediaWiki" is very confusing. We do have 20k MediaWiki namespace pages, like MediaWiki:Gadget-UploadWizard, but I think you just mean the wikitext in the file namespace that we use right now. One difference between Wikibase@Commons and Wikidata would be that Wikidata would store information about notable books, artworks, etc., but Wikibase@Commons would store information about a specific instance (digitization, file) of that artwork or book. Data on Wikidata needs to be sourced and the subjects need to be notable; for example, d:Wikidata:WikiProject sum of all paintings gathers Wikidata items for all notable paintings. Similar projects might be creating items for other notable artworks or the majority of published books. We should differentiate content on Wikibase@Commons associated with a specific file from potential (?) content which would not be directly associated with a single file but would not meet the notability and referencing requirements of Wikidata. Let's look at a few examples:
- A scan or photo of a notable artwork, like d:Q3898508, might have:
- Wikidata: All metadata about the artwork, including its copyright status per jurisdiction (that info is not stored on Wikidata at the moment; maybe we should begin a discussion on whether it should be)
- Wikibase@Commons:
- items associated with the file: metadata related to the specific digitization, like who digitized it, when and how, and the copyright status related to digitization per jurisdiction
- other items: metadata related to non-notable person who photographed or scanned the artwork
- A page from a book, like File:PL Adam Mickiewicz-Pan Tadeusz 065.jpg
- Wikidata: all the info about the book, including the copyright status
- Wikibase@Commons:
- items associated with the file: book ID, page number, digitization license
- other items: information related to digitization of this book: who and how did it, etc.
- Photograph by non-notable photographer, like File:Canis latrans (Yosemite, 2009).jpg
- Wikidata: nothing other than already existing items related to depicted people or animal species, places, etc
- Wikibase@Commons:
- items associated with the file: most of the information stored in the current description, including license and non-copyright restrictions, otrs, etc.
- other items: info related to photographer
- Non-notable image created by an anonymous or non-notable creator, like File:1895 - Ioan Istrate cu gradul de capitan.jpg
- Wikidata: nothing other than already existing items related to depicted people or animal species, places, etc
- Wikibase@Commons:
- items associated with the file: most of the information stored in the current description, including license and non-copyright restrictions, otrs, etc.
- other items: info related to photographer
- --Jarekt (talk) 05:16, 16 February 2018 (UTC)[reply]
- I've tweaked the text a little to clarify that MediaWiki is referring to wikitext, yes. Keegan (talk) 05:57, 16 February 2018 (UTC)[reply]
- Comment Hard to imagine without a live example. E.g. I don't understand what wikitext templates have to do with EXIF information, as it usually includes a different type of metadata. // Will the "things" depicted be a kind of tag? And where will this information be retrieved from? I was wondering whether you'll get it from categories, since categories on Commons are set up to also describe what is depicted. So categories on Commons are something that could easily be transformed into structured data, and I don't understand why it should be a "string" in Wikibase.--Juandev (talk) 05:22, 16 February 2018 (UTC)
- Comment Storing EXIF data in the Commons Wikibase seems like an obvious thing to do. At present they are stored in some sort of text dump in a single database field, and the fields can't be queried individually. A file may typically have over 40 fields in Exif though, without even considering other formats like maker notes and XMP, I don't know if you'd want them all, or just a selection. I also don't know whether you'd want these fields to always correspond to what's in the file, or if they should be editable. When a new version of a file is uploaded, the values should probably be cleared and reloaded from the new file if present. Some of the values will be text, but I'm not sure that it's really necessary to be able to translate them (it was mentioned above that only multilingual text would be stored in Wikibase). --ghouston (talk) 06:28, 16 February 2018 (UTC)[reply]
- Comment Whatever is done must allow for easy curation by the project volunteers where the items will be displayed. We are currently having a huge hassle on English Wikipedia because WMF decided to use Wikidata descriptions to disambiguate search results for Wikipedia articles. These results are not visible from Wikipedia desktop view, and changes are not intelligible on watchlists, so they are not seen by the curatorial editors, and to fix them the editor must go to Wikidata, where everything looks unfamiliar. This may have looked like a good idea at the time, but it has led to major problems due to policy incompatibility between projects, and difficulty curating material that is stored and vandalised on one project and then displayed on another, where the editing population do not know how to correct it and do not wish to be forced to learn. (They are volunteers and will do what they want to do. Attempts at coercion will just drive volunteers away, and none of the projects has too many volunteers.) The reaction to this on English Wikipedia is moving toward not allowing the automated use of anything from other projects unless their standards of curation and verification are acceptable and compatible with English Wikipedia, and BLP conditions are met. At the moment Commons is OK, Wikidata is not. Beware of unintended consequences; some can be very damaging and expensive to fix. · · · Peter (Southwood) (talk): 06:37, 16 February 2018 (UTC)
- @Pbsouthwood: The plan is to have editing of, and access to revision histories for, things being displayed on Commons from Wikidata. The need for this is very strong and well understood, and we hope to make it happen. Keegan (WMF) (talk) 18:10, 16 February 2018 (UTC)
- Comment It's going to be confusing when things can be done in two different ways, and it's not simply a matter of the old legacy way and the new improved way, but two systems which are apparently intended to coexist forever. These are a) treating things differently depending on whether they are notable or non-notable, according to Wikidata criteria, and b) storing the same information in categories and also as pointers to Wikidata. --ghouston (talk) 06:53, 16 February 2018 (UTC)[reply]
- While as a lover of databases, I approve of the general principle of storing information in Wikibase@Commons, I do have a couple of worries. First, editing Wikidata is a cumbersome process. I've just been going through a bunch of files adding (sourced) camera and object locations and credit lines. Because these are stored as wikitext, I can paste all of them into the editing page at once and save them all with a single click on "Publish changes". On Wikidata, I think that would need at least ten clicks and three separate save operations. Second, free text on a wiki page is a great way to capture information that you can't work out how to codify yet. Commons is already bureaucratic enough, and I don't think users should have to work out how to represent information in Wikibase@Commons before they can associate it with a file. --bjh21 (talk) 10:30, 16 February 2018 (UTC)[reply]
- Comment Even with limited contributions, I think that the person creating media files should be considered notable enough for a Wikidata item, or at least referred to as something more considerate than nonnotable. Another concern is that the "licensing info" in the camera Exif data will likely be in conflict with the license provided when uploading. We also need to consider the impact on legacy licensing as items move from cc-by to pd, and as countries change laws and trade agreements. Gnangarra 11:09, 16 February 2018 (UTC)
- I agree with the fact that phrasing "nonnotable" is quite negative, I don't agree with creating items for a lot of people just because you don't like the phrasing. Let's see if we can come up with a more positive way of referring. Any suggestions? As for anything in exif: I expect it to be copied by bots or other tools to the local Wikibase so you can edit it without changing the exif. Nothing different than bots extracting coordinates from exif and putting {{Location}} on the file. You can update that too. Multichill (talk) 14:16, 16 February 2018 (UTC)[reply]
- I imagine having fields in Wikibase which are explicitly designated as copies of the Exif data, perhaps with an Exif: prefix, and that this could replace the weird way that Exif data is stored now. The Exif:License field would just be text which is additional to the official license fields. --ghouston (talk) 00:10, 17 February 2018 (UTC)[reply]
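The idea of explicitly designated Exif copies, cleared and reloaded when a new file version is uploaded, can be sketched roughly as follows. The "Exif:" field naming follows the suggestion above; the function names, the flat dict layout, and the sample tag values are assumptions for illustration only.

```python
def exif_to_wikibase_fields(exif: dict) -> dict:
    """Copy every Exif tag into a field with an 'Exif:' prefix, as plain text."""
    return {f"Exif:{tag}": str(value) for tag, value in exif.items()}


def refresh_on_new_version(entry: dict, new_exif: dict) -> dict:
    """Drop the old Exif copies and reload them from the new file version.

    Fields without the 'Exif:' prefix (for example the official license)
    are kept untouched.
    """
    kept = {k: v for k, v in entry.items() if not k.startswith("Exif:")}
    kept.update(exif_to_wikibase_fields(new_exif))
    return kept


# Hypothetical entry: the official license is separate from Exif:License.
entry = {"license": "CC BY-SA 4.0"}
entry = refresh_on_new_version(
    entry, {"Model": "ExampleCam", "License": "All rights reserved"}
)
# A later upload without a License tag clears the stale Exif:License copy.
reuploaded = refresh_on_new_version(entry, {"Model": "OtherCam"})
```

Keeping the prefixed copies apart from the official fields means a conflicting camera-supplied license string never overwrites the license chosen at upload.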
- Comment Likely the copyright status/license should be stored on Commons. --Steinsplitter (talk) 12:16, 16 February 2018 (UTC)
- @Steinsplitter: you might be a bit confused by File:Structured Data on Commons - which information goes where - version 2017-10-31.png. The concept of the license like Creative Commons Attribution-ShareAlike 4.0 International (Q18199165) will be on Wikidata, but here we'll store what license a file uses by linking to that concept. Multichill (talk) 14:11, 16 February 2018 (UTC)[reply]
- @Multichill: Oh, Got it now. Thanks for clarifying :). --Steinsplitter (talk) 14:26, 16 February 2018 (UTC)[reply]
- I'm a bit dubious about just using a URI to name non-notable authors because often Commons has pictures by the same author acquired by different routes. For instance, User:Oxyman also has an account on Geograph Britain and Ireland, and ideally we'd like to record on Commons that they are the same person. This also happens for people without Wikimedia accounts: there's one person whose pictures we've acquired both through Geograph and through Flickr. I think the right approach might be to have stub Wikibase@Commons items for these people. --bjh21 (talk) 12:33, 16 February 2018 (UTC)[reply]
- Yeah, Commons wants to refer to various things, like people who have authored files, and perhaps various topics, which aren't considered notable on Wikidata, and allowing items for them on Commons would be a solution. However, this has apparently been considered as part of this project and ruled out. I'm not sure why, perhaps it's technical reasons. --ghouston (talk) 00:10, 17 February 2018 (UTC)[reply]
- Something that seems to be missing from this picture is any representation of relationships between files. This commonly takes the form of one file being derived from another, but linking to the next and previous file in a series, or to photographs of the same scene taken at different times, are common. Additional complications: sometimes we mark a Commons file as a derivative of a file that can't itself be hosted on Commons (e.g. because it has non-free parts); sometimes a file can be a derivative of a version of a file on Commons that isn't the current version. --bjh21 (talk) 15:54, 16 February 2018 (UTC)[reply]
- Other forms of data would include annotations of the image -- for example, text that can be found in different parts of the image, and the coordinates of where in the image it occurs. This may often have been transcribed if the source of the image is a library. Or particular details of interest in the image, and what they represent, again perhaps with a bounding box. If what the image represents is notable enough to have its own Wikidata item, the data might be there with depicts (P180) and relative position within image (P2677); but if not one would probably want to be able to record such things in CommonsData. Templates or display code would need to be able to cope seamlessly with either. Jheald (talk) 17:41, 16 February 2018 (UTC)[reply]
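A bounding-box annotation of the kind described above could be recorded along these lines. The sketch assumes that the region is written as a percentage string of the form "pct:x,y,w,h", which is how I understand values of relative position within image (P2677) to be written; treat that format, the helper name, and the annotation layout as assumptions.

```python
def relative_region(x: float, y: float, w: float, h: float,
                    image_width: float, image_height: float) -> str:
    """Format a pixel bounding box as a percentage region string 'pct:x,y,w,h'."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    def pct(value: float, total: float) -> str:
        # 'g' formatting drops trailing zeros, so 25.0 is written as '25'.
        return f"{100 * value / total:g}"

    return "pct:" + ",".join([
        pct(x, image_width),
        pct(y, image_height),
        pct(w, image_width),
        pct(h, image_height),
    ])


# Hypothetical annotation on an 800x600 image: a transcribed caption
# located in a box starting at (200, 150) and measuring 400x300 pixels.
annotation = {
    "text": "Example transcribed caption",
    "region": relative_region(200, 150, 400, 300, 800, 600),
}
```

Because the region is relative rather than in pixels, the same annotation stays valid for every thumbnail size of the file.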
- Should the red arrows be going to the left instead? Isn't the idea that those values are stored in Wikidata and pulled into the Commons view? czar 22:46, 17 February 2018 (UTC)[reply]
- @Czar: the arrows are illustrative of where data will migrate to once the process starts, hence going to the right. After data migration the File page will be pulling information from the various sources, reversing the arrows as you describe. Keegan (WMF) (talk) 21:27, 20 February 2018 (UTC)[reply]
- Comment, to comment on the specific points I see in image and text on the project page:
- geo location is usually the location where the photographer stood, not the place where the photographed subject is/was. Don't mix them up! Don't get the location of where the photographer stood from Wikidata; that is just nonsense. The location of the photographed subject and the location of the photographer are two different things and need two different properties in Commonsbase.
- dates: there are still too many dates Wikidata can't handle well enough. A simple date will work, but complex dates like "between 1800 and 1813" or "end 18th century or beginning 19th century" are difficult to insert in Wikidata, and will be too difficult for new uploaders. If the same system as Wikidata is used for uploading, this will result in incomplete or false information.
- notable contributors have a creators template with information. These provide basic information. Don't screw this up please!
- non notable contributors are not only uploaders (but also people without account, like a photographer of an institution) + can contain a complex format for attribution. Please don't screw up the attribution!
- description text can contain both external links and internal links, and is needed to be able to point to what is in the picture. The description field on Wikidata does not allow linking. If the Wikidata kind of description field is used, too many file descriptions will lose valuable information.
- "Is the use case best served by attaching a Wiki-style category?" -> Yes! Don't underestimate the value of categories. Categories are for Commons the backbone which a wikibase does not provide. I am positive that a wikibase can improve searching and much more, but the stable grouping a category provides does not exist in wikibase and thus wikibase can't replace this Commons backbone.
- Romaine (talk) 06:05, 18 February 2018 (UTC)[reply]
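One way to picture the complex dates mentioned in the list above is as an explicit earliest/latest pair with an optional free-text note. This is only a sketch: the class name, its fields, and especially the year bounds chosen for "end 18th century or beginning 19th century" are assumptions, not a description of how Wikidata or Commons stores dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UncertainDate:
    """A date known only as a range of years, with an optional original wording."""
    earliest_year: int
    latest_year: int
    note: Optional[str] = None

    def __post_init__(self) -> None:
        if self.earliest_year > self.latest_year:
            raise ValueError("earliest_year must not be after latest_year")

    def contains(self, year: int) -> bool:
        """Return True if the given year falls inside the range (inclusive)."""
        return self.earliest_year <= year <= self.latest_year


# "between 1800 and 1813" maps directly onto a range.
between = UncertainDate(1800, 1813)

# The bounds 1790-1810 are an assumed reading; the note keeps the wording
# the uploader actually supplied.
turn_of_century = UncertainDate(
    1790, 1810, note="end 18th century or beginning 19th century"
)
```

Keeping the original wording next to the machine-readable bounds avoids forcing an uploader's vague date into false precision.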
- Comment First of all, the proposal is a really impressive piece of work and makes very reasonable suggestions. I hope that very much less information about files will be held in MediaWiki categories on Commons in future. I want to second User:Jheald's point about annotations. Annotations can be represented in Wikidata with the "depicts" property and as notes in Commons, and this project is a great opportunity to eliminate that duplication. For this, we would need to be able to tag those Commons images which are representations of artwork with the right framing and proportions. I see source as being a difficult field to capture: the source of an image might be described with free text, might be a Commons user, or might be an institution, book or organisation that is described in Wikidata. Maybe source has to be broken down into specific fields, like "uploader", "source document", "sharing partner organisation". MartinPoulter (talk) 13:27, 20 February 2018 (UTC)[reply]
- Comment looks good.
- Not sure if that is intended, but supposedly we wouldn't have an item "Coyote in Yosemite National Park" as value for "depicts" (and I don't think we should), but "depicts" would use the item for "Coyote" and maybe a geolocation property for "Yosemite National Park" (this for the sample image mentioned below).
- The question is what to do with non-notable people that are identified and depicted.
- If locations have coordinates, I don't think there is a problem to create them in Wikidata, but I don't think we want "next to (some) tree in Yosemite" as location item in Wikidata (we do have items for specific trees that have coordinates).
- If uploaders don't mind, I think items for these photographers could be created in Wikidata, but maybe it's easier to maintain all creators at Commons and reference Wikidata items when needed.
- Not sure what "Textual descriptions that should be multilingual" is meant to cover. Supposedly that could be "next to a tree in Yosemite Park".
- Nice work. Thanks. --Jura1 (talk) 07:39, 23 February 2018 (UTC)[reply]
- Question For example, we want to avoid Wikibase@Commons having to define a Width property because Wikidata already has a perfectly good one (P2049): so the link will be to Property:P2049 or to Q35059? For example, file:Affe_vor_Skelett.jpg now: Height: 61 cm, Width: 44.5 cm. And, by the way, "Dimensions" - no links, "Medium" - no links, "Date" - no links, "Title" - no links, "Artist" - no links. --Fractaler (talk) 08:14, 23 February 2018 (UTC)[reply]
- Comment Notable vs. non-notable images: if we’re looking at something like the Mona Lisa, it clearly should have a Wikidata item. However, if it’s a picture of a wolf in Yosemite, it probably does not deserve a Wikidata item. But there are properties that we may want to have in both cases. We need to figure out how to make it so that finding this information does not turn into a guessing nightmare of "where is it, on Commons or on Wikidata?". This may involve some information duplication (not sure how to keep it in sync), or maybe some tighter integration. E.g. let’s say we want to store authors and what is depicted in both places, and maybe some authority IDs too, or have very smart searches that know how to look in the proper places.
- In general, however, Wikidata should have notable (i.e. of general wide world interest) data about notable things (i.e. things that a member of the general public is likely to have an interest in), while WB@C should have more localized and detailed data, including deep technical data on specific images, sourcing, licensing and other details of a specific image, etc. The latter also includes how we acquired the image (GLAM templates, etc.) and EXIF.
- For pictures of art, we should be distinguishing between authors of the art piece (which would probably be stored on Wikidata) and makers of the picture itself (which should be on Commons) - but we should be able to look up both, and be aware that users may not know what is meant by “author” in specific cases - for notable and non-notable images it may be different. We may want to have this for pictures of non-notable works or items otherwise not covered on Wikidata too, which means we probably will have to store both kinds of authorship information on WB@C.
- Unstructured fields - some things that happen with works of art are hard to describe in a structured way. There are of course MediaWiki parts, but those are often inaccessible to tools working with WB@C, even for mere display. So it is possible that we may need to have a kind of middle-ground field - like fields in infoboxes that usually have dates but occasionally hold something like “most historians think it’s somewhere between 1310 and 1325”. The danger, however, is that these often suffer from severe translation under-coverage.
- Laboramus (talk) 23:56, 26 February 2018 (UTC)[reply]
Under this header, I would like to collect specific feedback that concerns file metadata that is especially important for, and sometimes unique to, media files that have been provided/uploaded by GLAM institutions - sometimes, but not always, as part of partnerships with Wikimedia volunteers and affiliates.
To kickstart this discussion: here's a first rough (and probably incomplete) list of 'types of metadata' that (in my opinion) are quite specific and important to GLAM-related media files here on Commons.
GLAM files on Commons often have specific metadata about their source and about the partnership through which the file got uploaded to Commons.
- There's usually an institution template.
- There's often an accession number - the unique number or code under which the media file is known in the GLAM's collection.
- There's often a URL/URI / hyperlink to the place where the file lives in the GLAM's own database
- Some files have a source template that indicates the institution as well, and sometimes says via which partnership (with volunteers, affiliates) the file has been uploaded.
- Some files have a special tracking category; this is often a hidden category that is used to track the usage of the files via statistics tools like GLAMorous or BaGLAMa.
I would like to hear your input on the following questions:
- Is this list of (broader types of) GLAM-specific metadata complete? If not, which elements do we miss?
- Where should each (type of) element live in the above-described framework (i.e. MediaWiki/wikitext / Wikibase@Commons / Wikidata)?
Many thanks for your input! :-) SandraF (WMF) (talk) 10:11, 16 February 2018 (UTC)[reply]
- I like the overview image. For GLAM things the concept of a notable work comes into play. Let's take File:Leopold Rottmann - Das Tennengebirge - 12032 - Bavarian State Painting Collections.jpg. The photo itself is a work and that info should be documented here. The work it's a photo of is at Q30062804 and that information shouldn't be duplicated (but you would probably like to be able to override it). Not every artwork is notable, so we should at least cover the fields of {{Artwork}} locally here on Commons. Reminds me that {{Art Photo}} makes a good distinction between the work (probably on Wikidata) and the photo of the work (info here on Commons). Multichill (talk) 10:55, 16 February 2018 (UTC)
- Speaking about artworks, we still need to be able to indicate that some image is of an artwork in Wikidata, but also that an image can be a cropped part of an artwork (like only showing one person's face from a group of people in a painting, or city gate of full city impression). Romaine (talk) 06:13, 18 February 2018 (UTC)[reply]
- The metadata listed above seems pretty complete to me. To answer the question of what goes where, I used the flowchart because I found it pretty straightforward:
- institution template belongs in Wikidata
- accession numbers in Wikibase@WikimediaCommons (but what about notable artworks that also have an ID on Wikidata?)
- URL/URI / hyperlink in Wikibase@WikimediaCommons
- source template in Wikibase@WikimediaCommons, because it is descriptive text that should be available in multiple languages
- tracking category I am not so sure, but Wikibase@WikimediaCommons seems most appropriate.
MichellevanLanschot (talk) 12:29, 21 February 2018 (UTC)[reply]
- I agree with most of these, and that tracking categories are especially tricky. It seems like if the Wikidata and Wikibase properties are good enough, it might eliminate the need for tracking categories. If I can label a photograph as being part of a broader collection through Wikidata, then I could make Wikidata queries on it... I guess it depends on the existing structure of how pageviews are tracked. It might be easier to keep it as a Commons category. Rachel Helps (BYU) (talk) 18:20, 21 February 2018 (UTC)
- You're probably already aware of this, but for items in special collections, the accession number/call number is often more than just an alpha-numeric string. Lots of potential problems there! The cite archive template on enwiki gives a good overview of the kinds of things archivists might want to include in institutional-specific metadata for saying where exactly an item is (maybe a URL would be good enough, if boxes and folders are too complicated). Also, sometimes items are part of a "collection" and I'm not sure if it would be better to have institutional collections be a hidden category or part of Wikidata/Wikibase. Rachel Helps (BYU) (talk) 18:20, 21 February 2018 (UTC)[reply]
- We do have a few collections set up on Wikidata. These get messy fast however, because if you look at e.g. the Andrew W. Mellon collection (Q46596638) this is set up as a subset of the National Gallery of Art, but of course Mellon did a lot of buying and selling before donating to the museum, and some of his stuff went elsewhere. Therefore, there should be an overarching Mellon collection with the NGA part as a separate piece, etc. Jane023 (talk) 09:38, 23 February 2018 (UTC)[reply]
- One thing I haven't seen listed here yet are the differences between what a GLAM donates in terms of a partnership and what volunteers do with this metadata over time. For example, when you look at some of the SoaP lists of paintings from institutions that Maarten and others have created on Wikidata and scroll down, you will notice little blips with images in them for artists or time periods (e.g. impressionists) that individuals have worked on. Digging deeper, these records will overlap with other aggregator records (RKD, Joconde, and others) and the data may be changed. So far we have interpreted these differences ourselves with little outside interest, but this is starting to attract attention, so we need a structured way to capture it: <original upload> vs. <current state of the record>. I believe this has something to do with the whole "signatures" idea that was launched at the WikiCon in March 2017 but I really don't see that idea taking shape in any way that volunteer efforts can contribute to. Jane023 (talk) 09:31, 23 February 2018 (UTC)[reply]
- Comment The list seems to cover the most important GLAM-specific elements. Tracking categories are hugely important for providing usage stats (currently via Baglama2, Glamorous, etc.). I would suggest that for most institutions using Wikidata properties could easily replace a tracking category as a means of identifying a group of images/media. It may also be worth noting that Institution templates often contain several elements, all of which are important to GLAMS. Ours include:
- logo - GLAMS are keen for their branding to be visible on Commons (This should be hosted on Commons)
- Custom text - inviting users to explore collections further on institution website (Commons)
- General Website Link - (can be hosted in Wikidata)
- image specific Handle/url - (can be hosted in Wikidata)
So it may actually be necessary to break down the different elements of an institution template and host each in the most appropriate place. Best Jason.nlw (talk) 14:35, 26 February 2018 (UTC)[reply]
- Comment I don't have a good sense of the overall available metadata fields, but my first thoughts go to whether there will be adequate ways to represent the context/hierarchies of archival materials. It's ok that the full hierarchy can't be represented, but being able to have collection records in Wikidata and linking to those records would be important. Take the hypothetical Homer Simpson Manuscript Collection. Perhaps Series 1 is correspondence, which contains 10 boxes, about 10 folders per box, and a few dozen letters per folder. It would be fairly normal practice to make each folder into an object. It would make sense that the collection as a whole would have a Wikidata entry, and for the Commons object we would want to indicate that it is partOf:Homer Simpson Manuscript Collection. Something similar would also be important for serials (e.g., issues of a newspaper), where the serial as a whole would have a Wikidata entry, and each Commons item would be a part of (or possibly an instance of) the serial. I do not think that the existing concept of "categories" would be appropriate here. Retrent (talk)
- Comment In an archival/library setting, "accession number" usually refers to something different from the unique ID of the object, referring instead to a grouping of material that all arrived together at the same time, and it may not be useful to include anywhere in the Wiki environment. Perhaps a more generically applicable term would simply be "local unique identifier" or similar. Others have already suggested that it should go in Wikibase@WikimediaCommons, and that makes sense. Also agree with Rachel Helps (BYU) that the local UID can be quite a messy string in certain institutional contexts. Retrent (talk)
"MediaWiki - the existing base system that relies upon semi-structured and unstructured data in wikitext" - what is 1) semi-structured data, 2) unstructured data? --Fractaler (talk) 11:57, 16 February 2018 (UTC)[reply]
- Hi @Fractaler: semi-structured data is information like the {{institution}} or {{creator}} templates or the various sub-templates of {{Infobox art}} like {{size}} or {{technique}}. These templates capture information that is very specific, and are increasingly calling on Wikidata to help define those concepts. These templates, in theory, could be accessed by a parser to extract the data. However, in practice, because of the changing nature of those templates, and the huge number used on Commons, they are really hard for computers to read (as opposed to the kinds of structured data stored in Wikibase). Similarly, a lot of "semi-structured" information is captured in categories right now, but they require an immense amount of human double checking for a computer to use that data. However, there are a number of other kinds of information that are completely unstructured right now, such as free-text descriptions of paintings, or snippets of important pieces of data captured in description fields, or author fields without a creator template: they might be specific descriptions but there is no way for a computer to reasonably extract meaning from them: it's just a string. Sadads (talk) 13:29, 16 February 2018 (UTC)[reply]
- Thanks for the detailed explanation. So, in Wikidata you can create items (Template:Institution, Template:creator, etc): "Template:Institution" is Q2336004 (semi-structured data), "Template:creator" is Q2336004 (semi-structured data), etc? --Fractaler (talk) 13:11, 17 February 2018 (UTC)[reply]
- @Fractaler: The question is not whether these templates have Wikidata items, the question is how they contain their information. The team are saying the information in them is "semi-structured" because although the template does provide a bit of organisation, the information is still quite difficult to extract and examine, particularly in bulk.
- For contrast, compare the wikitext of Institution:Aberdeen Art Gallery (old version) with Institution:Aberdeen Art Gallery (new version) -- diff. In the old version, the information is in a jumble of different formats for each different field, and would be a bit of work to extract and interpret for querying or reuse. In the new version, the information is now all in the Wikidata item Aberdeen Art Gallery (Q4666883), held in a fully structured format that is much easier to query (eg as in this query for art museums in Scotland that opened in the 1800s: tinyurl.com/yc2fysvu) and reuse (eg as in this re-presentation in Reasonator).
- The structured data project aims to bring this degree of flexibility and reusability and queryability to all the data that can be put into a structured format. Jheald (talk) 16:09, 17 February 2018 (UTC)[reply]
- So, "semi-structured data" is just a Wikidata's item (item with a structured format)? --Fractaler (talk) 16:12, 18 February 2018 (UTC)[reply]
- @Fractaler: no it's not. We seem to be hitting a language barrier here. Maybe someone can explain it in Russian? (user:Ymblanter?). Multichill (talk) 21:11, 20 February 2018 (UTC)[reply]
- Wikidata's language ("Q-language") does not have language barriers. It can be used to explain to readers of this page what "semi-structured data" (also "unstructured data", "ontology", etc) is. For example, in the introduction: on this page, "semi-structured data" is a data, "unstructured data" is a (information without a formal data model) data, ontology is a (specification of a conceptualization) conceptual model, "Wikidata item" is a (main documentary unit of Wikidata) Wikidata internal entity, Wikidata entity, Wikibase item, work/creation/work's production. --Fractaler (talk) 06:40, 21 February 2018 (UTC)[reply]
Hello, shouldn't the EXIF metadata, when it exists, be linked to some Wikidata items, in order to automatically generate the corresponding categories, e.g. for camera model, aperture, shutter speed, focal length, ISO speed?
Example File:Canis latrans (Yosemite, 2009).jpg could be automatically categorized in Category:Photographs taken on 2009-02-16, Category:F-number f/10, and in other similar camera settings categories. Christian Ferrer (talk) 18:12, 16 February 2018 (UTC)[reply]
- Yes, this kind of data should be stored in a structured format so we can query and search it. I hope when we have that up and running we can deprecate the categories like Category:Photographs taken on 2009-02-16 and Category:F-number f/10. Multichill (talk) 21:36, 16 February 2018 (UTC)[reply]
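As a rough illustration of the mapping described in this thread, the sketch below derives category names from a couple of EXIF fields. The function name, the input dictionary and its keys are assumptions made for this example (the time of day in the usage line is invented); only the two category names come from the example above. It is not how Commons or any existing bot does this.

```python
from datetime import datetime


def exif_to_categories(exif):
    """Derive Commons-style category names from a plain dict of EXIF values.

    `exif` is a hypothetical, already-parsed mapping such as
    {"DateTimeOriginal": "2009:02:16 11:42:05", "FNumber": 10.0}.
    """
    categories = []

    taken = exif.get("DateTimeOriginal")
    if taken:
        # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
        day = datetime.strptime(taken, "%Y:%m:%d %H:%M:%S").date()
        categories.append(f"Category:Photographs taken on {day.isoformat()}")

    f_number = exif.get("FNumber")
    if f_number is not None:
        # The "g" format renders 10.0 as "10" and 2.8 as "2.8".
        categories.append(f"Category:F-number f/{f_number:g}")

    return categories


print(exif_to_categories({"DateTimeOriginal": "2009:02:16 11:42:05", "FNumber": 10.0}))
```

If the same values were stored as structured statements instead, the category names would not need to be generated at all; the values could be queried directly, which is the point Multichill makes above.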
Tag namespace? or better "Category all images" namespace
Discussion moved over to the main talk page
Wikibase database physically close to mediawiki database and other ideas
Discussion moved over to the main talk page
like the decision gates, but was hoping for some tools to build links between data spaces, and review and edit data ontology. there seems to be a lot of hand work with categories. would prefer more like mix and match, and listerias. i.e. for creator template, we manually add Q number to red link, but must also add creator to wikidata. Psyduck3 (talk) 00:40, 18 February 2018 (UTC)[reply]
- Tools will be built to migrate/replicate appropriate data, yes. Keegan (WMF) (talk) 18:09, 20 February 2018 (UTC)[reply]
- I expect the structured data team here to take the same approach as the Wikidata team: Provide the software and the community will figure out the bots and tools to migrate data. In my opinion that worked very well on Wikidata. We already have loads of tools that work with Wikibase (the database software underpinning Wikidata and also structured data) so probably quite a few of these tools can be updated to work here too. Multichill (talk) 14:48, 24 February 2018 (UTC)[reply]
- Yes I agree with Multichill on this. Just as with the introduction of Commons not all files were moved right away (and some are still on the local Wikipedia projects and have never been moved). I am hoping that adding multilingualism to the mix will draw in more Commons contributors who will help refine and expand Commons categories like Category:Portraits with table carpets, creating new ontologies from aggregated categories like Category:Paintings of carpets that will introduce new categories such as Category:Still-life paintings with table carpets. Just as we do lots of labelling, merging and renaming of items on Wikidata over time, I expect we will be renaming and merging many categories after this is implemented. The biggest advantage is the multilingual aspect that enables wider participation by art lovers everywhere and not just English-speaking ones. Jane023 (talk) 09:18, 25 February 2018 (UTC)[reply]
- @Jane023: There's an issue with talking about categories for such things. My understanding is that the structured data system will run on "topic tags" (ie Q-numbers) for things depicted. So one may be able to search for (i) paintings, (ii) of genre still-life, that (iii) depict table carpets. But whether that will actually lead to a curated page with text for such things (somewhere to annotate "a surprisingly common trope in the 17th century"?) in the way that category pages can now, I think is dubious.
- Indeed, part of the pitch from the team to uploaders, is that newly-uploaded images would no longer be expected to be categorised at all, just have the topic-tags added. So what categories we do currently have, as categories, would increasingly no longer include the most-recently uploaded best-curated images.
- I have to say, I'm more than a little apprehensive about this. I think part of what attracts people to categorisation is a sense of working on a structure that feels like it has some kind of tangible solidity, and can be curated -- so that you can look back and see the structure you have built and populated and refined, and see that it is good. I am not sure that topic-tagging will bring Commons contributors nearly the same sense of constructing and presenting something that they have built.
- The other thing I worry about is that at the moment the category system is something we know very well, and we are very familiar with its weaknesses and deficiencies. The structured data system is being presented as something shiny and new that will make those deficiencies a thing of the past. The promise is that everything will be wonderful, we just have to step into the future. But inevitably it will have its own compromises and deficiencies -- tasks that it will be clumsy at or struggle with. In particular I wonder if there may be groupings and binnings-together in the category system (including "none of the above" at various levels of hierarchy) that may make sense and help discovery, but wouldn't necessarily be suggested by any faceted-search suggester system, or appear as topic tags in their own right.
- The new system is an unknown quantity, for all that we might hope it will promise. It worries me that there seems to be a groupthink, that because of that promise, any further support or development or maintenance or even continued use of the category system is now unnecessary. (With decisions like not to support the category system with CommonsData items being one early visible symptom of this). I fear that is the position, but it seems reckless, before we have come to understand what we may value in categories that faceted search may either not provide or not provide as efficiently. And it also seems to close out opportunities in which, if both were given the chance to develop, structured data could actually help support the category system. We're already seeing categories starting to gain new multilingual Wikidata-driven infoboxes. With structured data for categories, that would also make it much easier to store and show multilingual subtitles for category names or multilingual generated auto-descriptions; as well as (via stored machine-interpretable statements of what categories represent) categorisation of existing categories that was more comprehensive; better machine-assisted categorisation for media; and a better understanding of what new topic-tags existing categories should represent.
- But the groupthink seems to be against any of that, instead holding that categories are the past and have no distinctive worth in the structured data future. I do fear that if we go down that road, we will not realise what we are losing until we have already done it significant damage.
- Slightly off-topic for this discussion, but it was on my mind. At the moment it doesn't seem appropriate to talk about structured data in terms of restimulated new work on categories. Jheald (talk) 20:34, 26 February 2018 (UTC)[reply]
- You seem to think that Commons categories will be replaced. This will not happen. Edits to categories will remain possible in the same way they take place today. Jane023 (talk) 21:06, 26 February 2018 (UTC)[reply]
- Yes, edits to categories will still be possible. But will GLAMs uploading files still be encouraged to categorise their uploads? When tags are added will categorisation follow? Will editors do the same work twice? Will maintaining and improving categories really still have the same emphasis? Jheald (talk) 22:45, 26 February 2018 (UTC)[reply]
- Yes. And we will also still have mono-project contributors (i.e. Commonists who never go to Wikidata and the other way around). Jane023 (talk) 09:54, 28 February 2018 (UTC)[reply]
- "I think part of what attracts people to categorisation is a sense of working on a structure that feels like it has some kind of tangible solidity, and can be curated -- so that you can look back and see the structure you have built and populated and refined, and see that it is good": Commons' categorisation ({{Category tree all}}, mode=parents) = Wikidata's "classification" (Template:Item documentation) = taxonomy. Another question is whose model of the world (taxonomy) is more accurate (Commons', Wikidata's, or other models of the world). And as evidence of their own rightness, some use scientific facts, and some use administrative rights. --Fractaler (talk) 12:31, 28 February 2018 (UTC)[reply]
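To make the topic-tag search mentioned earlier in this thread more concrete ((i) paintings, (ii) of genre still-life, that (iii) depict table carpets), here is a minimal sketch that assembles such a three-facet query as SPARQL text. It uses the Wikidata properties P31 (instance of), P136 (genre) and P180 (depicts); the function name is hypothetical, the Q-ids in the usage line are placeholders rather than real identifiers, and whether structured data on Commons will be queryable in exactly this form is an assumption.

```python
def build_facet_query(instance_of, genre, depicts, limit=50):
    """Return SPARQL text that combines three facets on one item.

    P31 = instance of, P136 = genre, P180 = depicts (Wikidata properties).
    The Q-ids are supplied by the caller; nothing is sent to any server here.
    """
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{instance_of} ;\n"
        f"        wdt:P136 wd:{genre} ;\n"
        f"        wdt:P180 wd:{depicts} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder ids: substitute the real Q-numbers for painting,
# still life and table carpet before running the query anywhere.
print(build_facet_query("Q_PAINTING", "Q_STILL_LIFE", "Q_TABLE_CARPET"))
```

A query like this returns a flat list of matches; it does not by itself provide the curated, annotated page that Jheald notes a category page can offer.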
Hi, it is not possible to divide Commons content creators into notable and non-notable. There are a gazillion intermediate possibilities, and it will create useless conflicts to decide for each case. We need a solution that is good for all. Regards, Yann (talk) 13:17, 18 February 2018 (UTC)[reply]
- This is indeed something to think about. I can imagine that someone created an item for an author on Wikidata, after that an image is uploaded with as creator/author the one linked on Wikidata, and if then on Wikidata that author is deleted for being non-notable, the author should still be shown on the Commons page. Romaine (talk) 14:22, 18 February 2018 (UTC)[reply]
- We already have d:Wikidata:Notability and that's fine. When this discussion talks about the concept of notability, it is not discussing whether or not something or someone is notable - those discussions have generally already taken place and will always belong in the projects where they will stay. Jane023 (talk) 09:43, 23 February 2018 (UTC)[reply]
- @Jane023: d:Wikidata:Notability is probably overdue some changes. Back in November last year I suggested at WD:Project Chat that items should be permitted for entities with Commons categories, "if they relate to a distinct identifiable conceptual or material entity. Items should not be created for Commons categories that can be described as an intersection of existing entities." That discussion got archived off the page without a formal closure; but should probably be revived as a formal RfC, especially if we are now creating infoboxes on Commons categories that rely on a consistent Wikidata link, need as a matter of urgency to better understand what our categories here relate to at Wikidata, and are proposing to use Wikidata items to describe the content of images here. (Indeed, the needs of the latter may suggest even wider broadenings of suitability for a Wikidata item). So the current lines drawn by WD:N may not be set in stone. But I agree that those discussions will continue to take place in the different projects.
- Romaine raises a relevant point though. We've often enough seen images transferred up to Commons from Wikipedias, then deleted here for whatever reason, without any thought of restoring them to the Wikipedias from where they came (where they may still be appropriate). It is entirely conceivable to imagine the same happening for some creator or subject item -- getting transferred up to Wikidata, but then deleted there. If such cases occur, then there has to be better communication between the projects, to make sure that information, if still appropriate here, is not lost. Jheald (talk) 16:14, 24 February 2018 (UTC)[reply]
- I don't see the danger in that. People who satisfy Wikidata notability will be created (or have already been created) according to those rules, and on Commons it is up to the community to create creator templates or not. The likelihood of having creators on Wikidata is just much higher simply because Wikidata has no copyright restrictions. Copyright restrictions are the biggest reason people are missing from Commons: "out of sight, out of mind". If at some point a creator is deleted from Wikidata, there is no need for the person to be removed from Commons, and theoretically the creator template will still work (unless it was created after the item was created). Jane023 (talk) 20:07, 24 February 2018 (UTC)[reply]
- The latter I think was the case Romaine was thinking about -- the case if all the information about the person or the item or the place or the thing depicted is stored on Wikidata and there never was any separate information here, a situation that may be increasingly likely as the structured data model becomes the norm. Such an item shouldn't be deleted from Wikidata (probably). But if the community there do decide they don't want it, then there must be procedures to be able to smoothly transfer the information to here, if it is being used, so it is not lost. Such procedures will need to work rather better than they do at the moment for Wikipedias when eg Commons decides that it wants to delete an image... Jheald (talk) 18:37, 26 February 2018 (UTC)[reply]
- Well edge cases on projects will always exist and there will be edit wars. I can imagine that Wikidata will not be thrilled at a mass-import of say, pornographic hobby photographers with large portfolios on Commons. Jane023 (talk) 09:59, 28 February 2018 (UTC)[reply]
- Sorry, saying that there are always edge cases is not good enough. It is on Commons a hard requirement that the author is added on the file page. We need to think about a solution for those cases when an author is inserted from Wikidata and later gets deleted on Wikidata. Better safe than sorry. Otherwise uploaders get the troubles, including deletion of images because of this and demotivation. That should not happen. Period. Romaine (talk) 13:44, 10 March 2018 (UTC)[reply]
- This is a misunderstanding. Authors on Commons in the sense of uploaders is an entirely different matter than creators on Commons in the sense of "has a creator template". It will always be possible to have uploaders on Commons, but these people will generally not have creator templates on Commons or indeed have items on Wikidata. Jane023 (talk) 18:00, 16 March 2018 (UTC)[reply]
- See discussion now on VP: Commons:Village pump#Creator templates for Commons contributors. Regards, Yann (talk) 18:08, 16 March 2018 (UTC)[reply]
@RIsler (WMF) and Keegan (WMF): As an important adjunct to this discussion about what data is going to live where, it would be useful to clarify what data we are expecting to be viewed where.
In particular, to what extent will the data stored in the Commons wikibase be presented directly to readers, and to what extent will it be mediated by templates?
As another way of looking at this, when one goes to the page for an image, will it generally appear much as it does at the moment, with data about the image, its licensing, etc, being presented through templates, perhaps with the Commons Wikibase screen as a linked additional screen, or additional tab?
Or is it intended that the Commons wikibase data will be presented directly by the software, on the primary information screen?
The template-based approach may seem an additional layer of clunkiness, but importantly it allows the community to completely determine how information is presented, and tweak or modify that as situations demand (eg, even down to what text is presented when a particular license applies). On the other hand, if information is presented directly by the software, that may give the chance to be more efficient, and to implement a consistent design update for the site. But it also means that that design and information-presentation would no longer be something the community could extend or modify or adjust, in the way that templates can currently be adapted.
In thinking about the lines between wikitext and CommonsData, it would be useful if you could flesh this out a bit more. (Including also, for example, whether it will be possible to override a CommonsData value with local wikitext?) Jheald (talk) 18:39, 23 February 2018 (UTC)[reply]
- @Jheald: We are actually working on this right now and hope to have UI wireframes (and possibly a prototype) for the community to view by end of March/early April. The current thinking is to move away from templates for presentation and have the Wikibase software handle it, for many of the reasons you mentioned plus some additional reasons. However, that doesn't necessarily mean we'll entirely do away with templates. We're looking at ways that we can still keep some template items in a Wikitext-only "view" (which may be possible with the upcoming Multi-Content Revision features). RIsler (WMF) (talk) 19:40, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): My concern is that that becomes very limiting in what additional information can be shown. The Wikibase software makes it easy to add a property or qualifier as and when the need for it becomes apparent. But unless the display is mediated by templates, there will be no way to properly integrate that statement in what is shown. Jheald (talk) 21:12, 23 February 2018 (UTC)[reply]
- @Jheald: We totally understand the need to present information on Commons in a different way than it is designed on Wikidata, for example. When we release the forthcoming UI design updates, you will see how we use the Wikibase system on the back-end but dress it up and rearrange it on the front-end so we can still have a good viewing experience. RIsler (WMF) (talk) 22:16, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): Yes, but does that front-end system have the flexibility for individual Commons users to modify or adapt it? That is what I am asking. Jheald (talk) 22:22, 23 February 2018 (UTC)[reply]
- @Jheald: Modification to some degree may be possible (we're still sorting this out), but the ultimate goal here is consistent structure. There could possibly be some wiggle room within that structure, but the basic structure of all the file pages will be the same. Without that, many of our efforts in this structured data project won't have the desired impact. RIsler (WMF) (talk) 22:36, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): So what happens when additional properties get added to the CommonsData wikibase? -- something GLAMs have identified as an essential required feature to be able to accommodate specialist or unforeseen aspects of their data.
- With the existing template model it's easy -- a template can be created or adjusted to draw directly on that statement, or use it to present information in a way conditional on its value.
- Will the new system have similar flexibility? Jheald (talk) 22:45, 23 February 2018 (UTC)[reply]
- Or: what happens when the Commons community decides to adjust the language it uses to describe the copyright situation for a particular class of images? (Something that has happened several times in the past)
- With a template it's easy -- one just edits the language of the templated message, and it's done. But will the new system still have that adaptability? Jheald (talk) 22:50, 23 February 2018 (UTC)[reply]
- @Jheald: These are great questions that do not have answers to them yet, and ones I'd love to be able to answer but development simply isn't there yet. Making this happen is a staged process and iterative within stages; so at this point we're talking about what goes where, soon we'll move on to how is it then displayed, and edited, and the subsequent questions. With each potential answer to the question we'll have the chance to try out different possible solutions - the iteration - until we find the solutions that work best for Commons curators and contributors, as well as the reusers and everyone in between. It might be helpful for us if we document your questions on the main talk page, so that we can work on answering them with time as the process goes forward. Would that be useful for you as well? Keegan (WMF) (talk) 23:12, 23 February 2018 (UTC)[reply]
- @Jheald: I'll +1 Keegan's suggestion about moving this to the talk page (same for future discussions that are tangential to the specific topic at hand). But I do want to give you the best answer I can based on where we are now. I'll address both scenarios you mentioned. a.) We're looking at the concept of "models" - which are essentially classes that a MediaInfo item (MXXXX) can be added to. The model defines the default statements attached to all M items within the model. If a new property gets added (via a process which we're still defining), you just update the model with a statement containing that property and all M items that have that model get updated. b.) Changing the language of a license could work like this: We could have a Wikibase Property called "Commons Licensed-PD text" or something similar, and it would have multilingual descriptions the community could edit as necessary. That's about as specific as I can get since we are still in the process of discovering the right solution. Although we're in the very preliminary stages, I'm happy to detail that discovery process for you as I did above, but as Keegan said we really should take that to the main talk page and not in this Ontology topic. RIsler (WMF) (talk) 00:37, 24 February 2018 (UTC)[reply]
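The "models" idea in the comment above (a class that defines the default statements for every MediaInfo item assigned to it) can be pictured with a small sketch. Everything here is hypothetical: the class names, fields and property labels are invented for illustration, and none of it reflects actual Wikibase code or a design the team has committed to.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical 'model': a named list of default properties."""
    name: str
    default_properties: list = field(default_factory=list)


@dataclass
class MediaInfoItem:
    """Hypothetical M item that belongs to exactly one model."""
    m_id: str
    model: Model
    values: dict = field(default_factory=dict)

    def statements(self):
        # Properties are read from the model at call time, so a property
        # added to the model later appears on every item that uses it.
        return {p: self.values.get(p) for p in self.model.default_properties}


artwork_photo = Model("artwork photo", ["creator", "license"])
item = MediaInfoItem("M1", artwork_photo, {"license": "example license"})

# Adding a property to the model is enough; the item itself is not edited.
artwork_photo.default_properties.append("collection")
print(item.statements())
```

In this sketch the new "collection" slot shows up on the existing item with no value yet, which corresponds to the "update the model and all M items that have that model get updated" behaviour described above.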
- @RIsler (WMF): With respect, I think this discussion is relevant to this consultation, so I would be grateful if you would indulge it a bit longer. The purpose of this consultation is to consider where particular information should be located, and a chance for the community to flag up any issues coming out of that, that may need particular consideration.
- In that light, consider eg the template {{PD-Art}} currently used on just under a million pages, for example this one, an image of the painting described at d:Q19801102. This template does a number of things. It gives the current overall copyright status of the image, for which we have a property copyright license (P275) that will take for its value an item on Wikidata. In general, the copyright status of an image may not be the same as the copyright status of the painting -- for example, if an image included part of the frame of a painting that would make that element copyrightable as an image of a 3d work, and so even though the painting was PD, one would need to consider the license under which the photographer had released their work. So in general this will need to be a statement on the CommonsData wikibase, although one with a value that will point to a standardised item on Wikidata. Secondly, the template indicates the reason that the uploader thinks it has this copyright status (ie because it's a faithful reproducion of an old 2D artwork). There is not currently a property for this, but there is a current discussion in the paintings project on Wikidata as to whether there should be. But it seems likely that this too will have a property, with values pointing to some standard items on Wikidata. It may well be that different reasoning will apply to different aspects of, or contributions towards, the final image.
- But the template does more than this. It also controls the message that is given about the copyright reasoning, and how that message is displayed -- in particular, the statement that this reflects a position taken by the WMF as to what the law should be; but, unless you are in the USA, the law of your country may not agree, or may be uncertain or untested. With this the template gives a link to a Commons guidance page discussing the issue in more detail.
- The wording of this is quite sensitive, and has been revised more than once. (The legal summaries in the more 'niche' Commons copyright templates do tend to get revised from time to time, as Commons gains more nuanced understanding of a particular legal situation, or new judgments or laws modify the situation, or a clearer or more precise wording is suggested). Furthermore, to retain the protections in the United States of Section 230 and the DMCA, and their equivalents in Europe, it may be quite important that their wordings do come from the community and not from the Foundation -- ie that they are not hard-written into code.
- This consultation is about where information will live. The introduction says that templates will still live as wikitext on Commons MediaWiki pages. But which of those templates will still have a role? Which will still be meaningfully usable? And if they are not usable, no longer part of the process, what happens to the information that they contain -- information about format and presentation of information, and information about specific messages to convey. This seems to me a very relevant issue for the community to consider.
- What you suggest is that the wording of the message can be contained as a multilingual statement on the relevant Wikidata item for the copyright reason. One can imagine there might also be a statement there holding a link to a particular guidance page on Commons. (No property for that currently exists on Wikidata, but it could be created).
- The question is: is this enough? Messages about the reasoning behind copyright status assertions, and their implications for re-use, are absolutely fundamental to what Commons is about. Is it okay for the text of these messages to be held on a 'foreign' system, run by a different community, not Commons admins and the Commons community? Is it sufficient just to store a message (mark-up free) and optionally a guidance link, to be presented in a standard way, or is it important for the community to retain more detailed control, at a message-by-message level, and including the scope for immediate ad-hoc revision, of how such messages are formatted and presented? I suggest that these are fundamental questions that the community needs to consider, in any discussion of what information is going to be held, and where. Jheald (talk) 13:10, 24 February 2018 (UTC)[reply]
- (Separate response re: content information flexibility to follow). Jheald (talk) 13:14, 24 February 2018 (UTC)[reply]
- @Keegan (WMF): I suppose so. But it's hard to give meaningful feedback or sign-off on a question like "Is it okay to move information from wikitext to Commons wikibase", without some indication of what it will or won't be possible to do with it once it's there. One's understanding of the latter inevitably colours any meaningful position on the former. Jheald (talk) 23:21, 23 February 2018 (UTC)[reply]
- I certainly understand that. This discussion has the operating assumption that information can be changed/manipulated somehow and to some extent (whether fully or less-than), but neither of those is defined. I'll work to make assumptions more clear in the text in the future. Keegan (WMF) (talk) 23:42, 23 February 2018 (UTC)[reply]
- @Jheald: Modification to some degree may be possible (we're still sorting this out), but the ultimate goal here is consistent structure. There could possibly be some wiggle room within that structure, but the basic structure of all the file pages will be the same. Without that, many of our efforts in this structured data project won't have the desired impact. RIsler (WMF) (talk) 22:36, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): Yes, but does that front-end system have the flexibility for individual Commons users to modify or adapt it? That is what I am asking. Jheald (talk) 22:22, 23 February 2018 (UTC)[reply]
- @Jheald: We totally understand the need to present information on Commons in a different way than it is designed on Wikidata, for example. When we release the forthcoming UI design updates, you will see how we use the Wikibase system on the back-end but dress it up and rearrange it on the front-end so we can still have a good viewing experience. RIsler (WMF) (talk) 22:16, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): My concern is that that becomes very limiting in what additional information can be shown. The Wikibase software makes it easy to add a property or qualifier as and when the need for it becomes apparent. But unless the display is mediated by templates, there will be no way to properly integrate that statement in what is shown. Jheald (talk) 21:12, 23 February 2018 (UTC)[reply]
- @Jheald: We are actually working on this right now and hope to have UI wireframes (and possibly a prototype) for the community to view by end of March/early April. The current thinking is to move away from templates for presentation and have the Wikibase software handle it, for many of the reasons you mentioned plus some additional reasons. However, that doesn't necessarily mean we'll entirely do away with templates. We're looking at ways that we can still keep some template items in a Wikitext-only "view" (which may be possible with the upcoming Multi-Content Revision features). RIsler (WMF) (talk) 19:40, 23 February 2018 (UTC)[reply]
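The "models" idea RIsler describes above (a class that supplies default statements to every MediaInfo item in it) can be pictured with a small sketch. Everything here is an assumption made for illustration: the class names, the property labels and the guidance page title are hypothetical, and the discussion does not say how the real system would propagate changes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Model:
    """Hypothetical 'model': a class of MediaInfo items sharing default statements."""
    name: str
    default_statements: Dict[str, str] = field(default_factory=dict)


@dataclass
class MediaInfoItem:
    """Hypothetical MediaInfo ('M') item that belongs to exactly one model."""
    item_id: str
    model: Model
    own_statements: Dict[str, str] = field(default_factory=dict)

    def statements(self) -> Dict[str, str]:
        # Defaults are read from the model at lookup time, so a change to the
        # model shows up on every item without editing the items themselves.
        # Item-level statements win over model defaults for the same property.
        return {**self.model.default_statements, **self.own_statements}


photo_model = Model("photograph", {"license text": "PD message, version 1"})
m1 = MediaInfoItem("M1", photo_model, {"depicts": "Castle Fraser"})
m2 = MediaInfoItem("M2", photo_model)

# Adding a property to the model reaches both items at once.
photo_model.default_statements["guidance page"] = "Commons:Example guidance"
```

Whether defaults would be copied into each item or resolved on read is not stated in the discussion; the sketch resolves on read only because that is the shorter thing to write.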
- Current file pages are data+presentation in one big toxic mix. With structured data the data will be separated out and on the File pages we can focus on the presentation of this data. In a way we like. This is already happening right now with {{Creator}} and {{Institution}}.
- This discussion page is about the ontology, so how to structure the data. How to represent this data to (re)user and editor is very important, but we should take that as a separate discussion later.
- Not sure if it's mentioned here already, but in previous discussions about this we came up with the concept of a work. A work is something someone made and probably has a copyright status attached to it. We have easy cases with just one work (I take a photo of a landscape) and more difficult cases where we have two works: The photo (work A), the work in the photo (work B) and the relation between the works. Work A always lives on Commons, work B might be on Commons or on Wikidata. Work A has a copyright status and work B too. To make it a bit less abstract:
- Work A: photo taken by the museum and {{Cc-zero}}
- Work B: item on Wikidata
- The relation: "a faithful photographic reproduction of an original two-dimensional work of art" .
- If work B is a 3D work (like a statue) it should be in the public domain and work A should be freely licensed. In some cases work B is in copyright, but the country has freedom of panorama so we can model the relation between the two like that.
- Anyway, I expect that is the direction we'll take because it's probably the easiest and cleanest way to model and it's actually quite close to how we model right now already. We will need a bunch of new properties to model the relation between work A and B. Multichill (talk) 14:43, 24 February 2018 (UTC)[reply]
- @Multichill: You just wrote: "on the File pages we can focus on the presentation of this data. In a way we like. This is already happening right now with {{Creator}} and {{Institution}}." I hope that that is correct -- ie that presentation will still be template-mediated (via some kind of template anyway) in a way that allows control and flexibility. It's probably also the only sane and manageable way to transition from where we are at the moment, when data on Commons wikibase will initially be quite patchy. But it's not the answer I got from RIsler (WMF) above: "The current thinking is to move away from templates for presentation and have the Wikibase software handle it." If we're thinking about what information lives where, and in particular the messages on templates like {{PD-Art}}, it's an important consideration. Jheald (talk) 15:03, 24 February 2018 (UTC)[reply]
- That remark made me laugh. It's like saying, let's do away with MediaWiki and let's have MySQL (MariaDB) software handle it. I automatically interpreted it as: Let's build an interface that only uses the data in Wikibase and shows it in a pretty way. Probably just another iteration of the MultimediaViewer? Multichill (talk) 15:18, 24 February 2018 (UTC)[reply]
- Well... yes. But isn't that just what it sounds like is being contemplated? The first iterations of MV sent this community ballistic, until a much clearer link was added to the file pages that the community curate and control. Jheald (talk) 15:37, 24 February 2018 (UTC)[reply]
- To clarify, I'll refer to two things Multichill said above: 1.) "Current file pages are data+presentation in one big toxic mix" - which is absolutely true and is one big issue we're trying to fix. That fixing process will mean changing at least some (maybe even a lot) of the templating system as it is used today. What exactly will those changes be? We haven't gotten to that yet, so there's no need to worry about us making some sinister plot. We're just gaming out ideas right now, and "moving away from templates for presentation" as I said earlier really means continuing to move away from templates that mishmash data and presentation. Which leads me to 2.) Multichill's interpretation of my admittedly inelegantly-phrased previous statement as: "Let's build an interface that only uses the data in Wikibase and shows it in a pretty way": This is the overall, thousand-foot view of one long-term goal (one of many). But the sticky point there is how does that "pretty way" work? How does that work on Commons, and other elements that go across wikis? Multichill mentioned Multimedia Viewer, and we do plan to revamp that later, but what will that look like? Although we are narrowing down solutions from some early work, for the most part we really don't yet know the specific long-term presentation solutions. We know the problems, and those include things mentioned previously, but also other issues like how do we improve the multimedia experience on mobile web and mobile apps? These are just a few rabbit-holes that are part of a big overall conversation that we (both WMF and Community) need to have and it's really hard to combine that with this ontology discussion, which is largely about keeping the data layer separate from the presentation layer anyway (which is important because if we can't do that we may find ourselves right back where we started). 
With all that said, Jheald raises good questions about whether plain text data in Wikibase is "enough", and my suggestion for that discussion is to flip the question and ask: if we agree on the basic goals of the SDoC project, and we want data to be very structured and very separate from presentation, what steps are needed to make sure the presentation layer (whatever it may be) gives us what we need? And I am willing to talk about that as much as anyone wants, separately :) RIsler (WMF) (talk) 20:34, 24 February 2018 (UTC)[reply]
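Multichill's "work A / work B / relation" framing earlier in this thread can be written down as a small data structure. This is a minimal sketch under stated assumptions, not a proposed schema: the class and field names are hypothetical, and the "public domain" status given to work B is assumed for the example (the discussion only says it is an item on Wikidata).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Work:
    """Something someone made, carrying its own copyright status."""
    label: str
    lives_on: str          # "Commons" or "Wikidata"
    copyright_status: str  # e.g. "Cc-zero", "public domain", "in copyright"


@dataclass(frozen=True)
class FilePage:
    """A Commons file: work A, optionally related to a second work B."""
    work_a: Work
    work_b: Optional[Work] = None
    relation: Optional[str] = None  # how work A relates to work B


def statuses_to_check(page: FilePage) -> List[Tuple[str, str]]:
    # Each work has its own status, so a reuser has to look at every work
    # involved rather than at one page-level licence.
    works = [page.work_a]
    if page.work_b is not None:
        works.append(page.work_b)
    return [(w.label, w.copyright_status) for w in works]


# Easy case: one work only.
landscape = FilePage(Work("my landscape photo", "Commons", "Cc-zero"))

# Two works plus the relation between them.
painting_photo = FilePage(
    work_a=Work("museum photo", "Commons", "Cc-zero"),
    work_b=Work("painting (Wikidata item)", "Wikidata", "public domain"),
    relation="faithful photographic reproduction of a 2D work of art",
)
```

Keeping the relation as its own field is what would let a freedom-of-panorama case be expressed without changing either work's status.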
Relationship between license templates and things depicted
- Comment If the thing "depicted" has a copyright status, then one of its properties should be linked one way or another to the corresponding license template in Wikimedia Commons, to the extent that the Commons community (for example administrators, curators...) can keep total power over which license is assigned to which thing. Or better, one of its properties should be linked to a specific template (why not in a customized namespace, a bit as for {{Creator}}) which includes a license template and also information about the creator and/or about the copyright holder (when different), OTRS permission, etc. In this way the copyright status that we attribute remains under the full control of the knowledgeable members and experts of the Commons community. Christian Ferrer (talk) 07:42, 25 February 2018 (UTC)[reply]
- Another way would be to consider media files and copyrighted objects at the same level. For example, with multiple potential copyrights, the number of potential copyrights should not be a limitation in the system. Everything that can have a copyright status (and therefore a license tag) should have an ID in Wikibase, and then we can say "is it a media file?" yes or no (Wikidata), or "is it a thing included in a media file?" yes or no (Wikidata). Christian Ferrer (talk) 08:17, 25 February 2018 (UTC)[reply]
- I'm not so sure I still understand my own logic above. I think I will pass :). Christian Ferrer (talk) 19:47, 26 February 2018 (UTC)[reply]
File resolution -- accessible from queries ?
I see that File resolution is listed as something that will continue to be handled by the MediaWiki software, and therefore will not be in the CommonsData wikibase.
Is it possible to confirm, though, that information about the resolution of the file will be accessible to the SPARQL query system -- so that e.g. one will be able to write queries for files of greater than a particular size that depict particular kinds of objects, in particular kinds of places; or eg files of a particular aspect ratio?
It would probably also be useful to be able to access the presence of templates from the SPARQL system -- eg to produce a current list of files of a particular place, or on a particular topic, that did or didn't have a particular template on them. Jheald (talk) 21:36, 23 February 2018 (UTC)[reply]
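The kind of query asked for above might look roughly like the one built below. This is only a sketch: the `ex:` prefix, the predicates `ex:depicts`, `ex:width` and `ex:height`, and the item name `Q_CASTLE` are placeholders invented for illustration, since the discussion does not settle whether or how resolution would be exposed to SPARQL.

```python
def build_resolution_query(depicts_item: str, min_width: int, min_height: int) -> str:
    """Return a SPARQL query string; every ex: name in it is a placeholder."""
    return f"""
PREFIX ex: <http://example.org/hypothetical/>
SELECT ?file ?width ?height WHERE {{
  ?file ex:depicts ex:{depicts_item} ;
        ex:width ?width ;
        ex:height ?height .
  FILTER(?width >= {min_width} && ?height >= {min_height})
}}
""".strip()


# Files depicting a given item that are at least 2000 x 1500 pixels.
query = build_resolution_query("Q_CASTLE", 2000, 1500)
print(query)
```

An aspect-ratio filter would be one more `FILTER` over the same two variables, which is why having both dimensions reachable from the query service matters more than any single canned search.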
- @Jheald: We're still working on the What Goes Where aspect (which is why we're having this ontology chat with the Community). Having resolution continue in its current state is our initial plan, but it could change. Regardless of where it ends up, we do plan for resolution to be searchable in some way, but I can't say for sure at this time whether that will be via SPARQL. RIsler (WMF) (talk) 22:22, 23 February 2018 (UTC)[reply]
- @RIsler (WMF): Please do think about flexibility and adaptability, rather than canned systems. SPARQL has delivered that, spectacularly. But it's a real limitation that there are some silos of information that aren't accessible.
- The proper solution, it has always seemed to me, is to put a SPARQL wrapper around the MediaWiki SQL tables -- all of them -- so that they could then be accessed from within a SPARQL query as an external federated service. Sparqlify is one project seeking to create machinery for doing this; but there are others I believe that are more advanced. Indeed, I believe the first generation of SPARQL services took essentially this approach -- a thin wrapper around an existing SQL set-up.
- Added: Here's W3C's page for a number of projects towards this kind of capability, as of 2012. Jheald (talk) 23:24, 23 February 2018 (UTC)[reply]
- But whatever approach you go for, please do put hooks in to make this kind of information available -- and available for general queries, not just through the end-customer shiny user-interface. Jheald (talk) 22:38, 23 February 2018 (UTC)[reply]
- I already mentioned it several topics up: We should be able to extract all sorts of data from files (exif and other data) and store this in statements. Of course we'll have SPARQL. Without SPARQL this is a completely useless project to start with. Multichill (talk) 14:25, 24 February 2018 (UTC)[reply]
- It's already possible to call the MediaWiki API from the Wikidata Query Service (doc). There has also been a prototype of a SPARQL wrapper of the MW SQL database but it's now dead (doc). Tpt (talk) 21:39, 26 February 2018 (UTC)[reply]
- @Tpt: MW2SPARQL looked really promising -- more flexible and a lot more natural and intuitive than the API service. What went wrong? Jheald (talk) 22:13, 26 February 2018 (UTC)[reply]
- To make it work properly I had to have a table for the namespace id <> namespace name mapping in order to make the SPARQL <> SQL mapping definition not too painful. But, to my knowledge, that is no longer possible with the new database replicas on Tools labs. There are possible workarounds but I have not taken the time to implement one. This translation system also has big limitations: aggregates and recursions (like wdt:P279*) are not working yet. The source code is here if you want to improve it. Tpt (talk) 11:00, 27 February 2018 (UTC)[reply]
- @Tpt: That is so disappointing, because what you had already achieved looked so good, and would have made accessing these system SQL tables so intuitive for people used to SPARQL. I am not so upset about the recursions etc -- yes maybe aggregates would be nice, and maybe it would be nice to look all the way up a category tree; but other gateways can be made for the latter. But just being able to do SPARQL 1.0 lookups and joins would already be so valuable, and such a brilliant extension to WDQS. If it's just a question of needing the namespace id <> namespace name mapping in a table accessible from Tools, surely somebody on the staff side could put in a request for that or implement that to get this up and running again? @Smalyshev (WMF): ? Jheald (talk) 12:24, 27 February 2018 (UTC)[reply]
- I agree that what Tpt has done looked pretty nice, but I'm not sure I could help here - I don't know much about how the gateway worked. It would be great if somebody picked it up but I probably won't have any time for it soon. I think a solution could be found for the ns/namespace mapping; maybe filing a Phab ticket with an exact description of what is missing and how it had worked before would help. Smalyshev (WMF) (talk) 21:46, 27 February 2018 (UTC)[reply]
- Thank you for your reply! I have created a task about it: phabricator:T188506. Tpt (talk) 13:34, 28 February 2018 (UTC)[reply]
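Tpt's point about the namespace id <> namespace name mapping can be illustrated with a toy example. This is not how MW2SPARQL worked (the discussion does not describe its internals); it is a sketch, assuming an in-memory SQLite database, a cut-down `page` table, and a hypothetical `ns_names` mapping table, of why a pattern such as "?page is in namespace 'File'" needs that mapping to become a simple SQL join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
CREATE TABLE ns_names (ns_id INTEGER, ns_name TEXT);  -- hypothetical mapping table
INSERT INTO page VALUES
  (1, 6, 'Castle_Fraser.jpg'),
  (2, 14, 'Castle_Fraser'),
  (3, 6, 'Old_book_plate.jpg');
INSERT INTO ns_names VALUES (6, 'File'), (14, 'Category');
""")


def pages_in_namespace(ns_name: str) -> list:
    """Answer the pattern '?page inNamespace <ns_name>' with one SQL join."""
    rows = conn.execute(
        "SELECT n.ns_name || ':' || p.page_title "
        "FROM page AS p JOIN ns_names AS n ON n.ns_id = p.page_namespace "
        "WHERE n.ns_name = ? ORDER BY p.page_id",
        (ns_name,),
    ).fetchall()
    return [row[0] for row in rows]


files = pages_in_namespace("File")
```

Without a table like `ns_names`, the translation layer would have to hard-code the numeric ids into every mapping rule, which is presumably the "painful" part Tpt mentions.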
Licensing
Current Commons file description pages are CC-BY-SA. Wikidata is CC0.
What is the licence envisaged for the CommonsData information? Will this be compatible with moving existing free-text descriptions etc from wikitext? Jheald (talk) 21:38, 23 February 2018 (UTC)[reply]
- @Jheald: There will be a consultation around licensing, currently planned for sometime next month (March). It'll be a lot more useful to get into the details of license interaction and compatibility then, I think. Keegan (WMF) (talk) 22:02, 23 February 2018 (UTC)[reply]
- @Keegan (WMF): That's all very well, but it's quite an important consideration to think about, if we're talking about moving information from one space to another, if there's a chance it may not have a compatible licence. Jheald (talk) 22:05, 23 February 2018 (UTC)[reply]
- I agree with Keegan, it's important, but off-topic for this ontology discussion. Please use this as input once we start doing the whole legal/license/public domain discussion. Multichill (talk) 14:23, 24 February 2018 (UTC)[reply]
Bulk uploads with very limited metadata
While thinking of the role for Wikitext and the role for Structured Data, it may be worth recalling that in the last few years several hundred thousand images from old books have been uploaded here via Flickr, after being digitised and uploaded there by projects like the Internet Archive, the British Library, the Biodiversity Heritage Library etc.
A typical file might look like File:The_baronial_and_ecclesiastical_antiquities_of_Scotland_(1852)_(14595777267).jpg.
There is quite a lot of information here, that has been lifted directly from the Flickr page (links and all); but almost all is represented as straight wikitext, with very little parsing.
It was originally uploaded with only one substantive category (Category:Church architecture), which was in fact wrong. It didn't even originally have a category for the book (this was created by a human being shortly after upload), and the category for what it actually represents (Category:Castle Fraser) was only added this year, almost three years after upload.
This may well not be uncharacteristic of such images.
And yet they are useful to have, and the verbatim text from the Flickr page does attract hits from whole-text searching.
From this, a few things we should perhaps take note of: (i) we shouldn't necessarily be too starry-eyed about the amount of structured data we will have about images, not even newly-uploaded ones, and there may be limits as to what people may be prepared to extract; (ii) while it's possible a little more structured data could have been auto-extracted from this image at upload, we probably shouldn't expect too much, and almost certainly shouldn't require too much, or we may not get such images at all; (iii) in such cases, the ability to hold lumps of unparsed but somewhat formatted text in wikitext, beyond the structured content, may continue to be quite useful. Jheald (talk) 18:16, 26 February 2018 (UTC)[reply]
- How much structured data is needed on upload will be something that we will need to develop a community consensus on (preferably once the system is in place so we have a sense of how hard upload is with the new tooling), but we should definitely support upload tools which encourage more rather than less structured data (because that will greatly improve discoverability). Moreover, the roadmap is very clear that MediaWiki's string-based description ability will not be removed from files, but will live alongside the new structured data features. The particular image you point to has tons of structured information though, and once the infrastructure is in place it will be easier for developers both in the community and outside of it to create pipelines that preserve that metadata as metadata rather than minimizing its value (as the uploading tool from IA to Flickr seems to have done, and that lower quality was maintained in the upload to Commons). Sadads (talk) 16:06, 28 February 2018 (UTC)[reply]
- The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.