Commons talk:Structured data/Overview
October 2016 Consultation
Please share your thoughts about the project plan shared at Commons:Structured data/Overview. In particular, we are seeking feedback on the questions described in this section of the Overview. Please create subsections on this page. We look forward to your discussion. Astinson (WMF) (talk) 01:02, 26 October 2016 (UTC)
This is NOT a new project proposal, comments here should relate to the October 2016 update from the WMF on possibly expediting the project. — Preceding unsigned comment added by Seddon (WMF) (talk • contribs) 11:43, 26 October 2016 (UTC)
Questions
The "Why do we need your comment? How can you help?" section had some questions. I copied them below:
Do you see this expedited roadmap as a worthy undertaking
- yes --Jarekt (talk) 02:10, 26 October 2016 (UTC)
- Yes. - PKM (talk) 22:55, 26 October 2016 (UTC)
- Yes --John Cummings (talk) 12:19, 27 October 2016 (UTC)
- No. Before enabling new stuff please fix the existing one. --Steinsplitter (talk) 11:23, 27 October 2016 (UTC)
- @Steinsplitter: Could you specify what you would like to have fixed? ChristianKl (talk) 08:53, 29 October 2016 (UTC)
- Just look at the open bugs at phabricator. --Steinsplitter (talk) 18:30, 31 October 2016 (UTC)
- Yes. --Denny (talk) 20:02, 27 October 2016 (UTC)
- Yes. Multichill (talk) 21:35, 27 October 2016 (UTC)
- Yes. ChristianKl (talk) 08:53, 29 October 2016 (UTC)
- Yes. --Micru (talk) 23:14, 29 October 2016 (UTC)
- Yes!! Susanna Ånäs (Susannaanas) (talk) 07:16, 31 October 2016 (UTC)
- The sooner the better. YES. Spinster (talk) 08:52, 31 October 2016 (UTC)
- Long overdue, imho. --El Grafo (talk) 13:21, 31 October 2016 (UTC)
- Yes. I am pleased to see a move to bring Commons into the 21st century. A file-repository never worked well with Wikipedia software, particularly the mess that is categories. Go for it! -- Colin (talk) 13:38, 1 November 2016 (UTC)
- Yes, I endorse the high level concept. I am not aware of anyone saying how much this would cost, who would take responsibility for management, what minimal promises can be made for any level of investment, or how this compares with other options. If there were a choice of options to support which each had a price tag then I am not sure which among the high-level projects I would prefer. I hope that no one interprets this poll as a community preference in favor of other options. A lot of other things might be done with Wikidata preferentially and at lower cost, and I am not sure who is making decisions or how. I can only guess that this project would be among the most expensive, complicated, and risky directions for development. The risk is going over budget and time for subpar delivery, and I would like to see some solid unqualified successes out of the WMF. Structured data in Commons is something that I dearly want, among other things that I dearly want. Blue Rasberry (talk) 16:43, 1 November 2016 (UTC)
- Yes, but Steinsplitter made a good point above. There are important issues which have needed fixing for a long time. Off the top of my head: 1. interwikis with galleries and categories to the rest of Wikimedia, 2. using WD for authors (Creator pages, etc.). Yann (talk) 18:15, 1 November 2016 (UTC)
- Most certainly yes. It's really good to hear that efforts are now being made to look seriously at some of the fundamentals of the Commons software for the first time in a decade. MichaelMaggs (talk) 12:47, 8 November 2016 (UTC)
- Yes. Make it happen! --Beat Estermann (talk) 13:27, 10 November 2016 (UTC)
- No. Frankly, the effort seems confused and backwards. I look at http://structured-commons.wmflabs.org/wiki/File:LighthouseinDublin.jpg and see an instance of a photograph. I'd expect the photograph to have a Q-number and I'd expect the descriptors to hang off that Q-number. I'd expect it to be an instance of a photograph. I'd expect it to have a subject that is the Poolbeg Lighthouse. Instead there's a new type MediaInfo and stuff hangs off of it. For licensing, I expect there to be a generalized author for copyright purposes, but instead there is a photographer. So the photo doesn't have a photographer but the MediaInfo does; the property attaches to the wrong object; the photographer did not take a photo of the MediaInfo. There's no statement that the media is a photograph. There's MediaInfo that can be ripped from the JPEG metadata, but it is less accurate than the data in the JPEG (eg, date taken). The MediaInfo label is Poolbeg Lighthouse, but the object is not the lighthouse but rather a photograph of the lighthouse. It depicts a "lighthouse", but it should be depicting Poolbeg Lighthouse (Q7228600) (an instance of a lighthouse Q39715 which is a tower which is an architectural structure ... so dig deep to find a 3D object and know that panorama is an issue). The MediaInfo approach is a contorted view of a simpler problem. The basic world is simpler. Look at A Connecticut Yankee in King Arthur's Court Q848612. Copyright status falls out of that Q-number. It's an instance of P31 a book that was published P577 in 1889. The author P50 is Mark Twain who died P570 in 1910. I'll use country of origin P495 to infer publication was in US. That gives me PD-old-auto-1923|deathyear=1910. All done outside of the MediaInfo type. See also Moonrise, Hernandez, New Mexico (Q17107995) which has more complicated issues; individual prints of the photograph are significantly different. Far from expediting, I think somebody needs to pull the MediaInfo plug. 
Glrx (talk) 01:51, 21 November 2016 (UTC)
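The inference chain in the comment above (publication date, author's year of death, and country of origin leading to a license tag) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ITEM` dictionary is a hand-written stand-in for the statements on Q848612, not real Wikidata output, the function name `infer_license_tag` is hypothetical, and the rule is reduced to the single case the comment describes rather than being a full copyright determination.

```python
from typing import Optional

# Hypothetical, hand-written stand-in for the statements on Q848612.
# Keys are the property IDs named in the comment; values are plain labels.
ITEM = {
    "P31": "book",                                  # instance of
    "P577": 1889,                                   # publication date (year only)
    "P50": {"label": "Mark Twain", "P570": 1910},   # author, with date of death
    "P495": "United States",                        # country of origin
}


def infer_license_tag(item: dict) -> Optional[str]:
    """Return a license-tag string, or None when the simplified rule does not apply."""
    published = item.get("P577")
    author = item.get("P50") or {}
    death_year = author.get("P570")
    if published is None or death_year is None:
        return None  # not enough data to infer anything
    # Simplified assumption: US origin and publication before 1923.
    if item.get("P495") == "United States" and published < 1923:
        return f"PD-old-auto-1923|deathyear={death_year}"
    return None


print(infer_license_tag(ITEM))
```

The point of the sketch is that every input comes from statements on the work's own item, which is the design the comment argues for; how such statements would attach to a MediaInfo entity is not addressed here.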
What roadblocks, risks or challenges do you anticipate with accelerating such a project
- I do not anticipate that you are ever going to get rid of the text version of some page descriptions. We had the same problem several times before:
- Early images did not require an infobox, like {{Information}}, so for years people were adding images without them. We put a lot of effort into creating and maintaining a core set of infoboxes and unifying hundreds of other rarely used infoboxes and description templates into them. We also put a lot of effort into adding infoboxes to files lacking them; however ~1% of files (in Category:Media missing infobox template) are still missing them. I expect they will still be missing years from now. The reason is that the remaining files mostly require manual processing, and that is a very boring task that nobody is lining up to do. Also many of those files do not meet current standards of documentation and end up with mostly empty fields, and some are randomly deleted for lack of source or author.
- We had better luck some years ago with enforcing the rule that all files require a template with a copyright tag. It was a massive job to add license templates to all the files that never had them or lost them.
- Another perpetually unfinished task is the transfer of files from Wikipedias to Commons. The main issue is the impossibility of automatically converting from one format of wikitext-based description to another. So it has to be done manually while dealing with missing data and frequent deletion of old images for lack of current-day metadata. I expect that we will get 80-90% done but the remaining files will be with us for a long time. --Jarekt (talk) 02:10, 26 October 2016 (UTC)
- Another challenge we run into is stubborn users who do not like their files moved from Wikipedia to Commons, or do not like the {{Information}} template and will engage in a war with anybody that adds standard infoboxes, or do not like any of the standard license templates and write their own text of a license. The same users might create successful roadblocks to using wikibase-type descriptions. --Jarekt (talk) 02:10, 26 October 2016 (UTC)
- perhaps the stubborn users object to their items which were stable on wikipedia for years, but are then transferred and deleted on commons. we might also talk about the stubborn admins on commons who do not play nice with anyone including wikidata. Slowking4 § Richard Arthur Norton's revenge 19:30, 31 October 2016 (UTC)
- @Slowking4: , I agree. Sometimes people transfer files from Wikipedia, relying on tools which are very bad at keeping all the image metadata. Then images are deleted from Commons due to insufficient metadata without notifying the photographer, only the user or bot that did the transfer. It is maddening, but unfortunately not uncommon. I fully support users that happen to be "stubborn" about future moves by others. However the safest route is to move them yourselves. I also agree that being an admin on Commons, or any other project, does not inoculate from being stubborn or difficult. I can show many examples. The best remedy is to nominate people who are not stubborn or difficult for the job. --Jarekt (talk) 16:50, 1 November 2016 (UTC)
- Final challenge I see is treatment of files that do not use {{Information}} template but one of other infoboxes or templates derived from them.--Jarekt (talk) 02:10, 26 October 2016 (UTC)
- Another challenge and opportunity would be capturing messy details of multiple licenses which apply to different jurisdictions and may be related to multiple co-authors. For example a photograph to the right should require information about the sculptor and the photographer and list copyright tags for both. Many images require information about copyrights in the country of origin and in the US, and may also include information about copyrights in other countries. All that complexity is not well captured using current templates (See Commons:Multi-license copyright tags for info on copyright templates), but could be captured in a well designed wikibase structure. I think we could automatically migrate many of the current files to such a system, but probably not all. I have however high hopes of capturing such complexity with the uploads we contribute years from now. Finally I hope that being able to capture more details will not add confusion during upload and will not lead to mass purges of old non-complying files that met community standards at the time of the upload, but might not meet future standards. --Jarekt (talk) 13:07, 26 October 2016 (UTC)
- @Jarekt: This is all really great feedback! Having realistic expectations about the speed of adopting structure, is really important. We are trying to calibrate the proposal not to overpromise the conversion, but are cautiously optimistic. Of course, working to build a reasonable community process for prioritizing and supporting that transition will be really important. Astinson (WMF) (talk) 16:01, 1 November 2016 (UTC)
- Apart from all the challenges regarding migrating existing files, what about uploading new ones into the new system? Many experienced users prefer alternative upload methods (see Commons:Upload tools) over the default UploadWizard. Please make sure you don't underestimate the disaster breaking those tools would cause! Forcing people to use the UploadWizard will result in angry mobs with torches and pitchforks. I think it's crucial to reach out to the developers of those upload tools as early as possible to give them ample time to adapt. --El Grafo (talk) 14:09, 31 October 2016 (UTC)
- @El Grafo: We agree with you completely: One of the Year 2 priorities, is going to be working with communities on uploading and other important tools, especially those which are already designed to work with structured data in some way (for example, if you haven't engaged with Commons:Pattypan yet, I would recommend giving it a try (it makes the infoboxes much easier)). Wikidata was hugely successful because of the community of volunteer developers, the plan is to apply that learning to this project. Astinson (WMF) (talk) 16:01, 1 November 2016 (UTC)
- @Astinson (WMF): Thanks, that's great to hear! --El Grafo (talk) 14:30, 8 November 2016 (UTC)
- Volunteer tools were such a big success for Wikidata, because we could use Wikipedia as data source. The biggest data source were Wikipedia categories and Wikipedia templates. I guess the most important data source for Commons will be the category system in commons. Other possible data sources will be the name of the files and the description of the images. A possible new tool could be an image recognition software for bots like pywikibot to detect if the picture is about people, animals, houses, etc. There is Commons:Bots/Work requests, but it is not very active. --Molarus (talk) 06:59, 4 November 2016 (UTC)
- not a chance of upload wizard only. the challenge is how do we guide new uploaders to appropriate tools, since they are open source, (changing support) and harder to find. by default upload wizard with small off-ramp, you get a lot of information template to change to artwork. and questions at village pump. need a dashboard for uploaders to select right one; need a tool life-cycle, with on-boarding at WMF for the good ones. Slowking4 § Richard Arthur Norton's revenge 19:40, 31 October 2016 (UTC) — Preceding unsigned comment added by Astinson (WMF) (talk • contribs) 16:19, 01 November 2016 (UTC)
- Handling the social/community aspects will be critical. Technical changes - especially after such a long period of technical stagnation on Commons - will inevitably annoy some people, and we know that long intemperate discussions on wiki significantly put off volunteers who might given a more positive environment be very happy to help out. That's one reason why as Jarekt mentions above it can be difficult to crowdsource volunteers for new cleanup tasks. In parallel with discussions of technical changes we need to make sure our Commons policies, guidelines and legal rules are up to the task. Commons currently has relatively few formal policies/guidelines, and one consequence is that for many issues the community has not yet worked out what its 'official' consensus view should be. Where rules need to be added or changed early community discussion is going to be really important, with a strong emphasis on encouraging cross-wiki and cross-community collaboration. MichaelMaggs (talk) 13:35, 8 November 2016 (UTC)
- +1, I'd go so far as to say that this might very well be the most difficult part of the whole project. --El Grafo (talk) 14:30, 8 November 2016 (UTC)
- That's why we should start with small steps: automatic handling of institutions and creators seems to be a good idea. Once that works, we can shift to bigger issues. Regards, Yann (talk) 17:41, 8 November 2016 (UTC)
- Poor design implies disaster. Glrx (talk) 01:55, 21 November 2016 (UTC)
- Having read the dialogue above between user:Steinsplitter and user:Lydia Pintscher (WMDE) (see [1] and [2]), I cannot say that my level of trust in this project is very high. In my opinion, the "existent stuff" on file description pages is very important because it shows exactly how the file was introduced into Commons. It is vital that this information is not only stored forever, in its original form (usually a natural language), but also easily accessible. The complete history must be preserved for good and must remain accessible to any reader (and uploader/writer). Otherwise, a lot of confusion is certain to follow. To improve the search for Commons objects is a worthy task but this must not be an excuse to "replace" the existent descriptions. I admit that I have a great fear of monopolization here. If databases are interlinked and "networked", the danger is that independent data sources are deleted and each database only shares information from other databases but the origin of the data is obscured.--Mautpreller (talk) 14:15, 11 January 2017 (UTC)
- @Mautpreller: the engineers are actually working on a set of backend features that should help in this context: the main one will be Multi-Content Revisions which will allow multiple types of content change history to be merged into one log, to allow for folks to find a greater amount of information on historical changes for a page. This is only part of the problem solving though: we also have to make sure that the history pages for the commons media files have sufficiently clear and usable interfaces for working with the contribution history.Astinson (WMF) (talk) 17:26, 2 February 2017 (UTC)
- "Making (semi-)automatic editing easier" - I am afraid this means more automatic editing by bots and mass-editing, especially concerning deletions. From the point of view of someone mainly contributing to Wikipedia rather than Commons, altogether the current bots and scripts on Commons do more harm than good. The deletions of files that I stumbled upon because they were used in articles were almost all wrong. For example I know of
- two instances, where a sketch/diagram was deleted and one where it was almost deleted because someone tagged it as being "wrong", just because the automatic categorisation was wrong, suggesting that it showed something different. A complete description was only found in the Wikipedia articles where the sketches/diagrams were depicted and referred to in the text for many years. Without prior notice, without someone actually reading the articles or providing an alternative picture, they get silently erased by CommonsDelinker. If that was not enough, the CommonsDelinker in one case also stated the wrong reason "missing license" for the deletion. People started complaining on the discussion page half a year later because they didn't understand the text that said things like "see figure 1" but it took many years for someone to figure out the reason and repair it.
- another example is an editor trying to improve articles by uploading improved versions of images. Because his new creations are based on the old ones, he states that they are not entirely his own work. The old pictures in the articles are replaced with the new ones and the new ones automatically labeled as missing license or permission and then deleted. No one bothers to look at the discussion page of the uploader if he explains what he has done or looks at the articles where the images are used to find out their origin. Often the article can be reverted to use the old images in case there is a valid reason for deleting the new ones. A similar case happened to me. I just used different parameters in a Matlab code that someone put on Commons as public domain to create an animation. I thought it would be sufficient to link to the original file as a source, but unfortunately the bot couldn't read that.
- last but not least the cases already mentioned above, where a file was transferred to Commons from Wikipedia and then deleted due to missing information that was lost during the process or some automatic cleanup afterwards.
- Given the current handling of deletions and the way in which statements are added to Wikidata at present, just because some algorithm suggests something, these problems are likely to get worse. For example I corrected quite a few items with wrong coordinates added by some scripts, where one error often results in a whole bunch of wrong statements. In case this information is used to hunt for pictures to nominate for deletion (e.g. because "freedom of panorama" does not apply for what the picture is supposed to show and where it appears to be taken), the project will cause a lot of damage and anger among the Wikipedia editors.--Debenben (talk) 16:25, 22 January 2017 (UTC)
Does the current project accurately represent the role of the communities, especially the Wikimedia Commons and Wikidata communities, in engaging with such a software project?
- That is a hard question. On Commons many of the most active users find their niches and work there; for example many hardworking admins work on keeping up with the daily flood of copyrighted images that need to be deleted. Often current self-appointed tasks do not allow an easy switch to new tasks, so it is much harder to crowdsource a new set of tasks. For example when we finally identified all the files without an infobox by adding them to Category:Media missing infobox template there was no army of volunteers to fix them. Similarly when the Wiki Loves Earth competition dumped a few thousand images with bad coordinates over a month it was hard to find volunteers to fix them. On the other hand Wikidata grew a large community of volunteers to run the site doing tasks nobody thought of years ago. I do not expect them to drop their current work and come to Commons and work on our structured data, but I do hope we can grow our own community of volunteers to work on our new tasks. Those could be users that do not like the ways we do image description or categories currently on Commons. --Jarekt (talk) 13:22, 26 October 2016 (UTC)
- No. I have attempted to identify who the "we" is by reading the overview, but only managed to work out that it was not WMDE, the WMF or Wikimedia Commons volunteers (though it could be that "we" is the WMF and WMDE, just talking about themselves like third parties). I can deduce two names of WMF employees based on the posts being made on-wiki, but it would probably be wrong to presume that's the whole team. In terms of engaging, that's an interesting point, as the example image groups being targeted in "How Commons Content Could Change" includes a lot of uploads from my projects, but nobody has approached me with practical analysis on how my uploads might need to be adapted, for example where I have applied project specific ingestion templates to hundreds of thousands of images, such as {{nypl}}. My assumption would be that engagement will remain passive, consisting of posted invites suggesting volunteers to comment now plans and proposals have been published. Experience shows that volunteers that invest their unpaid time creating suggestions for changes and improvements will be resisted or politely sublimed by the late stages of proposals like this. In summary, I might be interested in helping the transition to structured data, if it was explained in a way I understood rather than way-laid with jargon that does not seem pinned on measurable definitions, and the plan (or timeline) showed there was anything that I could look at from a pragmatic Commons perspective before 2019. From what I've read, I think this is all about Wikidata until then. --Fæ (talk) 11:17, 27 October 2016 (UTC)
- @Fae: The "we" in the document is the WMF and WMDE group scoping this work which includes staff across both organization's engineering units; Seddon and I are providing community engagement support, because the project could include external funding (in the scope of WMF Major Gifts) and is closely related to movement work in the GLAM-Wiki space (in the scope of WMF Programs team).
- Additionally: thanks for the feedback on needing a more pragmatic timeline: at this time, we are offering a high level, because whoever is leading on this project (someone like User:Lydia Pintscher (WMDE) is for Wikidata), will be working directly with the community for both prioritizing features to implement and the order/way in which community activity might change (making sure that available infrastructure changes alongside community consensus). The current focus is on investing in the infrastructure, how the infrastructure will be used by the community will be dependent a lot on those further conversations and is a major component of the timeline -- we don't want to prescriptively propose a means of implementation before the demo is ready for demonstration, and the Commons community has had time to evaluate it and provide feedback. Astinson (WMF) (talk) 13:58, 31 October 2016 (UTC)
- No. I have never seen a community consensus pursuant to COM:RFC. --Steinsplitter (talk) 11:24, 27 October 2016 (UTC)
- No. I don't see clear goals or clamor for particular features. Put the data in a database rather than in text is a solution, but what is the problem that is being solved? It cannot be just to move data around. The project does not state goals but rather "benefits". Those benefits are motherhood-and-apple-pie statements rather than concrete goals. "One way to think of structured data: It’s a kind of DNA that explains information in a much more integral way." What does that mean or solve? DNA is instructions for assembling proteins; it tells us nothing about what those proteins do. Bad metaphor. It's a sales pitch in the aether. Lots of author information on Commons is dead wrong; moving it to a database isn't going to fix that. Today I corrected the author for File:Rhamnus frangula - Köhler–s Medizinal-Pflanzen-120.jpg. All of the credits in the Köhler's Medizinal-Pflanzen are wrong: Koehler wrote the book, but others did its artwork. If we knew that Koehler is not an illustrator, then we could deduce that he is not the author of a drawing. We can use a database for checking. But I do not see that as a stated goal. What are the goals? How will those goals be achieved? Glrx (talk) 02:14, 21 November 2016 (UTC)
How would you like to support this project?
- The GLAMpipe metadata transformation and upload tool would be an ideal tool to support working with this framework. I will commit to developing it.
- I am interested in working to streamline licensing procedures, and the way media, licenses and metadata are represented in the MediaViewer, file page or shared content.
- I am also interested in contributing to "templates", metadata subsets that are required for specific types of media. I have worked with the Map template, discussing broadly with media providers and reusers, and I can specifically contribute to that.
- I would like to see participatory methods of developing these, meaning I as well as many others would like to be engaged in discussion and design. However, as pointed out elsewhere, this is a long anticipated development, and it should not be set on a track that could get it jammed. Therefore, I opt for inclusive, forward-thinking and productive setups. The priority is very high, and should be recognized.
- When in place, I would like to contribute to developing ways to enrich data: Adding location, connecting to additional data, making annotations, recognizing features - in micro tasking applications like the Wikidata Game and in regular MediaWiki tools
--Susanna Ånäs (Susannaanas) (talk) 07:38, 31 October 2016 (UTC) (edited 14:34, 31 October 2016 (UTC))
Categories, tags, and navigation within Commons
Structured data is a huge opportunity for Commons that I've been waiting on for years. I would really like this opportunity to be used to rethink what purpose is served by Commons categories, and whether they are still needed with structured data (I think they aren't). In my mind, categories have these functions:
- They give information. For instance, I know that everything in Category:Paintings by Vincent van Gogh are paintings, made by Vincent van Gogh. This function will always be best served by structured data, wikidata:Property:P31 : wikidata:Q3305213 and wikidata:Property:P170 : wikidata:Q5582. Structured data is better because it is multilingual and more precise at the same time.
- They connect (subparts of) Commons to other Wikimedia projects (and share this role with pages/galleries). This connection is both for readers (if you are interested in this article, maybe look at our collections of images about this topic) and editors (improving a Wikidata item about someone? We might have a picture of their tomb). Structured data would help this by having more meaningful results (compare Category:Paintings by Vincent van Gogh, which has only subcategories and a handful of low quality files, to the appropriate SPARQL request). Instead of entry points based on manual curation, which can be explicit (pages such as Vincent van Gogh) or implicit (by adding "Category:Vincent van Gogh" in a file), we could have dynamic entry points, defined as SPARQL requests. (They could be updated by bots every day, or dynamically generated for each reader, or any solution that is cost effective).
- They create paths of navigation within Commons. This is their most overlooked job, and it feels like we, the Wikimedia movement, forget that people might just want to look at pictures of a place without reading an encyclopedic article about it or looking for travel advice. Categories, with their rigid inclusion semantics, don't help. Sure, if I want to see only portrait paintings by Vincent van Gogh, there is Category:Portrait paintings by Vincent van Gogh. But what if I want to see paintings of flowers? Or sunsets? Or any combination of criteria that is not yet here (and we already have a LOT of multicriteria categories such as Category:Portrait paintings by Vincent van Gogh, Saint-Rémy 1889 and yet we are very far from covering everything). With structured data, we can allow the reader to choose their criteria (paintings only in a given geographic area, or about a given topic), but also open doors of serendipity (see paintings of flowers from other artists).
Structured data is going to turn Commons into the wonder it deserves to be: let's make sure we give it the full power to amaze us! Léna (talk) 11:00, 26 October 2016 (UTC)
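The "dynamic entry point" idea in the comment above can be sketched as a small Python helper that assembles a SPARQL query from reader-chosen criteria. This is a rough sketch, not a description of any existing Commons feature: the function name `build_query` is hypothetical, the `wd:`/`wdt:` prefixes are assumed to be the ones predefined by the Wikidata Query Service, and P180 (depicts) and P18 (image) are Wikidata properties that the comment itself does not mention.

```python
from typing import Optional


def build_query(instance_of: str, creator: str, depicts: Optional[str] = None) -> str:
    """Build a SPARQL query string for items matching the given criteria."""
    lines = [
        "SELECT ?item ?image WHERE {",
        f"  ?item wdt:P31 wd:{instance_of} .",  # instance of
        f"  ?item wdt:P170 wd:{creator} .",     # creator
    ]
    if depicts is not None:
        # Optional extra criterion chosen by the reader (assumption: P180, depicts).
        lines.append(f"  ?item wdt:P180 wd:{depicts} .")
    # Attach an image when the item has one (assumption: P18, image).
    lines.append("  OPTIONAL { ?item wdt:P18 ?image . }")
    lines.append("}")
    return "\n".join(lines)


# Paintings (Q3305213) created by Vincent van Gogh (Q5582), as in the first bullet.
print(build_query("Q3305213", "Q5582"))
```

Passing a third item ID as `depicts` narrows the result to one subject, which is the kind of ad hoc combination that would otherwise need its own multicriteria category.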
- I agree that revamping the category system might be one of the great benefits of structured data. See also my slide with examples of other issue with categories on Commons. --Jarekt (talk) 12:42, 26 October 2016 (UTC)
- In theory it's a good idea, but in practice it would depend on how well it was implemented, and what kind of user interface could be created. Categories do at least work, they are fast to browse and fast to update. That SPARQL request locked up my browser for a few minutes, and if I wanted to modify the query I'd have to spend time understanding the query language and Wikidata properties. I suspect the results would also be limited to one image per artwork, there'd be no way to display all matching files in Commons. --ghouston (talk) 00:21, 27 October 2016 (UTC)
- The Wikidata notability requirements would also need to be examined. At present, it seems that a Commons category alone isn't sufficient to allow creation of a Wikidata item, and if Commons categories go away, even that wouldn't be available. What happens when you want to group images by a concept that isn't described on any other Wikimedia site, and perhaps doesn't even have reliable external references? --ghouston (talk) 00:26, 27 October 2016 (UTC)
- Do you have an example of such a concept? Léna (talk) 06:27, 27 October 2016 (UTC)
- @Léna I have tons of such examples. In fact it is part of my workflow as a volunteer interested in 17th-century art. If I can't find the artist (or museum, or genre, or subject) then I create a category for it. Days, months or years later I might go and write an article about the person, thing, concept or whatever, and then I get around to updating the various Wikidata items involved. Sometimes I don't get farther than just Wikidata items and never bother with a Wikipedia article (such as grouping artworks in categories by collector - the collector may have an article and some of the artworks may have articles or items, but I never bother to create items or articles for the collection). Jane023 (talk) 08:53, 27 October 2016 (UTC)
- I have the same kind of workflow but I usually create "correct" items and "bad" categories. For instance wikidata:Q27553312 has clear, structured, multilingual information while Category:Mission Gabriel Maget is really poor (I only created it to link it to the item). I find it easier to express information through statements than by finding the right parent categories of the one I just created. Léna (talk) 09:06, 27 October 2016 (UTC)
- Exactly - and the point is there is nothing wrong with such workflows. It is perfectly OK for someone to create detailed Commons categories and not bother with Wikidata. The point is that on Wikidata we have a loose definition of notability along the lines of "if it is linked to a notable item directly, then it's OK" and on Commons it's not so clear. Importing Commons categories of artists' artworks for existing items for artists is OK, but importing Commons categories of artists when there is no associated item for the artist is probably not OK. Jane023 (talk) 11:27, 27 October 2016 (UTC)
- I think that if you have a Commons category of a person that you can find in VIAF, LOC or another library catalog (find enough info to fill the {{Authority control}} template) then it is notable enough for Wikidata. Article or no article. I think their criteria for notability are much lower than for other projects. --Jarekt (talk) 12:41, 27 October 2016 (UTC)
- Examples I can think of: people who don't have Wikipedia articles, perhaps sports people or academics, where it seems worth keeping a photo of them in Commons in case they are needed some day. Random devices such as obscure models of mobile phones where there's no Wikipedia article. I'm not sure if that item mentioned above, wikidata:Q27553312, meets Wikidata:Notability, due to the clause "an item with only a sitelink to a category page in Wikimedia Commons is not allowed on main article items". --ghouston (talk) 23:57, 27 October 2016 (UTC)
- I've been reading these paragraphs and I understand that it would be a good method to substitute categories and improve cataloging of files. That implies keeping many of the categorizations by creating Qs.
Let's take Category:Carrer Pasqual Arbós 5, Xirivella. It's a building, not a monument, and lacks any relevance beyond its very existence; we pictured it because not many buildings of that sort have survived to our days in Xirivella. I cannot reference it in any other form than saying "go there and look". A Q for it will be needed or the information would be lost (or very difficult to find).
I can think of odder things: Category:Water supply manhole covers in Sueca. We have found that manhole covers are a source of information and we usually photograph them. Using several properties (it's a manhole cover, it's in Sueca, it's related to water supply) can help, but SPARQL (quoting Asaf Bartov) is very difficult to use, so a better, more user-friendly query interface is required. B25es (talk) 18:32, 27 October 2016 (UTC)
- I assume that our categories would not go away, but would remain as a parallel way of keeping track of things. In the old days we organized files using galleries, which were competing with categories. Categories won, but we still have thousands of out-of-date galleries nobody maintains. I think we can do the same with the new system. As for SPARQL, I assume that tools will be written to see all the images that meet some criteria without using SPARQL queries. For example (following my image in the sections above), if you pick the tags paintings, male subject, from France, and portraits, you will get something similar to the content of Category:Portrait paintings of men of France. --Jarekt (talk) 18:59, 27 October 2016 (UTC)
- In that case would it be up to the user to think of appropriate tags to restrict their search by? They'd also need some way to find out what relevant tags are available. A tag like "clock" could include a vast range of devices including single-function clocks and all kinds of multi-function devices that happen to include a clock, including practically every computing device. --ghouston (talk) 00:10, 28 October 2016 (UTC)
- Exactly what I'm thinking, we would need a system of both suggestions and free navigation (not through SPARQL requests, but through something much more reader-friendly). For instance, once you are in Category:Paintings by Vincent van Gogh, you would have a way to restrict the search (with suggestions such as "in a given museum" (van Gogh museum, Orsay, other) / "at a given period" (on a "ruler" from 1878 to 1890) / "about some topics" (portraits, landscapes, still life, etc) / "with given properties" (copies of Millet's works)) or to extend the search (drawings by Vincent van Gogh, or paintings by other painters). Thus, the navigation would be defined "top down": the "code" of Category:Paintings by Vincent van Gogh would switch from the bottom-up
- [[Category:Vincent van Gogh| Paintings]] [[Category:Paintings by painter|Gogh, Vincent van]] [[Category:19th-century paintings from the Netherlands|Gogh, Vincent Van]] [[Category:Paintings from the Netherlands by painter|Gogh, Vincent Van]] [[Category:Post-Impressionist paintings|Van Gogh]] to something like
- Down
- museums : list - van Gogh museum, Orsay, others
- period : ruler - 1878-1890
- topics : list - portraits, landscapes, still life, others
- filter : inspired by Millet
- Up
- Works by Vincent van Gogh
- Linked
- Paintings by other artists : list - Anthon van Rappard, Émile Bernard, Paul Gauguin
- So it would be required of the software to have a language that expresses these kinds of links, and that this language be easily used by the visual editor. The role of the Commons editor would thus be to express which parts of Commons should be linked with one another. Léna (talk) 09:59, 28 October 2016 (UTC)
- Yes something like that, although I don't understand how "subcategories" would work in the new scheme. Somehow these would need to be derived from the Wikidata relationships. Then if a file in Commons was tagged with "Samsung SGH-D600", for example, would it also be found in a search for mobile phones, or for battery powered devices, or would those tags need to be added to the file explicitly? Locations are also difficult, when selecting London you'd want to include everything with geographic coordinates within its borders, as well as anything tagged with a geographical subregion such as Westminster. --ghouston (talk) 23:14, 29 October 2016 (UTC)
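The question of whether a file tagged "Samsung SGH-D600" would also turn up in a search for mobile phones can be illustrated with a small sketch. It assumes the tag hierarchy is available as simple "broader" links (standing in for whatever Wikidata relationships would actually be used); the file names, the links and the helper names are all made up for illustration.

```python
# Hypothetical "broader" links (child -> parents), standing in for Wikidata
# relationships such as instance of / subclass of. Not real data.
BROADER = {
    "Samsung SGH-D600": ["mobile phone"],
    "mobile phone": ["battery-powered device"],
}

# Hypothetical files, each carrying only its most specific tag.
FILE_TAGS = {
    "File:D600 front.jpg": {"Samsung SGH-D600"},
    "File:Desk lamp.jpg": {"lamp"},
}


def ancestors(tag):
    """Return the tag itself plus everything reachable via broader links."""
    seen, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, []))
    return seen


def search(tag):
    """Files whose explicit tags imply the searched tag."""
    return sorted(
        name
        for name, tags in FILE_TAGS.items()
        if any(tag in ancestors(t) for t in tags)
    )
```

With this approach only the most specific tag is stored on the file, and broader matches are derived at search time; the cost is that every search has to walk the hierarchy (or rely on a precomputed closure).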
Maybe a Commons category to SPARQL query translator is needed. I mean, there are hundreds of thousands of categories and it is not possible to write hundreds of thousands of SPARQL queries by hand. We would need a way to store those queries and maybe a way to cache the results. Querying 34 million items is expensive. Maybe we would also need software that proposes Commons Wikidata statements. The software would know which SPARQL queries already exist and could therefore say that your media file is similar to certain other media files and should therefore have similar statements. For example, the software could create a list of museums that have paintings by Vincent van Gogh and the user could choose from that list, instead of searching for the right Q-number of the museum. Second point: that Vincent van Gogh query shows about 1000 pictures. No one wants to look through 1000 pictures. We would need an assistant that asks you questions to reduce the number of pictures. That is another reason why categories are created. Third point: without categories, where should Wikipedia articles link to? Maybe such software decides whether this project fails or succeeds, and maybe the developers should start with that software, not with building Commons Wikidata. --Molarus (talk) 00:04, 28 October 2016 (UTC)
- I don't know about Commons category to SPARQL query translation, but I'm working on a translation from categories to statements: Commons2Data. You can have limits on queries (for instance, display only the first 50 results). And your last point is one of my points: we need entry points from Wikimedia to Commons (and, btw, neither categories nor galleries are very good entry points). Léna (talk) 09:37, 28 October 2016 (UTC)
- 1) Maybe a Commons category to SPARQL query translator could use a en:Genetic algorithm. The right SPARQL query is found if the query returns more or less the same pictures as are in the category. 2) I don't think just showing the 1000 pictures step by step is the right answer; I would rather see your filter proposal as a better solution. But those filters have to be created by software from the statements of those 1000 van Gogh pictures. --Molarus (talk) 18:20, 28 October 2016 (UTC)
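Whatever search method is used, "returns more or less the same pictures" needs to be turned into a number before candidate queries can be compared. A minimal sketch, assuming Jaccard overlap as the score (an assumption of this sketch, not something specified in the thread); the file names are made up.

```python
def jaccard(category_files, query_files):
    """Overlap of two sets of file names: 1.0 = identical, 0.0 = disjoint."""
    a, b = set(category_files), set(query_files)
    if not a and not b:
        return 1.0  # two empty sets are treated as a perfect match
    return len(a & b) / len(a | b)


# Hypothetical contents of a category and the results of one candidate query.
category = {"A.jpg", "B.jpg", "C.jpg", "D.jpg"}
candidate_results = {"B.jpg", "C.jpg", "D.jpg", "E.jpg"}

score = jaccard(category, candidate_results)
```

A search procedure, genetic or otherwise, would then keep the candidate queries with the highest score.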
Example
An illustration of what I would like to achieve: Category:1884 paintings by Vincent van Gogh, Nuenen. It's currently done manually (well, I wrote a script). What I would like to achieve is something like this:
- Description :
- instance of (P31) : painting (Q3305213)
- creator (P170) : Vincent van Gogh (Q5582)
- inception (P571) : 1884 (or an ISO representation of dates)
- location of creation (P1071) : Nuenen (Q153516)
- Split :
- Label :
- English: 1884 paintings by Vincent van Gogh, Nuenen
- Français : Peintures de Vincent van Gogh réalisées en 1884 à Nuenen
So, let us break this down. The description is the machine-readable translation of the name of the category. This description is precise, unambiguous, and multilingual by nature, as long as properties and items are translated. This description is a bit, well, machine-like to read, so we add the possibility to override it with natural-language labels. So with that we "just" solve the problem of multilingualism in Commons. But the most interesting part for me is the "Split" part, i.e. what concerns navigation. Just by saying to split according to genre (P136), the subcategories can be generated automatically: portraits, landscapes, still life, and genre. Along with that, other ways to navigate should be possible: changing one of the properties, for instance. This allows for more fluid navigation than what is currently possible in Commons. Léna (talk) 19:08, 5 February 2017 (UTC)
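The description-plus-split idea can be sketched with plain dictionaries. This is only an illustration under assumptions: the property and item IDs are the ones quoted in the example above, while the genre values are placeholder labels rather than real Q-numbers, and the function names `split` and `change` are hypothetical.

```python
# Machine-readable description of
# "1884 paintings by Vincent van Gogh, Nuenen", as listed in the example.
description = {
    "P31": "Q3305213",   # instance of: painting
    "P170": "Q5582",     # creator: Vincent van Gogh
    "P571": "1884",      # inception (year only, for simplicity)
    "P1071": "Q153516",  # location of creation: Nuenen
}


def split(desc, prop, values):
    """Derive one sub-description per value of the split property."""
    return [dict(desc, **{prop: value}) for value in values]


def change(desc, prop, value):
    """Navigate sideways by changing a single property of the description."""
    return dict(desc, **{prop: value})


# Placeholder genre labels; real data would use the matching Wikidata items.
genres = ["portrait", "landscape", "still life", "genre"]
subcategories = split(description, "P136", genres)

# Same description, but for the following year.
next_year = change(description, "P571", "1885")
```

Neither helper modifies the original description, so the same starting point can be split or changed in several directions.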
Experimental "category contains" template
@Léna, Jarekt, Ghouston, and Jane023: @Astinson (WMF), B25es, and Molarus:
I don't know whether the following is useful or not. I suggested the idea a couple of weeks ago at Commons Village Pump, and it disappeared without comment.
Anyway, extending a bit what Léna was already talking about above, namely trying to turn Commons categories into statements (ideally as near automatically as possible), it occurs to me that it would be useful to be able to store and share on-wiki progress that people make on that. It's difficult to store that on Wikidata, because most Commons categories (particularly intersection categories) don't have Wikidata items. But we can store the information in a template on the category. As a bonus, if we translate the description into a fragment of SPARQL, then we can offer the user a standard query to see what matches it on Wikidata.
So I made a prototype, Template:Category contains, generating a strap-line that can be put at the top of the category page; or just before the items. As a default demonstration, with no arguments, it gives this:
where clicking the query link runs a query for "cat", its subclasses, and instances thereof, generated by the following description:
wdt:P31?/wdt:P279* wd:Q146
As an experiment, I have tried adding it to the categories mentioned in the discussion above:
- Category:Paintings_by_Vincent_van_Gogh
  wdt:P170 wd:Q5582; wdt:P31/wdt:P279* wd:Q3305213
- Category:Portrait paintings by Vincent van Gogh
  wdt:P170 wd:Q5582; wdt:P136/wdt:P279* wd:Q134307; wdt:P31/wdt:P279* wd:Q3305213
- Category:Portrait paintings by Vincent van Gogh, Saint-Rémy 1889
  1=wdt:P170 wd:Q5582; wdt:P136/wdt:P279* wd:Q134307; wdt:P1071 wd:Q221507; wdt:P31/wdt:P279* wd:Q3305213; wdt:P571 ?inception FILTER(year(?inception) = 1889)
- Category:1884 paintings by Vincent van Gogh, Nuenen
  1=wdt:P170 wd:Q5582; wdt:P1071 wd:Q153516; wdt:P31/wdt:P279* wd:Q3305213; wdt:P571 ?inception FILTER(year(?inception) = 1884)
and also four other test categories:
- Category:Grade I listed buildings in Bedfordshire
  wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818 ; wdt:P31?/wdt:P279* wd:Q41176
- Category:Grade I listed churches in Bedfordshire
  wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818; wdt:P31?/wdt:P279* wd:Q16970
- Category:Grade I listed houses in Bedfordshire
  wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818; wdt:P31?/wdt:P279* wd:Q3947
- Category:Grade I listed bridges in Bedfordshire
  wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818; wdt:P31?/wdt:P279* wd:Q12280
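To show how a fragment like the ones above could become a runnable query, here is a minimal sketch. It is not how the template itself is implemented; it simply assumes the fragment (without the template's `1=` parameter prefix) is attached to an `?item` variable, and that a link into the Wikidata Query Service interface is wanted. The function names are hypothetical.

```python
from urllib.parse import quote

# Base of the Wikidata Query Service web interface; the query goes after "#".
WDQS_UI = "https://query.wikidata.org/#"


def build_query(fragment, limit=100):
    """Wrap a description fragment into a complete SELECT query on ?item."""
    return (
        "SELECT ?item WHERE { ?item "
        + fragment.strip()
        + " . } LIMIT "
        + str(limit)
    )


def query_link(fragment, limit=100):
    """Percent-encode the query so it can be used as a clickable link."""
    return WDQS_UI + quote(build_query(fragment, limit))


# Fragment taken from the Category:Paintings_by_Vincent_van_Gogh example.
van_gogh = "wdt:P170 wd:Q5582; wdt:P31/wdt:P279* wd:Q3305213"
link = query_link(van_gogh, limit=50)
```

The real template query also returns labels, images and Commons categories; those columns are left out here to keep the sketch short.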
So would these {{Category contains}} templates be worth trying to roll out on a wider scale?
Yes, the SPARQL looks a bit off-putting at first sight; but it's quite systematic, made of formulaic pieces - so creation would lend itself very readily to mechanisation; one could also imagine a drop-down tool to help, able to handle the most common cases. Ultimately I would see the information probably sitting most naturally on an item for each category in the Commons wikibase -- similar to eg the way the property P360 "list of" is presently deployed like this on the item Q5591762 "Grade I listed buildings in Bedfordshire", which Reasonator currently uses to automatically show a list of matching items. Of course as yet there is no Commons wikibase, but where we can store the description is in a template.
Yes too, there are things that can be improved with the query. It would be nice to be able to switch more easily to the map and the picture-grid views; and it would be nice if the links on the files were clickable to file pages, and the categories to commons category pages. (I've put in a ticket). And a generic "start" and "end" date column, covering a multitude of uses, would be useful too. But this can be developed -- by making the query a template, it can go on being developed, independently of the descriptions getting added to categories.
But I hope some of the information it's already able to provide may be useful to be able to compare with the category, so people would tolerate it if it started to appear more widely.
What other thoughts?
- I don't think categories are going to go away.
Yes the tag (or topic) based searching Léna described above, that one can already see on eg Art UK and any number of other sites, is going to be very seductive - it should be a huge improvement on what we have at the moment. It should make it possible to narrow down to an image tag by tag, or to widen from one by showing a list of properties it fulfills and allowing them to be turned off or generalised; presumably all with a switch to show only the best image representing any thing (its P18) or all images. But there are a number of things going for categories too -- there is an awful lot of context-specific knowledge in the category tree, as to what often makes most sense to sharpen or generalise from a particular set; also it's no small thing, as Jane pointed out, the ability to throw together an ad-hoc category with minimum fuss. Plus we have all the visual tools, like cat-a-lot, that already work. And a lot of categories have had work done -- introductions, header notes etc -- that can make them a bit more of a curated presentation than just the images they contain.
- One of the big challenges is going to be limitations in the completeness of Wikidata.
Considering Van Gogh as a test-case may be a bit misleading, because so much work has already been done to try to improve and achieve more completeness of Wikidata statements in relation to him. The "cat" or "G1 churches in Bedfordshire" examples above may be more representative, where the SPARQL searches currently return significantly less than what the Commons categories we have represent -- eg so many fewer classes of cat (or even individual cats); and many churches missing, not because they aren't in Wikidata, but because they ought to have a statement saying which parish they are in; and parishes ought to have a statement saying they are in Bedfordshire (or a part of it).
- SPARQL will find it difficult to replicate some category views.
-- because category views not only show what is in the category, they also exclude what is in subcategories. This can be quite expensive. Indeed, I wonder if SPARQL will already have trouble with some intersections (even without the subcategory problem), if it is combining some quite big sets. It can already struggle to evaluate the whole subtree of P131 "located in the administrative territory"; and there are other sorts of conditions that also produce very big solution sets -- generating them, and then intersecting them, may take a lot of time. And at the moment SPARQL queries are only run by a handful of enthusiasts - a number of queries that is orders of magnitude smaller than the number that might be expected for the main search system for a mass-exposure website. So I expect some (maybe most) views may have to be pre-computed -- which starts to become not so different to static categories.
On the other hand, one thing that SPARQL might really help with could be population of categories -- simply asking in turn, level by level, whether the image could be percolated down to any subcategories, based on the information in Wikidata about the item the picture depicts, plus further image-based considerations.
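The level-by-level percolation can be sketched as a walk down the category tree. Everything here is a simplification for illustration: the tree is a two-level toy, each category's test is a plain Python predicate over an item's statements rather than a SPARQL query, and the names `TREE`, `RULES` and `percolate` are hypothetical.

```python
# Toy category tree: parent -> list of subcategories.
TREE = {
    "Paintings by Vincent van Gogh": [
        "Portrait paintings by Vincent van Gogh",
    ],
    "Portrait paintings by Vincent van Gogh": [],
}

# One membership test per subcategory. A real tool might instead send the
# category's description to the query service for each candidate item.
RULES = {
    "Portrait paintings by Vincent van Gogh":
        lambda item: item.get("P136") == "Q134307",  # genre: portrait
}


def percolate(item, category):
    """Push an item down the tree as far as the membership tests allow."""
    for child in TREE.get(category, []):
        if RULES[child](item):
            return percolate(item, child)
    return category  # no subcategory matched, so the item stays here


portrait = {"P170": "Q5582", "P136": "Q134307"}
other = {"P170": "Q5582"}
```

An item with no matching subcategory simply stays where it is, which mirrors how files sit in a parent category until someone refines them.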
- It's going to involve a lot of machine analysis
-- because the data simply isn't there at the moment. It's all very well for Magnus to say that 85% of images may be using standard templates; but the flip-side of that is that those templates contain rather little useful machine-interpretable information. Does the image represent a photograph or a painting or a map or a drawing or a scanned text page from a book? Is the main subject a portrait, a building, a landscape, a sculpture, an urban scene, a piece of furniture? Is it colour, black-and-white, sepia, daylight, night-time? Does it contain key recognisable objects? Does the category suggest what it might be? Art UK tried crowdsourced tagging, and got a lot of tags; but most images don't have one. They've been at it for five years, and they have far fewer images than we have. So I think there's going to have to be a huge amount of directed machine analysis -- of the images, of text descriptions, of what categories mean, and all the rest -- to really nail this. If we're going a wikibase route, one extra thing that might help, might be an extra rank level of "machine inferred -- not verified", above 'deprecated' but below 'normal' -- with rapid tools (eg visual tools, or simple tick/cross buttons) to help humans confirm or reject the machine suggestion, at scale.
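The suggested extra rank can be pictured as one more level in an ordered list of ranks. This is purely a sketch of the proposal: "machine inferred" is not an existing Wikibase rank, and the `review` helper standing in for a tick/cross button is hypothetical.

```python
from enum import IntEnum


class Rank(IntEnum):
    DEPRECATED = 0
    MACHINE_INFERRED = 1  # hypothetical level proposed above, not a real rank
    NORMAL = 2
    PREFERRED = 3


def review(rank, accepted):
    """Apply a human tick (accepted=True) or cross to a machine suggestion."""
    if rank is not Rank.MACHINE_INFERRED:
        return rank  # only unverified machine suggestions are affected
    return Rank.NORMAL if accepted else Rank.DEPRECATED
```

Because the levels are ordered, a consumer that only trusts verified data could filter on `rank >= Rank.NORMAL`.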
I think we underestimate just how big a job it's going to be. But I offer {{Category contains}} as a modest beginning, to try to start to record some understanding of the 80% of Commons categories that at the moment are not identified to any Wikidata item. Jheald (talk) 14:06, 23 February 2017 (UTC)
Discussion
- I can't say if such a Wikidata list is of any value, I'm not an experienced Commons editor. At least, there are some things I would change: 1) Show more languages (not only "en"), which is better than an empty space. 2) Is it possible to transform the commonscat-text into a commonscat-link? --Molarus (talk) 15:14, 23 February 2017 (UTC)
- @Molarus: You're right, I should include more language fallbacks. (And the template itself, as well as the query, should ultimately be internationalised. But this was a first stab.) As to commonscat, what I was hoping was this might get fixed in WDQS, for them to hold a URL as well as a string for identifiers. There's certainly a long-standing ticket for that, to make the identifiers work as linked data. So I hope that will fix it eventually. For the moment, it's the choice between a string and a URL, but you're right, until identifier links get sorted out, we should probably go for the URL. Jheald (talk) 15:49, 23 February 2017 (UTC)
- I have seen that you have experience with javascript. At the moment, I'm inserting Wikipedia sorting into a lua module (see here; the arrows in the table header are done by inserting class="sortable" into the code. I will add more code, for example, for sorting dates correctly). Maybe we could have some javascript similar to this sorting code for commons too? My experience with javascript is limited, therefore I can't say what is possible. --Molarus (talk) 16:07, 23 February 2017 (UTC)
- Totally out of time this afternoon, have to rush. Will try to get back to you later. BTW it was The DJ who made the nice javascript :-) I only tweaked it, when it stopped working. Jheald (talk) 16:18, 23 February 2017 (UTC)
- @Molarus: Updated. Here's the old query for Category:Portrait paintings by Vincent van Gogh: tinyurl.com/zgc8db9 and here's a new one tinyurl.com/z49wf72. Let me know which you prefer -- the full commons category URL is quite long, and less readable, but it's good to have the clickable functionality. As for languages, I've put in "en, fr, it, de, nl, es" as a start (a bit western-Eurocentric, yes). Ideally one would draw from the user's own language-fallback preferences on Commons, but I'm not sure how one accesses that. Anyway, a first step. Jheald (talk) 12:20, 24 February 2017 (UTC)
- I think there is "Category:" missing in the commonscat link, because it links to a commons page that does not exist.
- There is this (Special:ApiSandbox#action=query&format=json&meta=userinfo&uiprop=acceptlang) information in commons, that could be read by Mediawiki API. For Molarus, the data is that I accept de, en-us and en. But I do not know how to get that information into the SPARQL-query. By the way, looking into the APISandbox is the way I have learned to write my javascript/jquery tool d:Wikidata:Tools/User_scripts#The_Brown_Tool, some years ago. --Molarus (talk) 19:20, 24 February 2017 (UTC)
- Thanks for catching the "Category:" mistake - I thought I'd checked it!
- The Mediawiki API looks interesting, but I am not sure if I can access it from Lua, to get the information into a template. Will have to think more about this in due course, after I've had some sleep. Jheald (talk) 01:11, 25 February 2017 (UTC)
- In the last few hours I have written a small javascript tool that reads data by SPARQL: d:User:Molarus/SPARQL.js. If you run this script in preview mode in Wikidata (or insert it into your commons.js), you will see at the top right side of the screen the text: "SPARQL Query". Clicking that text with the mouse will call a SPARQL query and print the first two labels on the screen. I have used the Firefox console for developing, and with the console it is possible to see the whole result of getJSON (it's more than just the labels and the item numbers). I think the next step would be to read "Category:Paintings by Vincent van Gogh" and get the arguments of the "Category contains" template, put them into the SPARQL query and print the result in the top right corner of the screen. This way, Commons could get a SPARQL/javascript tool. I know, a javascript tool is not what you want. Off topic: I have read the front side of this page and the discussion page, maybe this could turn into a Commons / Wikidata searching tool (idea 1)? You may know that I spend my time on Wikidata with cycling. Maybe this js-script could be useful for cycling too. Maybe something like a virtual Commons category for riders of a cycling team of the year xxxx, enhanced with WD data (idea 2). --Molarus (talk) 06:12, 25 February 2017 (UTC)
- A little update on my attempts at internationalisation. I had hoped to internationalise the search returns by adding {{int:lang}} to the start of the labelling preferences. However, at the moment this doesn't work because WDQS seems to prioritise a language by where it appears last in that list (if it appears more than once), rather than where it appears first. So we get tinyurl.com/j4wkywb, rather than tinyurl.com/j4jbcpt (the latter has a few results in French).
- According to Daniel Kinzler on the mailing list yesterday, it seems it may not be possible to specify a full set of language preferences (at least, not without Javascript) as MediaWiki serves cached versions of each page, and only has the capacity to cache one version for each language. Of course, with a Javascript gadget one could perhaps over-rule the language fall-back sequence of any query to whatever one wanted by dynamically rewriting the page at read-time, which would be interesting but quite a hack.
- One other relevant ticket that's open is to be able to specify to ultimately fall back to any language that's available. However it looks as if this ticket may be stalled while people wonder whether any particular language should be indicated as the default for any item. Jheald (talk) 17:37, 28 February 2017 (UTC)
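If the only obstacle is a language code appearing twice in the fallback list, one conceivable workaround is to remove later duplicates before the list reaches the label service. The sketch below shows that idea in Python for clarity; it assumes the list can be preprocessed at all (which may not be possible inside a cached wikitext template), and the helper names are hypothetical.

```python
def dedupe_keep_first(languages):
    """Drop repeated language codes, keeping each code's first position."""
    seen = set()
    result = []
    for code in languages:
        if code not in seen:
            seen.add(code)
            result.append(code)
    return result


def label_service(languages):
    """Build the label-service clause with a duplicate-free fallback list."""
    codes = ",".join(dedupe_keep_first(languages))
    return (
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "'
        + codes
        + '" . }'
    )


# User language (here "fr") placed in front of the default list from above.
clause = label_service(["fr", "en", "fr", "it", "de", "nl", "es"])
```

The user's language then appears exactly once, at the front, so there is no later occurrence for the service to prefer.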
Maybe there is a way to decide which Commons categories should have this template. A start could be items with P31 Q3305213 (paintings) and a P18 property. I do not know if it is possible to rank a list of Commons categories by which have the most such items. One way could be to include the Commons category (P373) in the SPARQL query. Such a SPARQL query could be used to tell a bot where to insert this template, because that can't be done by hand for thousands or even millions of categories. --Molarus (talk) 20:25, 28 February 2017 (UTC)
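One way to read this suggestion is as a grouped count over P373 values. The sketch below only builds the query text; whether such a query finishes within the query service's time limit is untested, and the function name and defaults are hypothetical.

```python
def ranking_query(class_qid="Q3305213", limit=100):
    """Query text counting items of a class that have an image, grouped by
    the value of their Commons category (P373) statement."""
    return (
        "SELECT ?commonscat (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        "?item wdt:P31 wd:" + class_qid + " ; "
        "wdt:P18 ?image ; "
        "wdt:P373 ?commonscat . "
        "} GROUP BY ?commonscat ORDER BY DESC(?count) LIMIT " + str(limit)
    )


query = ranking_query()
```

Note that P373 on an item names that item's own Commons category, so this ranks categories by how many matching items point at them, which may or may not be the ranking a bot needs.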
- TL;DR: I just noticed the edits and left this on Jheald's talk page, but this seems to be a better location. To get from Category -> query without the query:
- Category:Grade I listed buildings in Bedfordshire -> Category:Grade I listed buildings in Bedfordshire (Q8497784) -> list related to category (P1753) -> Grade I listed buildings in Bedfordshire (Q5591762), which has the following data in is a list of (P360):
- Instance of (P31) -> building (Q41176)
- located in the administrative territorial entity (P131) -> Bedfordshire (Q23143)
- heritage status (P1435) -> Grade I listed building (Q15700818)
which looks an awful lot like the "wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818 ; wdt:P31?/wdt:P279* wd:Q41176" you have on the category. With a bit of LUA magic you don't have to put a query on every category. That would be awesome. Multichill (talk) 15:38, 23 February 2017 (UTC)
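As a sketch of that derivation (in Python here rather than Lua, and not an existing module): given the property/value pairs found under P360, the fragment can be assembled mechanically. The `PATH_RULES` table and the function names are assumptions for illustration; the rules simply reproduce the paths in the fragment quoted above.

```python
# Hypothetical traversal rules: how each property is followed in the query.
# Properties not listed here are matched directly with wdt:Pnnn.
PATH_RULES = {
    "P131": "wdt:P131+",          # located in, followed transitively
    "P31": "wdt:P31?/wdt:P279*",  # optional instance of, then any chain of subclass of
}

def sparql_fragment(pairs):
    """Join (property, item) pairs into a SPARQL predicate-object list."""
    parts = []
    for prop, item in pairs:
        path = PATH_RULES.get(prop, f"wdt:{prop}")
        parts.append(f"{path} wd:{item}")
    return " ; ".join(parts)

def list_query(pairs, var="?item"):
    """Wrap the fragment in a SELECT query over one variable."""
    return f"SELECT {var} WHERE {{ {var} {sparql_fragment(pairs)} . }}"

# The Bedfordshire example from the list above.
bedfordshire = [("P131", "Q23143"), ("P1435", "Q15700818"), ("P31", "Q41176")]
print(sparql_fragment(bedfordshire))
print(list_query(bedfordshire))
```

For the Bedfordshire pairs, `sparql_fragment` gives back the same string as the fragment already on the category, so the two approaches agree at least in this case.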
- @Multichill: Yep, very much inspired by P360, and what Magnus can do with Reasonator. BUT: In most cases, a category may not have a link to Wikidata at all; may not link to another category; and the category may not have a corresponding list.
- Therefore, what seems a better plan is to put the information on the category -- where it can be *systematically* (Also this allows a little more flexibility than P360 with say "19th century person").
- Ultimately, with luck Commons categories will get Commons wikibase pages, and that would be the place to move the information.
- But in the meantime, there may be a lot we can harvest from P360s and "Category combines" on Wikidata, that could be used to automatically write some of these SPARQL fragments. Jheald (talk) 15:57, 23 February 2017 (UTC)
- I'm not a big fan of duplication, but I do like to offer flexibility. The way I describe it should be the default behaviour if the template is empty or has a wikidata=Q123 linking it to the list. Users can always override it with a query to improve it. Multichill (talk) 16:02, 23 February 2017 (UTC)
- @Multichill: You make a fair point. Flexibility is a good thing. But I think for me the issue is that I would see the SPARQL fragment as potentially having more use than just in the query. Having it explicitly in the wikitext means that tools can load the raw page, then send a query straight to WDQS -- eg for a photo that represents something with a particular Q-number, a tool can ASK does it belong in this category or not, something which could be a real help for auto- (or rather, machine-assisted) categorisation. It also means the data is there, eg as a basis for a broader statistical analysis of the sorts of refinements that happen in going from categories to sub-categories. Another external tool might see that the commonscat for an item lives in this category, when according to the data in Wikidata it shouldn't qualify -- inferring that there is some fact that Wikidata is missing. All of this is easier if it's consistently the same system (ie a sparql statement) for all categories. Otherwise you would be having to create additional LUA magic not just for the query in the template, but again and again (in language after language) for whatever other tool might want to use the category spec. At least, that's how it seems to me. Jheald (talk) 23:51, 23 February 2017 (UTC)
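To illustrate the ASK idea above, here is a minimal Python sketch of what such a tool might do once it has read the fragment from the category's wikitext. The function names and the User-Agent string are invented for the example; the endpoint is the public WDQS one, and the network call is shown but not exercised here.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def ask_query(qid, fragment):
    """Wrap a category's SPARQL fragment in an ASK query about one item."""
    return f"ASK {{ wd:{qid} {fragment} . }}"

def belongs_in_category(qid, fragment):
    """Send the ASK query to WDQS and return its boolean answer (needs network access)."""
    params = urllib.parse.urlencode({"query": ask_query(qid, fragment), "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "category-spec-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["boolean"]

# Q123 is only a placeholder item; the fragment is part of the Bedfordshire example.
print(ask_query("Q123", "wdt:P131+ wd:Q23143 ; wdt:P1435 wd:Q15700818"))
```

The same fragment string serves both the listing query in the template and this membership check, which is the point about keeping one consistent system for all categories.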
@Jheald: Thank you so much for doing this initial work! I have been following your conversation on the Village pump, and in other places, and this looks exactly like one of the questions I have been bringing up with the folks who are beginning to plan on the Structured Commons work: categories are a super important part of both the way in which most folks interface with our content and as contributors to the projects. We are still building the team and funding doesn't kick in for another month, so we aren't quite ready to integrate your model into a larger workplan: but based on the conversations I have had and what I see so far: your template is a super good way to start understanding the categories functionally within the project.
I hope to have a more thorough update out to various channels, including our newsletter list, in a week or two. However, the high-level focus in the near term is on several major backend changes, already in development by the Wikidata team, and soon with more support from WMF. This means that, in the meantime, understanding how Commons categories map to various ways of thinking about Wikidata concepts (as intersections, as non-Wikidata definable groupings, or as exact matches to Wikidata items) is really important. Any prework that helps us evaluate that data, and helps us understand how the categories function, is super helpful. Once the team is in place, I will definitely have them work with you more closely on this. Astinson (WMF) (talk) 21:58, 27 February 2017 (UTC)
I have done quite a lot of categorisation on commons and I think some of what I do would not fit well with some of these wikidata plans. I use categories to bring together like minded pictures and to break up large categories - especially when you have over 200 images in one category. Many of the buildings that I create categories for do turn out to be listed elsewhere as a notable building, or be the building associated with a notable organisation. But that is often a coincidence. Of course it is useful when you can get a one to one mapping between a commons category and a wikipedia article and indeed a Wikidata entry - but any of those three could be first. What I'm trying to do is to organise the files that we have on commons. Sometimes that involves creating categories for the things I'm interested in, sometimes you have to create categories for the things you aren't interested in in order to bring the interesting ones together in the residue. No objection to having some of our data become more structured, but if you want to replace the category system with something radically different it would be an idea to discuss the proposal here on commons and spell out how your proposed new system would differ. WereSpielChequers (talk) 12:09, 7 September 2017 (UTC)
Multilingualism
Commons is supposed to be a multilingual project. This is supposed to be true for the metadata of files as well as for community discussions. However, and this page is yet another proof, discussions actually happen in English, and expressing oneself in another language means not being understood at best, being seen as rude at worst. With structured data, there is an opportunity to have some kind of multilingual discussions. Of course, complex issues will still need natural language (and thus, English), but some discussions, including votes (QI, FP, and simple deletion requests), could be multilingual by using a list of predefined concepts. Léna (talk) 11:27, 7 November 2016 (UTC)
Semantic annotations
The property depicts (P180) is useful for this project, but it is also very rough. It would be nice to be able to annotate the image, similarly to Commons:Image annotations, but in a structured way instead of free text. One should also probably take a look at the W3C standard Web Annotation Data Model to make sure that open standards are being implemented. Ainali (talk) 19:37, 8 November 2016 (UTC)
- This is already possible in Wikidata, using "relative position within image" (P2677). But yes, compliance with W3C standards is beneficial. Andy Mabbett (talk) 13:00, 6 February 2017 (UTC)
Thank you so much for participating!
Hello all! Your feedback and conversation at this stage in developing this potential expedited work on Structured Data on Commons is greatly appreciated.
The feedback both highlighted new challenges and helped us examine the challenges we anticipated, and it brought up new ideas and methods for thinking about solutions to those challenges. We are going to make sure that the team working on this project, whether or not we get the additional funding, sees this conversation page; continued feedback or conversation here will greatly strengthen their work and let us know what you are interested in getting from that work. We will make sure to provide an update as soon as we have more substantial information about the project's potential funding source.
In the short term, we recommend also participating in the 2016 Community Wishlist Survey on Meta. The survey helps scope out a number of project ideas, requests, and other technical needs for various Wikimedia communities. Even though the Wishlist and the Community Tech team may or may not be able to expedite work on the core technological infrastructure needed for Structured Commons, identifying specific technical needs related to it as part of the Wishlist will help ensure that more of the developer community understands community needs on Commons.
Pinging folks that participated: @Ainali, Yann, El Grafo, MichaelMaggs, and Léna: @Molarus, Jarekt, Bluerasberry, Colin, and Slowking4: @Susannaanas, Spinster, ChristianKl, Micru, and B25es: @Jane023, John Cummings, Steinsplitter, Fae, and Ghouston: @PKM: . Please feel free to ping more folks, if I missed them. Astinson (WMF) (talk) 16:46, 15 November 2016 (UTC)
About categories
Hi, I am copying here a message I posted elsewhere, which concerns this project.
To me, the current category system is far from optimal. It is a patchy hack due to MediaWiki limitations. Search is inconsistent, especially across several categories, which leads people to create micro-categories, and then to conflicts over these. It is in English only, which is a major flaw on a multilingual website. This should be corrected, and a system based on a real database would certainly be better. The editing interface needs to keep the ease of editing we have now, and this is a challenge. Regards, Yann (talk) 12:20, 7 September 2017 (UTC)