Commons:Bots/Requests/Reinheitsgebot
Operator: Magnus Manske (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: Adding statements (depicts (P180), initially) to MediaInfo items, based on reasonably reliable indication (e.g. image (P18) on Wikidata and page image on dewiki).
Automatic or manually assisted: automatic
Edit type (e.g. Continuous, daily, one time run): batches
Maximum edit rate (e.g. edits per minute): Not set, but limited by dependence on API requests before each edit. Rate limit can be added if rate is too high.
Bot flag requested: (Y/N): Y
Programming language(s): Rust
Magnus Manske (talk) 14:50, 25 September 2019 (UTC)
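As an illustration of the task described above, here is a minimal sketch of how the parameters for a single depicts (P180) statement on a MediaInfo entity might be assembled. It is not the bot's actual code. It assumes the Wikibase `wbcreateclaim` API action and the convention that a file's MediaInfo ID is `M` followed by its page ID; the function name `depicts_claim_params` and the `mode` label are hypothetical, and the edit token and the HTTP request itself are omitted.

```rust
/// Build the form parameters for one `wbcreateclaim` request that adds a
/// depicts (P180) statement to a file's MediaInfo entity.
/// Hypothetical helper: the name and the summary wording are assumptions.
fn depicts_claim_params(page_id: u64, item_qid: &str, mode: &str) -> Option<Vec<(String, String)>> {
    // Accept only well-formed item IDs such as "Q42".
    let numeric_id: u64 = item_qid.strip_prefix('Q')?.parse().ok()?;
    // The claim value is a JSON object that points at the Wikidata item.
    let value = format!(r#"{{"entity-type":"item","numeric-id":{}}}"#, numeric_id);
    Some(vec![
        ("action".to_string(), "wbcreateclaim".to_string()),
        // Assumed convention: MediaInfo ID is "M" + page ID of the file.
        ("entity".to_string(), format!("M{}", page_id)),
        ("property".to_string(), "P180".to_string()),
        ("snaktype".to_string(), "value".to_string()),
        ("value".to_string(), value),
        // Mark the edit as automated and name the data source ("mode").
        ("summary".to_string(), format!("Bot: adding depicts (P180), mode: {}", mode)),
        ("bot".to_string(), "1".to_string()),
    ])
}
```

A caller would send these pairs, together with a CSRF token, as a POST request to the Commons API endpoint; malformed item IDs yield `None`, so no request is made.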
Discussion
Please do not add statements based on usage in articles. This would produce too many bad statements. But basing them on usage as image (P18) is okay. --GPSLeo (talk) 15:39, 25 September 2019 (UTC)
- Comment Why not use Commons categories? Of course, only very precise ones, such as those for a building/structure or a biological species. It would also be reasonable to organize a project around such activities, since the Prominent flag cannot be set automatically, and because of qualifiers like sex for biological species. I'd like to see a qualifier scheme for buildings/structures. Please also see Commons:Bots/Work requests/Archive 15#List of Wikidata items without image for the opposite-direction task. --EugeneZelenko (talk) 14:42, 27 September 2019 (UTC)
- This is all feasible. There are many ways to generate structured data from existing information on Commons. This bot will be quite generic. --Magnus Manske (talk) 12:58, 26 October 2019 (UTC)
- Comment 48,415 test edits were completed before the unauthorized bot was blocked. I am concerned by the operator's disregard for procedure and unresponsiveness to issues with his code. — Jeff G. ツ please ping or talk to me 05:28, 30 September 2019 (UTC)
- Apologies, I didn't realize it was that many! --Magnus Manske (talk) 12:58, 26 October 2019 (UTC)
- In general makes sense: worth bringing over useful clearly "true" statements from Wikidata, Sadads (talk) 14:47, 30 September 2019 (UTC)
- Much information can also be inferred from Commons. The aim here is to make SDC and Wikidata both work for Commons, not really duplicating information already in Wikidata. --Magnus Manske (talk) 12:58, 26 October 2019 (UTC)
- I have some mild objections to the bot user name or edit summaries. Normally we expect all bot edits to be clearly identifiable as such by a) including bot in the username or b) clarifying in the edit summaries, normally by adding "Bot: ...". As a German native, I have to say that a) is not fulfilled. Reinheitsgebot is a normal German word and does not clearly identify the user as a bot. Please consider satisfying b). --Schlurcher (talk) 17:20, 8 October 2019 (UTC)
- As a German native, I thought it was a nice pun. Just to be clear, is your objection to the bot flag that the "b" is not uppercase? --Magnus Manske (talk) 12:58, 26 October 2019 (UTC)
- This, or simply throw in the word bot somewhere in the edit summaries to make it clear that these are automatic edits. I have no objection to the bot tasks. --Schlurcher (talk) 20:05, 27 October 2019 (UTC)
- @Krd: it seems that folks are supportive of the move, and your concerns have been responded to. Sadads (talk) 19:34, 4 November 2019 (UTC)
- Support - I only see procedural complaints, that have nothing to do with the bot or the output created. Magnus is the best tool composer we could wish, without his numerous tools Wikimedia would still be a website running on a flashdrive in someone's basement. Edoderoo (talk) 06:56, 25 October 2019 (UTC)
- Thanks, though it might be a bit over the top ;-) --Magnus Manske (talk) 12:58, 26 October 2019 (UTC)
- Sounds like the only potential issue is with the summary and/or username?
- Summary sounds like a non-issue and I'm sure the summary could be altered with "BOT EDIT" if requested, or something similar.
- As for the name, the bot will have the bot flag, so clearly identifiable IMO, and also it does have bot in the name.
- I'd rather not see this sit here for another month for these 2 small reasons.
- ·addshore· talk to me! 13:20, 26 October 2019 (UTC)
- Source of data is another matter not decided yet. --EugeneZelenko (talk) 14:28, 26 October 2019 (UTC)
- I see the main comment in that area is "Please do not add statements based on usage in articles.". From the task description I don't think that is being done? Perhaps @Magnus Manske: could clarify that? ·addshore· talk to me! 14:46, 26 October 2019 (UTC)
- I think it will be reasonable to have an explicit list of sources (and qualifiers?) as a result of this discussion. --EugeneZelenko (talk) 15:26, 27 October 2019 (UTC)
- I'll add references and qualifiers where possible, and also specify the current "mode" in the edit summary (I can also add "bot edit", in case the username and bot flag are too subtle). But for some things, there is no good machine-readable source; for example, the next task on the list is to add the Geograph IDs from the text description as structured data. I can put that into the summary though. --Magnus Manske (talk) 11:02, 30 October 2019 (UTC)
- I don't think that references like Imported from make much sense. Qualifiers could be extracted from very populated categories, like male/female organism for species or architectural details (example: Category:Cathédrale Notre-Dame de Paris). --EugeneZelenko (talk) 14:49, 30 October 2019 (UTC)
- Support Checked a few edits and looks fine to me. --Steinsplitter (talk) 13:31, 26 October 2019 (UTC)
@99of9, Ellin Beltz, EugeneZelenko, Jameslwoodward, JuTa, Krd, and Odder: Is this sufficient now? --Magnus Manske (talk) 12:44, 6 November 2019 (UTC)
- I was waiting for a reply to EugeneZelenko's last comment. If there is nothing left open, I think it can be approved. --Krd 12:48, 6 November 2019 (UTC)
- It's OK with me. Thank you, Magnus, for all of your work here. . Jim . . . (Jameslwoodward) (talk to me) 13:53, 6 November 2019 (UTC)
- Please answer the open question. There is still no clear definition of what exactly the bot will do. --EugeneZelenko (talk) 15:33, 6 November 2019 (UTC)
- Do not disagree with request, but do not understand its utility. When clarity is achieved, I can give opinion. Please answer Eugene's question, too. Ellin Beltz (talk) 23:14, 7 November 2019 (UTC)
- Apologies, I thought I did answer it. The bot will add (and potentially edit) structured data for files. Some of it will be from Commons-external sources, such as Wikidata (not mere data dumping!), some of it will be from the Commons file description. For example, the next batch I work on is to add the {{Geograph}} IDs as structured data. If/when consensus is reached (and only then), the bot may also remove the wikitext if the corresponding SD exists, and is automatically displayed via template/module, but that's for the future. --Magnus Manske (talk) 09:29, 8 November 2019 (UTC)
- IDs are fine. But please check the situation where an ID is set for a complex object consisting of parts, and the parts are distinguished in both Wikidata and Commons. I think the request could be approved for IDs, just so as not to hold it up. Other kinds of data deductions should be discussed in separate requests. --EugeneZelenko (talk) 15:21, 8 November 2019 (UTC)
- Please also make sure that IDs relate to specific objects (buildings, statues, etc.), but not districts. In the latter case, objects may deserve separate Wikidata items and may have county/municipal notability. --EugeneZelenko (talk) 15:07, 11 November 2019 (UTC)
- @Magnus Manske: Note that {{Geograph}} is unreliable. Uploaders frequently put in the wrong image number, and I'm pretty confident I haven't caught all of them. Adding claims of geograph.org.uk image ID (P7384) should probably be restricted to cases where you have other evidence that the image is identical to the one on Geograph (e.g. because it was uploaded by GeographBot or Geograph Update Bot). That would cover the vast majority of cases anyway. --bjh21 (talk) 16:27, 11 November 2019 (UTC)
- I will take that into consideration. OTOH, importing all stated IDs into Structured Data will not make things worse than they are, but will add to our ability (once live SPARQL is available) to quickly find duplicate IDs etc. I think starting with the user names you mentioned is a good approach though. --Magnus Manske (talk) 13:17, 15 November 2019 (UTC)
- @Krd: While there is still some discussion needed for approval, can the bot account be unblocked? An initial block to stop the original bot trial after making nearly 50,000 edits was appropriate. I don't see a need for the block to be maintained at this point. ~riley (talk) 18:21, 8 November 2019 (UTC)
- Of course. --Krd 21:06, 8 November 2019 (UTC)
- BTW, the bot name en:Reinheitsgebot / de:Reinheitsgebot appears not the best choice to me. It's a bit funny, but in the end it is misleading. --Krd 21:08, 8 November 2019 (UTC)
- Support highly trusted user who operates multiple bots. Nothing wrong with the name, bot already did over 13M edits on Wikidata. The task is to add structured data to files. That looks like a fine task to me. Multichill (talk) 19:22, 9 November 2019 (UTC)
- Task definition should be specific in data sources and this task is limited to IDs. --EugeneZelenko (talk) 15:07, 11 November 2019 (UTC)
- What's wrong with the task definition "add high-quality Structured Data"? Why tie my hands unnecessarily? --Magnus Manske (talk) 13:17, 15 November 2019 (UTC)
- Because of data sources that could be controvertible or incorrect. Please also see suggestions on this request - discussions on such matters are always useful. --EugeneZelenko (talk) 15:40, 15 November 2019 (UTC)
- I agree that discussions are useful, but in this special case I don't see which point is unclear. Can anybody please say again in different words what is missing? --Krd 08:01, 1 December 2019 (UTC)
- I think agreement on the scope of operation is still missing. I would like to reiterate my older suggestion: I think the request could be approved for IDs, just so as not to hold it up. Other kinds of data deductions should be discussed in separate requests. --EugeneZelenko (talk) 15:17, 1 December 2019 (UTC)
- Pointless bureaucracy, just approve it as is. Multichill (talk) 22:35, 4 December 2019 (UTC)
- Should I remind you that we had different points of view about the application of IDs to areas? --EugeneZelenko (talk) 01:08, 5 December 2019 (UTC)
- I agree with EugeneZelenko. It might be sensible to agree on a few IDs first and then reopen this or another request again when we all have gained some experience on what high-quality structured data refers to and what not. --Schlurcher (talk) 10:28, 5 December 2019 (UTC)
- @Magnus Manske: Any update? --Krd 14:58, 28 December 2019 (UTC)
- @Magnus Manske: Final call. --Krd 15:48, 16 January 2020 (UTC)
Closing as stale, feel free to reopen at any time. --Krd 06:29, 25 January 2020 (UTC)
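The restriction suggested in the Geograph sub-thread above (only import a {{Geograph}} ID when the file was uploaded by GeographBot or Geograph Update Bot) can be sketched as follows. This is an illustration under stated assumptions, not the bot's actual logic: it assumes the template's first positional parameter is the numeric image ID, handles only the plain `{{Geograph|...}}` form, and the function names `geograph_id` and `trusted_geograph_id` are hypothetical.

```rust
/// Uploaders named in the discussion as reliable sources for Geograph IDs.
const TRUSTED_UPLOADERS: [&str; 2] = ["GeographBot", "Geograph Update Bot"];

/// Extract the numeric ID from the first `{{Geograph|<id>|...}}` template.
/// Assumption: the ID is the first positional parameter of the template.
fn geograph_id(wikitext: &str) -> Option<u64> {
    // ASCII lowercasing keeps byte offsets valid for the original string.
    let lower = wikitext.to_ascii_lowercase();
    let start = lower.find("{{geograph|")? + "{{geograph|".len();
    let rest = &wikitext[start..];
    // The ID ends at the next parameter separator or at the template end.
    let end = rest.find(|c: char| c == '|' || c == '}')?;
    rest[..end].trim().parse().ok()
}

/// Return the ID only when the uploader is on the trusted list.
fn trusted_geograph_id(wikitext: &str, uploader: &str) -> Option<u64> {
    if TRUSTED_UPLOADERS.iter().any(|trusted| *trusted == uploader) {
        geograph_id(wikitext)
    } else {
        None
    }
}
```

Files from other uploaders return `None` here, so they could be queued for manual review or a later batch instead of being imported as geograph.org.uk image ID (P7384) claims.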