Commons:Bots/Requests/Rybecbot
Operator: Rybec
Bot's tasks for which permission is being sought: re-uploading photos which have been cropped to remove watermarks and editing accompanying text to indicate the watermark has been removed
Automatic or manually assisted: automatic, lightly supervised
Edit type (e.g. Continuous, daily, one time run): one time run
Maximum edit rate (e.g. edits per minute): 2 when uploading files, 4 when changing text
Bot flag requested: (Y/N): Y
Programming language(s): Python (the Pywikipediabot upload.py script with minor changes and the replace.py script with no changes)
Rybec (talk) 05:29, 22 March 2013 (UTC)
Discussion
- Badly needed. Please execute a batch of runs as an example. --Foroa (talk) 06:54, 22 March 2013 (UTC)
- I'm having trouble doing the test edits because the newly-created bot account doesn't have reupload permission (just reupload-own)--I checked here. Rybec (talk) 12:18, 22 March 2013 (UTC)
- Yes it's very nasty for new bot accounts. Should work now! -- Rillke(q?) 18:10, 22 March 2013 (UTC)
- Thank you! I've done the test edits. I only prepared 24 files to replace instead of the suggested 30 to 50. Rybec (talk) 20:42, 22 March 2013 (UTC)
Please do not continue until the bot preserves metadata, and please repair this for all edits done so far by the bot. Example where data is lost: File:047-1211 Enschede 125.JPG (before and after processing). Please also include a meaningful upload/edit summary (like "image cropped to remove watermark"). Thank you! -- Rillke(q?) 21:55, 22 March 2013 (UTC)
Question Which software do you use for cropping the file? Is it lossless? -- Rillke(q?) 22:02, 22 March 2013 (UTC)
- Thanks for the review! I used jpegtran, which is lossless. I hadn't noticed the problem with the EXIF data; I don't know why that happened. Rybec (talk) 22:19, 22 March 2013 (UTC)
- ExifTool, for example, is capable of copying (nearly) all metadata from the original file to the edited one. This way you could ensure it is never lost. -- Rillke(q?) 10:38, 23 March 2013 (UTC)
- I've reverted my test/example edits. Rybec (talk) 22:34, 22 March 2013 (UTC)
- Thank you. -- Rillke(q?) 10:38, 23 March 2013 (UTC)
- Great demand exists for automated removal of watermarks, and this pioneering bot is brilliant :D I would say the exif is less important at the moment especially if the information is still made available with the older version. If it inspires people to make exif copying bots, all the better. I believe it is not long before we see watermark removal bots that mend the picture rather than crop, but for the time being, in these chaotic times where trigger fingers are blocking people for good contributions, Rybec's bot would help bring some badly needed relief and order. Rybec certainly seems responsive, capable, and I recommend flagging his bot forthwith ! I can't see maintenance and improvements being a problem. Penyulap ☏ 05:59, 23 March 2013 (UTC)
- In this case they contained copyright and camera information and it is quite bad if they aren't visible any more at the file description page. -- Rillke(q?) 10:38, 23 March 2013 (UTC)
- Question: Does your bot automatically detect the watermarks or are you manually instructing what to crop? -- Rillke(q?) 10:38, 23 March 2013 (UTC)
- I was just using the identify command from ImageMagick to get the pixel size of downloaded images, subtracting 138 from the height, and scripting jpegtran to crop to that size; the only function of the bot is the re-uploading. I agree that the metadata is important. The problem was that I didn't use the "copy all" option to jpegtran. I've manually uploaded to File:1210_Turnhout_029.JPG one example of a file cropped with the "copy all" option. Its EXIF data is preserved. I also learned how to do an edit summary with the script: [1]. I've started another test run, with the metadata and the edit summaries. Rybec (talk) 11:51, 25 March 2013 (UTC)
- Someone pointed out the need to remove {{watermark}} and the category Category:Uploads by Microtoerisme with watermarks. I was thinking that could be done with VisualFileChange.js. Rybec (talk) 12:47, 25 March 2013 (UTC)
- Not remove "watermark", but change it to "watermark removed". ("watermark removed" is appropriate for these uploads) – JBarta (talk) 13:01, 25 March 2013 (UTC)
- I've done a second test run, changing {{watermark}} to {{watermark removed}} and removing [[Category:Uploads by Microtoerisme with watermarks]] as described by Jbarta. The files can be seen at Special:ListFiles/Rybecbot. I've changed my request to add the use of the standard replace.py script for the textual changes. Rybec (talk) 21:41, 28 March 2013 (UTC)
- Results look good to me. --VanBuren (talk) 21:52, 1 April 2013 (UTC)
- Suggestion - Howdy. I suggest using {{Attribution metadata from licensed image}} instead of {{Watermark removed}} because the latter redirects to the former. Also, I suggest checking to see if one of those templates is already on the page. When you made this edit, the template was already on the page.--Rockfang (talk) 10:53, 6 April 2013 (UTC)
- I wasn't even aware that "Attribution metadata from licensed image" existed. Might I suggest that "watermark removed" is more used, more intuitive and easier to spot than the other? The resulting template on the image description page is the same either way. Just a thought. – JBarta (talk) 11:36, 6 April 2013 (UTC)
- Assuming Jarry1250's Toolserver Template transclusion count tool is correct, {{Attribution metadata from licensed image}} is transcluded 6,192 times while {{Watermark removed}} is transcluded 3,422 times. If an image license doesn't require attribution, then {{Metadata from image}} can be used.--Rockfang (talk) 13:35, 6 April 2013 (UTC)
- The mistaken edit found by Rockfang is one I did manually. When doing the textual replacements, the bot will look for the specific text "{{watermark}}" (by which I mean, enclosed in curly brackets) and change it. Only if {{watermark}} appeared twice already, or together with {{watermark removed}} would it make the same mistake I did. If it encounters "{{watermark removed}}" it does not do any replacement. In other words, it's not inserting {{watermark removed}} but rather changing {{watermark}} to {{watermark removed}}. For the text replacement task I want to use the standard replace.py script from pywikipedia. I don't especially mind using {{Attribution metadata from licensed image}} rather than {{Watermark removed}}, although the latter is more succinct. On Wikipedia, using redirects in a similar manner is considered okay. Is it the same here? Is the redirect likely to be deleted? If not, it seems to me like a matter of indifference. Rybec (talk) 22:05, 6 April 2013 (UTC)
- The redirect is unlikely to be deleted, and even if it was, all uses should be replaced in advance. I don't think this is a major issue whichever way you do it. --99of9 (talk) 15:00, 20 June 2013 (UTC)
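For illustration, the text replacement Rybec describes above can be sketched as a plain literal substitution. This is not the replace.py script itself, only a minimal sketch of the described behaviour; the function name update_description and the constant names are hypothetical.

```python
# Illustrative sketch only; the bot itself uses pywikipedia's replace.py.
# update_description and the constant names are hypothetical.
OLD_TEMPLATE = "{{watermark}}"
NEW_TEMPLATE = "{{watermark removed}}"
OLD_CATEGORY = "[[Category:Uploads by Microtoerisme with watermarks]]"


def update_description(text):
    """Change {{watermark}} to {{watermark removed}} and drop the category."""
    # Literal match: "{{watermark removed}}" does not contain "{{watermark}}",
    # so pages that already carry the new template are left unchanged.
    text = text.replace(OLD_TEMPLATE, NEW_TEMPLATE)
    return text.replace(OLD_CATEGORY, "")
```

Because the match is literal, a page that holds both {{watermark}} and {{watermark removed}} would end up with the new template twice, which is the edge case mentioned in the discussion.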
Unless there are further comments, I propose that this request be approved. --99of9 (talk) 15:00, 20 June 2013 (UTC)
- Approved and flagged. --99of9 (talk) 10:41, 11 July 2013 (UTC)
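For reference, the cropping step described in the discussion (read the size with identify, subtract 138 from the height, crop losslessly with jpegtran while copying all metadata) might look roughly like the sketch below. It assumes the watermark strip sits at the bottom of the image and that the identify and jpegtran commands are installed; the function names image_size and crop_command are hypothetical, and this is not the operator's actual script.

```python
import subprocess

# Height in pixels of the strip to remove, as stated in the discussion.
# Assumption: the strip is at the bottom of the image.
WATERMARK_HEIGHT = 138


def image_size(path):
    """Return (width, height) using ImageMagick's identify command."""
    out = subprocess.check_output(["identify", "-format", "%w %h", path])
    width, height = out.decode().split()
    return int(width), int(height)


def crop_command(src, dst, width, height, strip=WATERMARK_HEIGHT):
    """Build a jpegtran command that keeps the top (height - strip) rows."""
    if height <= strip:
        raise ValueError("image is not taller than the strip to remove")
    geometry = "{}x{}+0+0".format(width, height - strip)
    # "-copy all" keeps EXIF and other markers, which the first test run lost.
    return ["jpegtran", "-copy", "all", "-crop", geometry, "-outfile", dst, src]
```

A run over one file would then be something like subprocess.check_call(crop_command(src, dst, *image_size(src))), followed by the upload step.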