Commons:Bots/Requests/YouTubeReviewBot
Operator: Eatcha (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought:
Review files from YouTube and Vimeo; see Category:License_review_needed_(video). A list of the 13K backlog files is available here.
I will only mark files that pass review and will not nominate failed files for deletion; failed reviews can be handled by humans. This also prevents accidental mass deletion requests if YouTube changes its site. I don't use the YouTube Data API because it does not work from Toolforge: "Toollabs IPs are banned due to mass downloading using Video2commons". Instead, I scrape the website to review files.
Automatic or manually assisted: Automatic
Edit type (e.g. Continuous, daily, one time run): Continuous, daily
Maximum edit rate (e.g. edits per minute): not more than 6 per minute
Bot flag requested: (Y/N): Yes
Programming language(s): Python3
Eatcha (talk) 14:13, 1 December 2019 (UTC)
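For illustration only, the stated ceiling of 6 edits per minute could be enforced with a small throttle like the one below. This is a minimal sketch, not the bot's actual code: the class name `EditThrottle` and its parameters are hypothetical, and it simply spaces edits at least 10 seconds apart.

```python
import time


class EditThrottle:
    """Space edits evenly so that no more than `max_per_minute` happen per minute.

    Hypothetical helper; the clock and sleep functions are injectable for testing.
    """

    def __init__(self, max_per_minute=6, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous edit was allowed

    def wait(self):
        """Block until the next edit is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each page save would keep the bot under the requested rate, whatever the size of the backlog.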
Discussion
Need license reviewer rights to review files! There is an abuse filter to prevent license reviews by non-license reviewers. Thanks -- Eatcha (talk) 14:19, 1 December 2019 (UTC)
Bot's LR request available at Commons:License_review/Requests#YouTubeReviewBot -- Eatcha (talk) 14:40, 1 December 2019 (UTC)
- Are you implying that the bot will run on Toolforge? And is the source code available? Masum Reza📞 14:51, 1 December 2019 (UTC)
- Masumrezarock YES, Source code is available at tools:ytrb ON FORGE. -- Eatcha (talk) 14:58, 1 December 2019 (UTC)
- Sounds like a good plan to me. Where should I Support this BRFA? Masum Reza📞 15:02, 1 December 2019 (UTC)
- Masumrezarock Thanks, but it's a discussion; supports don't count. Best, -- Eatcha (talk) 15:05, 1 December 2019 (UTC)
- License check is great, but I think human reviews is still needed because of Commons:Derivative works, so we split the review process of bot assisted/human parts. --EugeneZelenko (talk) 15:14, 1 December 2019 (UTC)
- Maybe it's similar to User:FlickreviewR 2, examples, where the bot passed Derivative works 1, 2, 3 and many others. We don't have an Internet Archive bot, so the review should just act as proof that the video had a Creative Commons tag in case the license is later changed by the uploader. It's impossible (as of now) for the bot to detect derivative works in videos; it's possible for images but hard, AFAIK. Best -- Eatcha (talk) 15:39, 1 December 2019 (UTC)
- I didn't suggest detection of Commons:Derivative works, my point was to split review task between bot and humans. --EugeneZelenko (talk) 15:08, 2 December 2019 (UTC)
- EugeneZelenko How should I(or anyone else) split review task between bot and humans ? Can you please give one example as a hint ? Thanks-- Eatcha (talk) 15:20, 2 December 2019 (UTC)
- I think we should expand template to allow two reviews: one for license by bot and other for humans for other issues. --EugeneZelenko (talk) 15:24, 2 December 2019 (UTC)
- EugeneZelenko What If I add a new template that would state "This file had CC-BY-SA-3.0 tag on MM-DD-YYYY, which was confirmed by Youtubereviewbot. This file should not be deleted if the license has changed in the mean time, but if it's a derivative work or copy-right violation it should be deleted." -- Eatcha (talk) 15:34, 2 December 2019 (UTC)
- Template should clearly state that bot checks only license and further human review was needed or performed by somebody. --EugeneZelenko (talk) 15:37, 2 December 2019 (UTC)
- Info see filter_logs for the hits. -- Eatcha (talk) 15:46, 1 December 2019 (UTC)
- I see you are accessing youtube.com/watch directly and not its API. How are you getting around phab:T236446? --Zhuyifei1999 (talk) 04:43, 2 December 2019 (UTC)
- @Zhuyifei AFAIU, YOUTUBE-DL use Youtube data API, google blocks API access *only*. I tried to use the data api in forge it returns the same error as v2c. But if you access the page data *normally* and parse the source you can... see https://repl.it/repls/DevotedFancyAutomaticvectorization . You may find https://github.com/TeamNewPipe/NewPipe/ useful -- Eatcha (talk) 05:54, 2 December 2019 (UTC)
- Interesting. So it is possible to download videos without using its data API... --Zhuyifei1999 (talk) 17:56, 2 December 2019 (UTC)
- EugeneZelenko take a look on the following :
This file, which was originally posted to YouTube, was reviewed on 3 December 2019 by the automatic software YouTubeReviewBot, which confirmed that this video was available there under the stated Creative Commons license on that date. This file should not be deleted if the license has changed in the meantime. The Creative Commons license is irrevocable.
The bot only checks for the license, human review is still required to check if the video is a derivative work, has freedom of panorama related issues and other copyright problems that might be present in the video. Visit licensing for more information. If you are a license reviewer, you can review this file by manually appending | |
After human review, |reviewer=Eatcha was appended:
This file, which was originally posted to YouTube, was reviewed on 3 December 2019 by the automatic software YouTubeReviewBot, which confirmed that this video was available there under the stated Creative Commons license on that date. This file should not be deleted if the license has changed in the meantime. The Creative Commons license is irrevocable.
This file was manually reviewed by reviewer Eatcha, who confirmed that this file is allowed on Wikimedia Commons.
Eatcha (talk) 03:29, 3 December 2019 (UTC)
- It would be reasonable to use links to Commons:Derivative works, Commons:Licensing, Commons:Freedom of panorama, etc. --EugeneZelenko (talk) 15:40, 3 December 2019 (UTC)
- Mind collapsing all the blank spaces a bit? The margin around the CC logo is huge --Zhuyifei1999 (talk) 15:47, 3 December 2019 (UTC)
- Eugene Zelenko &Zhu yifei is it ok now? -Eatcha (talk) 16:43, 3 December 2019 (UTC)
- Heh I like Special:Diff/378903780. the old version was too green, and have too little margin above the logo and beside the texts (on my screen the texts were touching the borders) --Zhuyifei1999 (talk) 17:28, 3 December 2019 (UTC)
- Are colors same as in other review templates? --EugeneZelenko (talk) 16:04, 4 December 2019 (UTC)
This image was originally posted to Flickr by Österreichisches Außenministerium at https://flickr.com/photos/88775815@N04/40647907102. It was reviewed on 5 December 2019 by FlickreviewR 2 and was confirmed to be licensed under the terms of the cc-by-2.0.
Eatcha (talk) 05:10, 5 December 2019 (UTC)
Imo color of flickr reviews and YouTube reviews are same. --- It actually uses Template:LicenseReview/styles.css, so, yeah.. - Alexis Jazz ping plz 06:00, 5 December 2019 (UTC)
- @Eatcha: I messed with your comment again. Can you change it so {{ISOdate}} works properly? "December 3 2019" is nonsense in Dutch. (we say 3 december 2019) - Alexis Jazz ping plz 18:25, 3 December 2019 (UTC)
- Alexis Jazz will do that, unfortunately I was doing something else and the bot made more than 300 edits with US date format. It will require a clean up, I guess. -- Eatcha (talk) 18:44, 3 December 2019 (UTC)
- I fixed the date on all processed files, if anybody has a suggestion please tell me now, cleaning up 13 thousand edits is enough to kill my brain. --Eatcha (talk) 19:08, 3 December 2019 (UTC)
- @Eatcha: that's odd, I thought I fixed them? - Alexis Jazz ping plz 19:15, 3 December 2019 (UTC)
- Oh, suggestions: YouTube video title. Template accepts title= now. Also beware I changed human=/user= to reviewer=. - Alexis Jazz ping plz 19:16, 3 December 2019 (UTC)
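As a rough sketch of the date fix discussed above, the review tag can be generated with an ISO 8601 date so that {{ISOdate}} can localise it. Only `title=` and `reviewer=` are parameter names confirmed in this discussion; `id=` and `date=` are assumptions made for illustration, as is the function name.

```python
import datetime


def youtube_review_wikitext(video_id, review_date, title=None, reviewer=None):
    """Build a {{YouTubeReview}} tag with an ISO 8601 (YYYY-MM-DD) date.

    `id` and `date` are assumed parameter names; `title` and `reviewer`
    are the ones mentioned in the discussion.
    """
    parts = ["YouTubeReview", f"id={video_id}", f"date={review_date.isoformat()}"]
    if title:
        # A literal pipe would end the parameter, so use the {{!}} magic word.
        parts.append("title=" + title.replace("|", "{{!}}"))
    if reviewer:
        parts.append(f"reviewer={reviewer}")
    return "{{" + "|".join(parts) + "}}"


print(youtube_review_wikitext("VIDEO_ID", datetime.date(2019, 12, 3)))
```

Emitting `2019-12-03` rather than "December 3 2019" leaves the rendering order to the template, which is what the Dutch example above requires.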
- This file should not be deleted if the license has changed in the meantime. Do we need this bit in the template? This is true for any license that has been subject to a license review, whether by bot or human. Other review templates don't include such a statement. I'm also wondering if the template should address videos that are free but are licensed under the Standard YouTube License, like {{PD-FLGov}} works (File:Mayor Dyer on the Pulse Site Purchase.webm, for example). ƏXPLICIT 03:49, 4 December 2019 (UTC)
- I don't think you need to do through the process of {{YouTubeReview}} it was {{PD-FLGov}}. The purpose of {{YouTubeReview}} is to have a record that the licence marked on YouTube was once CC-BY --Zhuyifei1999 (talk) 03:56, 4 December 2019 (UTC)
- I think we need a page like User:FlickreviewR/bad-authors for Youtube too. Hanooz 06:12, 6 December 2019 (UTC)
- This bot literally only checks whether the license on YouTube is CC-BY. It doesn't 'pass' the review all the way through like FlickreviewR does, and human review is still needed, so potentially the human reviewer might have a list of bad-authors in mind. I guess a list for the bot could be added (depends on whether Eatcha is willing to ;) ), but it probably won't be as useful as the Flickr one (for reference, neither the Picasa one nor the Panoramio had such a list, even though they were all-the-way pass).
- That said, do you have a list of YouTubers to pre-populate the page? --Zhuyifei1999 (talk) 07:32, 6 December 2019 (UTC)
- Oh, I see. No, not now. Hanooz 08:51, 6 December 2019 (UTC)
- @Zhuyifei1999:
- Magic News Latino (Commons:Deletion requests/Files found with insource:"UCmCnA Xl0Rz4rWoFRxeHO8Q")
- News In Town (Commons:Deletion requests/File:Aarathi (Kannada actress).jpg)
- thepaparazzigamer (Commons:Deletion requests/Files in Category:Media from Hollywood To You)
- River Play (Commons:Deletion requests/File:Dr. García.jpg)
- Tv2e (really requires human review)
- 3decades3kids (Commons:Deletion requests/File:Aimee Carrero 2016.jpg)
- Celeb Mas (Commons:Deletion requests/File:Salma Hayek 2018.png)
- Sandalwood Screen (Commons:Deletion requests/Files found with insource:"Sandalwood Screen")
- yoongified 93 (Commons:Deletion requests/File:Laura Marano 2017.png)
- Medios ACPO (really requires human review)
- TINI (Commons:Deletion requests/File:Tini Stoessel FUNDAMI.jpg)
- Soraya Alcala (Commons:Deletion requests/File:Marcus Ornellas.jpg)
- VER DE NOVO (Commons:Deletion requests/File:Lali Esposito 2017.jpg)
- BoomtronBSC (Commons:Deletion requests/File:Lucy Hale and Ashley Benson.png)
- Kelli Berglund Updates (Commons:Deletion requests/Files found with insource:"Kelli Berglund Updates")
- TEKASHI69 (Commons:Deletion requests/File:6ix9ine in 2018.png)
- NOTICIAS TOLUCA (Commons:Deletion requests/Files in Category:Belinda Peregrín Schull)
- FanWall Ru (Commons:Deletion requests/Files found with insource:"fanwall ru")
- Bryan Snap (Commons:Deletion requests/File:Megan Fox 2018.jpg)
- Winston Burris (Commons:Deletion requests/File:Taylor Swift Squad at Billboard Music Awards 2015.jpg)
- BANDAI NAMCO Entertainment Europe (Commons:Deletion requests/Files in Category:Videos by Bandai Namco)
- Some may have some original content, but all these really require a human review. - Alexis Jazz ping plz 17:07, 6 December 2019 (UTC)
- As you three users (Hanooz, Zhuyifei, Alexis Jazz) are involved, should I create a block list? Please use "[Yy]es" or "[Nn]o". If I see more yes, I will create one. -- Eatcha (talk) 03:56, 7 December 2019 (UTC)
- I don't mind either way. --Zhuyifei1999 (talk) 03:59, 7 December 2019 (UTC)
- If it's not a huge undertaking, yes please. But it's "nice to have", not absolutely essential. - Alexis Jazz ping plz 04:05, 7 December 2019 (UTC)
- +1 Hanooz 05:31, 7 December 2019 (UTC)
Working -- Eatcha (talk) 05:59, 7 December 2019 (UTC)
- Done Hanooz, Alexis Jazz: Add as many IDs as you want @ User:YouTubeReviewBot/bad-authors. Small Demo @ https://repl.it/repls/RubberyNotableAnimatronics -- Eatcha (talk) 14:46, 7 December 2019 (UTC)
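A channel block list of this kind could be consulted roughly as follows. This is a sketch under assumptions: the layout of User:YouTubeReviewBot/bad-authors is not described here, so the parser assumes one channel ID per line (optionally as a `*` bullet, with any trailing note ignored), and the IDs shown are placeholders.

```python
def parse_bad_authors(page_text):
    """Extract channel IDs from the block-list page text.

    Assumed format: one ID per line, optionally prefixed with '*';
    anything after the first whitespace is treated as a note.
    """
    ids = set()
    for line in page_text.splitlines():
        entry = line.strip().lstrip("*").strip()
        if entry and not entry.startswith("<!--"):
            ids.add(entry.split()[0])
    return ids


def is_blocked(channel_id, bad_authors):
    """Return True when the uploader's channel is on the block list."""
    return channel_id in bad_authors


# Placeholder IDs, not real channels.
sample_page = "* UC_example_channel_1\n\n* UC_example_channel_2 reposts news clips\n"
blocked = parse_bad_authors(sample_page)
print(is_blocked("UC_example_channel_2", blocked))
```

Files from a blocked channel would simply be skipped and left for human reviewers, in line with the bot never failing files on its own.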
- BAD NEWS:Okay, before I start blocking YouTube channels, we got completely banned by YouTube. I cannot parse the webpage anymore! see (For powerful systems https://tools-static.wmflabs.org/ytrb/youtube23567.html)File:YouTube receiving a large volume of requests from a network.png, they ARE blocking accessing the site. I will now TRY to run this on my personal system, without getting sued by my ISPs for blocking access to the entire range. I will also try to run this on AWS, but worried about the bot password... -- Eatcha (talk) 04:40, 7 December 2019 (UTC)
- FYI: I was told that YouTube shall follow up on the v2c situation in 1-2 weeks. --Zhuyifei1999 (talk) 16:51, 7 December 2019 (UTC)
- Zhuyifei When were you told that? Who told you "what you are telling us" ? Are we still in the two weeks waiting period. --Eatcha (talk) 17:34, 7 December 2019 (UTC)
- When? This Wednesday. Who? Matanya. Are...? Yes --Zhuyifei1999 (talk) 17:44, 7 December 2019 (UTC)
- @Eatcha: the ban will probably expire at some point. I suspect the YouTube site wants you to enter a CAPTCHA, you should do so to lift the ban. After that, limit the bot to some very low value, like 5 pages a day, preferably with one connection. (keep alive) Have you tried scraping the page from archive.org? When available there, you don't need to make any YT request. When not available there, you could get the page by saving it to archive.org, but you should probably limit that to a very low value as well. (archive will pass on the origin ip when saving a page) - Alexis Jazz ping plz 18:09, 7 December 2019 (UTC)
- @Majora: can you update the AF for LR with {{YouTubeReview}}? - Alexis Jazz ping plz 18:14, 7 December 2019 (UTC)
- It may just be because I've had a long day but what exactly am I updating, Alexis Jazz? Making it so {{YouTubeReview}} can only be added by LRs right? If so I'll have to do that tomorrow if someone else doesn't get to it first. I'm quite tired and probably shouldn't be messing with abuse filter syntax right now. --Majora (talk) 04:59, 8 December 2019 (UTC)
- Yes. - Alexis Jazz ping plz 05:25, 8 December 2019 (UTC)
- @Alexis Jazz: Should be Done. --Majora (talk) 22:02, 8 December 2019 (UTC)
- Majora Could you also block this edit -- Eatcha (talk) 04:54, 10 December 2019 (UTC)
- Done {{VimeoReview}} has been restricted to image reviewers. --Majora (talk) 21:59, 10 December 2019 (UTC)
- Zhuyifei1999 is there any way to fill the captcha ? Open a proxy ? Or ... And Alexis Jazz the bot is running on Amazon EC2, IP varies on each new instance. Check it's reviewing as we are talking. --Eatcha (talk) 18:27, 7 December 2019 (UTC)
- Well, if it's a CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"), it's supposed to tell computers and human apart so I would imagine it would be hard to fill it automatically... --Zhuyifei1999 (talk) 23:40, 7 December 2019 (UTC)
- Not automatically, just once to lift the block. (after that whatever triggered it must be adjusted to avoid a new ban) - Alexis Jazz ping plz 05:25, 8 December 2019 (UTC)
- Ah. @Eatcha: Is this behind the Cloud NAT (i.e. were you running on Toolforge Grid Engine?) --Zhuyifei1999 (talk) 06:46, 8 December 2019 (UTC)
- Zhuyifei Yes, the screenshot's via grid engine. But you should get the same error on sgeBastions - Eatcha (talk) 08:04, 8 December 2019 (UTC)
- Zhuyifei Is there any cap on storage per tool? 100 GiB? -- Eatcha (talk) 13:39, 8 December 2019 (UTC)
- Do I need permission to store data at /data/scratch , I am keeping the webpages to avoid fetching a webpage more than one time. --Eatcha (talk) 17:32, 8 December 2019 (UTC)
- There are no hard storage quota per tool (aside from the disk space itself), but things like phab:T183920 can happened where you might get a ticket for using too much storage. And no, you don't need permission to use scratch, but please note that anything there could get deleted, and stuffs there is not to be trusted for long term storage. (It's even better if you would prune old files yourself) --Zhuyifei1999 (talk) 18:02, 8 December 2019 (UTC)
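The suggestion to prune old files from scratch could look something like the sketch below. The function name, the 30-day default, and the idea that cached pages sit as plain files in a single directory are all assumptions, not a description of the bot's real cache.

```python
import os
import time


def prune_cache(cache_dir, max_age_days=30, now=None):
    """Delete regular files in `cache_dir` whose mtime is older than `max_age_days`.

    Returns the sorted names of the removed files. Subdirectories are left alone.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with os.scandir(cache_dir) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if entry.is_file(follow_symlinks=False) and entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Running this once per day before fetching new pages would keep the cache bounded without relying on scratch being cleaned up externally.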
- Looks like X11 forwarding is disabled on toolforge, so running GUI browsers on Toolforge that you can see and control is gonna be very complicated. Hmm... I'm still waiting for YouTube's response --Zhuyifei1999 (talk) 18:19, 8 December 2019 (UTC)
- «Zhuyifei» Ever tried joshdick-miniproxy ? -- Eatcha (talk) 18:58, 8 December 2019 (UTC)
- Ever read TOU? --Zhuyifei1999 (talk) 19:59, 8 December 2019 (UTC)
- TBH 🄽🄴🅅🄴🅁 I've already got 2 warnings for violation of TOU , you saved me from the third Warning or getting kicked out of the labs. -- Eatcha (talk) 02:17, 9 December 2019 (UTC)
- I forgot this one -- Eatcha (talk) 03:00, 9 December 2019 (UTC)
- Will do Vimeo review from today. See diff -- Eatcha (talk) 14:33, 9 December 2019 (UTC)
- Alexis Jazz I'm being asked to fill the captcha multiple times, already filled 8 times. -- Eatcha (talk) 05:46, 10 December 2019 (UTC)
- (above comment moved for clarity)
- @Eatcha: this could happen for various reasons. You may not be giving the answer Google wants. (captcha difficulty varies depending on some things, the more difficult captcha are nearly impossible for humans) You may not be saving the cookie for the captcha properly. Your browser may not be processing the captcha properly. Or something else. - Alexis Jazz ping plz 06:56, 10 December 2019 (UTC)
- Info I am using Internet Archive as a proxy, submit the webpage for archiving in real-time then read the archived page to review files and add the archived link in the template. -- Eatcha (talk) 13:03, 12 December 2019 (UTC)
- You are probably just gonna get IA blacklisted as well. --Zhuyifei1999 (talk) 15:24, 12 December 2019 (UTC)
- Zhuyifei , IABOT runs on toolforge ? It was funded by IA, they won't harm their project by blocking toolforge IP. -- Eatcha (talk) 16:07, 12 December 2019 (UTC)
- I meant IA blacklisted... by YouTube --Zhuyifei1999 (talk) 16:10, 12 December 2019 (UTC)
- @Zhuyifei1999: I think YouTube has a higher threshold for IA. (though my experience here is limited) But @Eatcha: if you retrieve the page from IA without archiving it (there is no reason to force a new capture if it's already archived), that should be fine. Don't overload YouTube. You could ignore unarchived urls and wait for IABOT before trying again. Use https://archive.org/wayback/available?url=https://www.youtube.com/watch?v=_FFXjThEW8o. - Alexis Jazz ping plz 16:57, 12 December 2019 (UTC)
- Zhuyifei1999 https://web.archive.org/web/20190701000000*/https://whatismyipaddress.com click on random 5 days and you tell me, can youtube block IA ? The IPs are variable. They can't just block one or 2 ISPs. And thanks for the information about the mailing list. -- Eatcha (talk) 17:10, 12 December 2019 (UTC)
- IA isn't a botnet, so yes it can. Range blocks are a thing. UA blocks are also a thing. It may not be blocking ISPs, but it can certainly ask the users under that ISP to captcha if they suspect weirdness (eg. the set of cookies that is sent to the server is a null set). And yes, Alexis Jazz's advice is nice and much safer. --Zhuyifei1999 (talk) 19:52, 12 December 2019 (UTC)
- @Eatcha: maybe not, but IIRC archive will pass on the origin IP which YouTube may or may not ignore.. Just stick to archived versions where available, there's no risk in that. - Alexis Jazz ping plz 19:45, 12 December 2019 (UTC)
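The availability lookup suggested above can be sketched as follows: ask the Wayback Machine whether a snapshot already exists and only use that, without forcing a new capture. The endpoint is the one quoted in the discussion; the function names are hypothetical, and the response handling assumes the documented `archived_snapshots` / `closest` JSON layout.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the availability query URL for a given page URL."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(api_response):
    """Return (snapshot_url, timestamp) from a decoded response, or None if unarchived."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"], snap["timestamp"]
    return None


def lookup(page_url, timeout=30):
    """Query the Wayback Machine; makes no request to YouTube itself."""
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

When `lookup(...)` returns `None`, the file can be skipped and retried later, which matches the advice to wait for an archive rather than trigger one.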
- I will wait for YouTube's response (18 December 2019) before deploying anything new on Toolforge; it's not productive to change the source every week. The bot is NOT generating huge numbers of requests to YouTube: about 1 edit per hour is not enough to get banned by YouTube. The edit rate will automatically increase if we somehow get, let's say, 100,000 files at once, since it's proportional to the number of files waiting for archiving. -- Eatcha (talk) 06:01, 13 December 2019 (UTC)
- Alexis Jazz Done. The bot will search the oldest archive on the Wayback Machine to detect the license; if that fails, it will search the real-time archive of the video. Example: File:PIZZA FIDGET SPINNER.webm, see diff (the archive timestamp). The video is no longer CC on YouTube, so searching the oldest archive helps.
Flow :
- Oldest-archive ---> Real-time video --> all archives --> Get Result (PASS or FAIL)
Thanks -- Eatcha (talk) 10:47, 15 December 2019 (UTC)
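The flow above amounts to a fallback chain: try each source in order and stop at the first one that shows a free license. The sketch below illustrates that ordering only; the `review` function and the checker callables are hypothetical stand-ins, each assumed to return a license string or `None`.

```python
def review(video_id, checkers):
    """Run license checkers in order and stop at the first that finds a license.

    `checkers` is a list of (name, callable) pairs; each callable takes the
    video ID and returns a license string, or None when nothing was found.
    """
    for name, check in checkers:
        license_found = check(video_id)
        if license_found:
            return ("PASS", name, license_found)
    return ("FAIL", None, None)


# Stand-in checkers; real ones would read archived or live pages.
result = review(
    "VIDEO_ID",
    [
        ("oldest_archive", lambda v: None),
        ("realtime_video", lambda v: "CC-BY-3.0"),
        ("all_archives", lambda v: None),
    ],
)
print(result)
```

Because a FAIL result is only reported and never acted on, a miss at every stage leaves the file in the human review backlog.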
@EugeneZelenko: I will be very busy till January 10, if everything looks okay you may flag the bot. I will edit after January 10, If not okay, please keep the request open till then. Thanks -- Eatcha (talk) 07:24, 16 December 2019 (UTC)
Files should be sorted as follows:
- Category:License review needed (video)
- Files passed by YouTubeReviewBot pending human reviews
as well as
- Category:License reviewed by YouTubeReviewBot
- Files passed by YouTubeReviewBot pending human reviews
- Files passed by YouTubeReviewBot but human review is impossible (for those sources that go missing and have no archived records)
Files first passed by YouTubeReviewBot and subsequently passed by a human should be categorised under Category:Files from external sources with reviewed licenses.--Roy17 (talk) 01:40, 25 December 2019 (UTC)
- Files passed by YouTubeReviewBot pending human reviews will be another backlog :
therefore I Oppose, all links are archived. There is no need for another backlog. We never had it for other websites, why YouTube? YouTube is far more efficient in removing copyrighted materials. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)
- Files passed by YouTubeReviewBot but human review is impossible (for those sources that go missing and have no archived records) : Impossible human review implies impossible bot review
- Files first passed by YouTubeReviewBot and subsequently passed by a human should be categorised under Category:Files from external sources with reviewed licenses : Support, this should be done using the template, appending
|reviewer={{subst:REVISIONUSER}}
should add the required category. I don't know how to edit the template for this change. -- Eatcha (talk) 15:23, 28 December 2019 (UTC)
- Your bot's job is not to pass LR but merely to verify whether the link given has a Commons-compatible licence. All files have to be passed by humans. The only other bot-reviewed site is Flickr. It does not require humans because the bot can check whether the upload is identical to the source. In case it's not, it gets sorted into Flickr files needing human review. You know your bot is not designed for this.
- Files verified by your bot but disappeared/made private before human review would be problematic. If the source is unreliable it's most probably going to be deleted. But let's say the video appears alright and the source is reliable, like a public institution: it's up to the community whether such files should be hosted. Maybe the community would reject all such, maybe not. Before that consensus is formed, it's better to preempt the situation.--Roy17 (talk) 02:35, 29 December 2019 (UTC)
- Files passed by YouTubeReviewBot pending human reviews : will be another backlog I would quote this discussion in the future if backlog keeps increasing. I am now Neutral.
I am not able to verify the video because the public IP of Toolforge is banned by YouTube; I cannot download the videos to compare. Comparing images is easier than comparing videos; videos are in fact re-encoded and are much bigger files. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)
- Files verified by your bot but disappeared/made private before human review would be problematic. : Not possible due to limitations I will not fetch the live YouTube page, we are blocked. See the discussion page of Video2Commons. Alexis Jazz and Zhuyifei1999, asked me not to force a new archive every time. According to both of them "YouTube will block IA if I do that" When the bot pass a file, I am retrieving the page via wayback-machine due to this block by YouTube. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)
- Again you don't seem to understand the limitations of your own bot. Archived pages do not archive the video. Your bot only checks whether a link has a good licence, but chances are the upload does not match the link or it contains extra material like another soundtrack remixed.
- In case you still don't understand: your bot passed File:Mama Cax at Chromat AW19 Climatic.webm. I cut this footage from the source. Now if the source disappears before a human comes to it, all that's left is the archived page, which says nothing about whether the upload was indeed part or the whole of the video.--Roy17 (talk) 14:24, 29 December 2019 (UTC)
- I asked for thoughts on it @Commons:Village_pump#Need_some_opinions_for_YouTubeReviewBot, un-free music from YouTube is non - issue, it's beating humans at the moment. Unfortunately for vimeo, it's not true. I can Increase accuracy by checking the video-length, but it would fail cropped/trimmed videos, which is undesirable IMHO. -- Eatcha (talk) 06:05, 30 December 2019 (UTC)
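The video-length idea and its drawback can be made concrete with a small sketch. Both function names and the two-second tolerance are assumptions for illustration: a strict match rejects trimmed uploads, while a looser "possible excerpt" check only rejects uploads that are longer than the source.

```python
def durations_match(commons_seconds, source_seconds, tolerance=2.0):
    """True when the two lengths agree within `tolerance` seconds.

    Some slack is needed because re-encoding can shift the duration slightly.
    """
    return abs(commons_seconds - source_seconds) <= tolerance


def is_possible_excerpt(commons_seconds, source_seconds, tolerance=2.0):
    """True when the upload is no longer than the source, so it could be a trimmed copy."""
    return commons_seconds <= source_seconds + tolerance


# A 95-second clip cut from a 300-second source fails the strict check
# but is still a plausible excerpt.
print(durations_match(95.0, 300.0), is_possible_excerpt(95.0, 300.0))
```

Neither check proves that the footage is the same; at best they flag obvious mismatches for a human reviewer.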
Does the bot review the captures (i.e. screenshots) of YouTube videos? – Kwj2772 (talk) 07:06, 31 December 2019 (UTC)
- Kwj No, that would be risky. We can deduce From archive and the length of video without even watching the full video if they are the same. But it's not possible to tell if a screenshot is from a particular video. Straight answer : no it doesn't review screenshots from a video. -- Eatcha (talk) 07:47, 31 December 2019 (UTC)
tl;dr: What is the current state of discussion, is this ready to be approved, or which issue are open? --Krd 15:51, 31 December 2019 (UTC)
- Krd No issue with the bot, but Roy17 has asked for the creation of a category ("Category:Files passed by YouTubeReviewBot pending human reviews"). The purpose of the category: all files reviewed by this bot should be reviewed by humans again, because the bot checks only the license and has only a channel blacklist as a preventative measure. The bot doesn't checksum videos, because it's impossible for bigger files like videos; YouTube stores video and audio separately after transcoding, so checksums don't match. Toolforge's public IP is also banned, as Google believes we violated their TOS by mass downloading thousands of videos. In my opinion we don't need that category, because it would definitely be another backlog: the bot reviewed some files which hadn't been reviewed for more than 3 years. Some links were dead and were reviewed using the Wayback Machine, which doesn't save the video, just images and html/js/style. In Roy17's opinion, why should we believe that the video uploaded on Commons is the same video which was once available under that link, when the link is now dead and we are left with a Wayback Machine archive that doesn't contain the video, "just the webpage (text+images)"? We can compare the video length IMHO, and if the file was imported using video2commons directly from the URL then it's a non-issue, as we don't get to choose the upload summary. I am neutral on this, but maybe input from others is necessary before creating a category that is destined to be a backlog, as per today's review rate. BTW: I am not against a 2nd human review, if we have enough interested humans. {{YouTubeReview}} states that
If you are a license reviewer, you can review this file by manually appending |reviewer={{subst:REVISIONUSER}} to this template.
. -- Eatcha (talk) 17:49, 31 December 2019 (UTC)
- In a nutshell, the bot's job is confirming <URL> has a <licence> as of <date>. Human review is necessary to check (1) the video/audio does come from the <URL> (2) the <URL> is published by a genuine account instead of a fake one. Technically this problem might not concern the current workflow of the bot, but as I believe this is an utmost consideration which has not been raised, how this goal is achieved is best adapted with the bot's design. There's also suggestion that YouTubeReview should be merged with LicenseReview, which I definitely support.
- btw, youtube doesnt always strike down videos using unauthorised music but often only label them as containing it in the attributions field. It doesnt recognise all music either but only the versions supplied by music companies, not to mention music not distributed by mainstream labels. Fake channels pirating news videos are also rampant, e.g. Commons:Deletion requests/Files in Category:2019 Koreas–United States DMZ summit.--Roy17 (talk) 01:09, 1 January 2020 (UTC)
Approved. --Krd 06:33, 25 January 2020 (UTC)