Commons:Bots/Requests/Noaabot
Operator: Fæ
Bot's tasks for which permission is being sought: Uploading of archives of images from the National Oceanic and Atmospheric Administration.
These are public domain, and the background of the initial request and project can be found at Commons:Batch_uploading/Weather_maps#Coordination. In addition to the initial batch upload of around 20,000 images providing maps from September 2002 to the current day, there may be categorization and formatting changes as needed that can run from this account. Beta test images, consisting of weather maps for the first year of the archive and the most recent month of maps, can be found at 2002 NCEP weather maps (610 maps) and 2013 NCEP weather maps (ongoing, with the most recent weather maps being uploaded each day; these appear to be released after 2pm EST for the previous day's maps).
Partial supervision for archive. Beta testing then monitored runs would be expected, with unmonitored runs once uploads or changes are seen to be stable (i.e. 1,000 or more uploads or changes). Daily or weekly updates would be automatic with some regular oversight or in response to questions.
Automatic or manually assisted: Automatic.
Edit type (e.g. Continuous, daily, one time run): One time runs for the past 11 years of archive maps and then a daily or weekly automatic update. For the NCEP weather, 5 types of maps are made available each day and a weekly summary pdf is derived from these. The maps are published as gifs and Noaabot is converting these to pngs before uploading.
Maximum edit rate (e.g. edits per minute): Approximately 4 per minute.
Bot flag requested: (Y/N): Y
Programming language(s): Python
Fæ (talk) 11:12, 22 May 2013 (UTC)
Discussion
- Not sure if the template is correct—the images do not come directly from NOAA, but from NCEP (National Centers for Environmental Prediction) — other than that, I'm quite OK with the way files are uploaded. odder (talk) 12:21, 22 May 2013 (UTC)
- Categories now changed from using "NOAA" to "NCEP". In theory the NCEP is a child of the "National Weather Service" (previously "Weather Bureau") which itself is a child of "NOAA". I would suggest avoiding making the category tree over-hierarchical until it starts to appear over-loaded or might be misleading. --Fæ (talk) 06:11, 24 May 2013 (UTC)
- Looks OK for me. --EugeneZelenko (talk) 14:35, 22 May 2013 (UTC)
- Files should have more categories: a category for the day and a category for the type. Also, are you going to upload the Daily Weather Map Weekly PDF Files? The pdfs contain vectorized maps. Smallman12q (talk) 00:58, 24 May 2013 (UTC)
- The categories can be added quickly enough. The types are selected in the ingestion template and I will tweak it to include a NOAA weather map type category automatically. I am less certain about the value of a day category. My original thinking was that the files are strictly named, with an ISO format date at the beginning, so any particular month's or day's worth of images (there are 5 types for each day) can be easily found by scrolling through the alphanumerically sorted category of the year, and a breakdown further than year would be superfluous. I'm happy to include day categories if the benefits of doing so are made a bit clearer to me. In the longer term I would like a navigation template (possibly automated as part of the ingestion template) to provide a link to the other map types for the day, and to yesterday's and tomorrow's images. I am deferring this until either I get to grips with Lua or another volunteer can take a look at it; it is certainly a longer-term improvement that can be worked out once the archive backlog is uploaded (one of the benefits of using an ingestion template is that tweaks like this can happen on the template rather than across 20,000 files).
- Yes, the weekly pdfs can be uploaded from the pdf page—hopefully the NCEP will stick to the recently changed naming scheme so updates can be part of the batch upload. There are not that many, as they are only available from the last week of November 2012. It would be great if these were decomposed at some point, so that the embedded vector maps are available in addition to pngs, so having the pdfs on Commons is a good start.
- Done Category:NCEP weekly weather maps and Category:NCEP weekly color weather maps. --Fæ (talk) 12:42, 24 May 2013 (UTC)
- Question: PDF versions are available at ftp://ftp.wpc.ncep.noaa.gov/dwm/ for the first week of 2003 through 2009. More historical pdfs are available at http://www.lib.noaa.gov/collections/imgdocmaps/daily_weather_maps.html . Smallman12q (talk) 17:13, 21 June 2013 (UTC)
- No problem, I'll sort these out. The weekly pdf uploading is done as a one-off manual job, rather than in a daily loop. On the surface the archives look a bit ad-hoc, but I'll investigate in a week or more and decide whether the historical archives should all be done as a custom one-off or if there is a pattern here to build in. --Fæ (talk) 19:40, 21 June 2013 (UTC)
- Just as an outside thought, I believe that a category for each day is just too much, but I would support creating monthly categories; there should be around 150 files in each category, which would make them quite useful. odder (talk) 12:44, 24 May 2013 (UTC)
- Looking at the yearly categories for single map types is useful for seeing weather patterns over the seasons, so we might actually put them both in annual and monthly categories. I think this could be done again by adapting the ingestion template rather than sticking these on every file. I'll see if I can come up with a template inclusion solution. --Fæ (talk) 12:49, 24 May 2013 (UTC)
- Categorisation by template is a Bad Idea™ (as I learnt not very long ago). I'd rather just add those categories directly to the file description pages, so as to let people use existing tools (CatScan, Cat-a-lot) when necessary. odder (talk) 12:55, 24 May 2013 (UTC)
- Okay, easy enough once a scheme is put up. Could you think about how the hierarchy would best work and make a suggestion here? I can then ensure future uploads fall in line, and we can easily fix the 'beta test' uploads so far. Thanks --Fæ (talk) 12:58, 24 May 2013 (UTC)
- File:2013-05-06 Color Max-min Temperature Map NOAA.png shows how, in my opinion, those images should be categorised: by type (Category:NCEP daily surface weather maps), by year (Category:2013 NCEP weather maps), by month (Category:NCEP weather maps for May 2013), and by year and type (Category:2013 NCEP daily surface weather maps). This way it should be easy to do category intersection and perform other actions necessary for good maintenance of categories and files. I also set up a whole new category tree at Category:NOAA maps. odder (talk) 20:23, 24 May 2013 (UTC)
- It all looks useful and commonsense, I'm happy to implement all these automatically in the batch upload once the bot gets flagged. --Fæ (talk) 21:01, 24 May 2013 (UTC)
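The four-way scheme proposed above (type, year, month, year and type) can be derived mechanically from the ISO date at the start of each file name. The following is a minimal sketch of that idea, not the bot's actual code: the function name `scheme_categories` and the `type_label` argument are hypothetical, and the category wording is copied from the example file discussed above.

```python
from datetime import date

# English month names are spelled out so the result does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def scheme_categories(filename: str, type_label: str) -> list[str]:
    """Return the type, year, month, and year-and-type categories for a file.

    Assumes the file name starts with an ISO date (YYYY-MM-DD), as the
    uploaded maps do. `type_label` is a hypothetical parameter holding the
    map-type wording, e.g. "daily surface weather maps".
    """
    day = date.fromisoformat(filename[:10])
    month_name = MONTHS[day.month - 1]
    return [
        f"Category:NCEP {type_label}",                            # by type
        f"Category:{day.year} NCEP weather maps",                 # by year
        f"Category:NCEP weather maps for {month_name} {day.year}",  # by month
        f"Category:{day.year} NCEP {type_label}",                 # by year and type
    ]


if __name__ == "__main__":
    for name in scheme_categories(
        "2013-05-06 Color Max-min Temperature Map NOAA.png",
        "daily surface weather maps",
    ):
        print(name)
```

Because the categories are computed per file, they can be written directly into each file description page at upload time rather than being supplied by a template.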
- Why did bot upload images from USA military? --EugeneZelenko (talk) 14:17, 27 May 2013 (UTC)
- Classic human error. I was testing out some complicated checks for non-identical duplicates for Department of Defense batch uploads and ran the upload from the wrong terminal. I killed the process as soon as I noticed my error 6 minutes later, by which time a total of 6 public domain photographs had been uploaded out of a planned batch of more than 500. --Fæ (talk) 17:38, 27 May 2013 (UTC)
- I'd much prefer not to have them in both year and month, per COM:OVERCAT. Scrolling through is not much of an argument, because there are 5 different types in the year category, so you can hardly remember what it was 5 files back. Also I think a daily category would actually be useful for weather, because some people like comparing today's weather with last year's on the same day. --99of9 (talk) 18:48, 27 May 2013 (UTC)
- The current directories provide a directory per year by type (example), so you do not have to browse the full year with all 5 map types. I am unsure what your expectation of a day category means. Is this something like "day 42" of all years (leap years being a problem), or "Monday", or something else? Any of this is do-able, but I feel a lot could be done with category intersections rather than hard categories, or by a user searching by dates. --Fæ (talk) 18:54, 28 May 2013 (UTC)
- Yes, 2002_precipitation is a good category for browsing. But the files in there are also in the parent cat Category:2002_NCEP_weather_maps, which, if this convention is followed, will eventually have 5*365 files in it. I think they should be removed from that, since they can be immediately and directly placed in any relevant subcats at the time of upload. --99of9 (talk) 11:21, 30 May 2013 (UTC)
- Sorry I didn't properly explain the daily category idea. What I mean is similar, but more user-friendly than "day 42". My category titles would be something like Category:NCEP weather maps for 11 September (although I can understand the argument for having the date the other way around for USA weather, current Commons date categorization is this way: Category:Days_in_September). I can think of a few things people might use these categories for (e.g. historical event research; or "wasn't it warmer last year?"). I don't think many users are sophisticated enough to do cat-intersects, and since it's easy to do, I'm not sure why we can't do it for them now? (On the other hand, I don't think "Monday" is correlated with weather patterns, so I don't think it's useful.) --99of9 (talk) 11:21, 30 May 2013 (UTC)
- Okay, let me ponder it. The most recent uploads for days this week include the month category and it is easy enough in Python to name a category by the day of the month. I'll think about setting up an example day, and then uploading the maps for the full 11 years for that one day of the year, so we can see it in "action" before making a mare's nest of categories. Obviously "29 February" will end up a little sparse. --Fæ (talk) 11:27, 30 May 2013 (UTC)
- Yes, 29 Feb will get a raw deal, as it usually does. My scheme only adds 366 subcategories, which is nothing if you've got ~20k files. --99of9 (talk) 11:36, 30 May 2013 (UTC)
- I have set up Category:NCEP weather maps for 20 May as a little test. Currently for years 2006-2013. This was manual, but it can be easily automated for future uploads. --Fæ (talk) 12:23, 30 May 2013 (UTC)
- That looks fine to me. Personally, if I were the uploader, I would further subcat by type, but it seems that you and odder prefer larger cats (or dislike complex subcategory trees more) than me. --99of9 (talk) 13:38, 30 May 2013 (UTC)
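The day-of-year naming tried out above ("NCEP weather maps for 20 May") is simple to generate from a date. The sketch below is an illustration under stated assumptions rather than the operator's code: the helper names `day_category` and `all_day_categories` are hypothetical, and the category wording follows the test category mentioned in the thread.

```python
from datetime import date, timedelta

# English month names are spelled out so the result does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def day_category(day: date) -> str:
    """Name the year-independent category for a calendar day, day number first."""
    return f"Category:NCEP weather maps for {day.day} {MONTHS[day.month - 1]}"


def all_day_categories() -> list[str]:
    """List every possible day category by walking through a leap year."""
    start = date(2000, 1, 1)  # 2000 is a leap year, so 29 February is included
    return [day_category(start + timedelta(days=offset)) for offset in range(366)]


if __name__ == "__main__":
    print(day_category(date(2006, 5, 20)))
    print(len(all_day_categories()))
```

Walking a leap year is what yields the 366 subcategories mentioned in the discussion, with the 29 February category populated only from leap-year uploads.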
- I have gone along with your suggestions and implemented them in the upload, namely dropping the year category with all types and breaking down the day category with types. This will mean a bit of category emptying later on, and quite a bit of category creation (which I have not automated yet, but will ponder it). In the example of the file I just uploaded, File:2006-05-20 Max-min Temperature Map NOAA.png this means the following categories were added: --Fæ (talk) 15:40, 30 May 2013 (UTC)
I think the first can be cut out because it's a parent of the second, and I doubt anyone will want to do a slideshow of over 12 years of files. Depending if you think people would prefer to scan a whole year or a month at a time, you could cut it down to just (with even more category creation):
- Category:NCEP black and white daily max-min temperature maps for May 2006
- Category:NCEP black and white daily max-min temperature maps for 20 May
If you need help with the category creation I have some scripts that might help. --99of9 (talk) 15:51, 30 May 2013 (UTC)
- A Python code snippet (by email) might be helpful. Were I writing it, as I have the generated category name, I just need to call something to check "does this exist?" and, if not, write the initial contents (the basics of which I already have in the code). For the existence check, a Commons API call might be quicker than a failed page connection. I'm not really stuck on this; it's just a matter of taking the time to look it up and test it out. The '12 year' type cat is easily switched off in the template. I disagree with dropping the year of a type cat: it is rather useful for seeing the seasonal patterns over the year, which would be much harder to do if broken into 12 month categories, though I'm not against having both.
- Consider it pondered. It is easy enough to do a call like this and check for the 'missing' flag. I'm assuming this is slightly quicker than getting the category page, which I can do if this existence test fails. I'll add this in before running a bit more testing.
- Category existence/creation routines now added, I am running through files for Category:NCEP weather maps for 21 May to check. Creating categories this way means they only get created when there is a file to populate them. --Fæ (talk) 10:06, 31 May 2013 (UTC)
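The existence test described above, an API query followed by a check for the 'missing' flag, might look roughly like the following. This is a hedged sketch and not the code Noaabot runs: the function names and the User-Agent string are hypothetical, and it assumes the default JSON layout of the MediaWiki `action=query` module, in which a page that does not exist carries a `missing` key. Creating the category afterwards needs an authenticated edit, which is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"


def existence_query_url(title: str) -> str:
    """Build an action=query URL asking about a single page title."""
    params = {"action": "query", "titles": title, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def page_exists(api_response: dict) -> bool:
    """Return False if any page in the parsed reply carries the 'missing' key."""
    pages = api_response["query"]["pages"]
    return not any("missing" in page for page in pages.values())


def category_exists(title: str) -> bool:
    """Ask the live API whether a category page exists (needs network access)."""
    request = Request(
        existence_query_url(title),
        headers={"User-Agent": "category-check-sketch/0.1"},  # placeholder value
    )
    with urlopen(request, timeout=30) as reply:
        return page_exists(json.load(reply))


if __name__ == "__main__":
    # Offline illustration using a reply shaped like the API's answer for an absent page.
    absent = {"query": {"pages": {"-1": {"ns": 14, "title": "Category:Example", "missing": ""}}}}
    print(page_exists(absent))
```

Checking for existence only when a file is about to be placed in a category fits the approach described above, where categories are created only once there is a file to populate them.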
Approved --99of9 (talk) 00:41, 4 September 2013 (UTC)