User:Jonkerz/AntWeb
Roadmap
- Phase I: clean up current photos
- Update files in Category:Images from AntWeb
- Update old descriptions
- Move reclassified taxa
- Delete invalid taxa
- This also includes genus/species categories in Category:Formicidae
- Phase II: upload new photos
- Coming soon
Todo
- Generate a list of all filenames in Category:Images from AntWeb Done
- Replicate the description param ("Dorsal view of ant Tatuidris tatusia specimen casent0178755.") Doing…
- GitHub
- Commons:Guide_to_batch_uploading#Check_for_duplicates
- Commons:Bots/Requests
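The description item above could be approached with a small template. This is only a sketch: the method name `build_description` and its keyword arguments are hypothetical, and only the "Dorsal view" wording is taken from the example above. That the other shots ("head", "profile", "label") follow the same template is an assumption that should be checked against existing file descriptions.

```ruby
# Hypothetical helper: rebuilds the description string from parsed filename parts.
# Only the "dorsal" form is confirmed by the example; other shots are assumed
# to follow the same "<Shot> view of ant <taxon> specimen <id>." template.
def build_description(shot:, taxon:, specimen_id:)
  "#{shot.capitalize} view of ant #{taxon} specimen #{specimen_id}."
end

puts build_description(
  shot: "dorsal",
  taxon: "Tatuidris tatusia",
  specimen_id: "casent0178755"
)
# prints: Dorsal view of ant Tatuidris tatusia specimen casent0178755.
```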
Random stuff
To consider
- Rename all files to conform to the "AntWeb/Commons format"?
- Only relevant for a couple of specimens
- Example "File:Adelomyrmex robustus lacm ent 144606 dorsal 1.jpg":
- Taxon = "Adelomyrmex robustus", "Adelomyrmex robustus lacm", or "Adelomyrmex robustus lacm ent"?
- Alternative solution: add "lacm ent <number>" et al. to a list of exceptions
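The "list of exceptions" alternative could look roughly like the sketch below. It builds on the filename rule described under "Filename regex" and lets the specimen id be either a known space-containing pattern or the usual single token. The names `SPECIMEN_ID_EXCEPTIONS`, `RELAXED_RE` and `parse_with_exceptions` are hypothetical, and the single `lacm ent <number>` entry is the only exception taken from this page.

```ruby
# Hypothetical exception list: specimen id patterns that contain spaces.
SPECIMEN_ID_EXCEPTIONS = [/lacm\sent\s\d+/].freeze

# Exceptions are tried first, then the ordinary single-token specimen id.
SPECIMEN_ID = Regexp.union(*SPECIMEN_ID_EXCEPTIONS, /[()\w-]+/)

RELAXED_RE = %r{
  ^File:
  (?<genus>[A-Z][a-z]+) \s
  (?<species>[a-z]+) \s
  (?<subspecies>[a-z]+\s)?
  (?<specimen_id>#{SPECIMEN_ID}) \s
  (?<shot>head|dorsal|profile|label) \s
  (?<number>[0-9])
  \.jpg$
}x

# Returns [taxon, specimen_id], or nil if the filename still does not match.
def parse_with_exceptions(filename)
  m = RELAXED_RE.match(filename)
  return nil unless m

  taxon = [m[:genus], m[:species], m[:subspecies]&.strip].compact.join(" ")
  [taxon, m[:specimen_id]]
end

p parse_with_exceptions("File:Adelomyrmex robustus lacm ent 144606 dorsal 1.jpg")
p parse_with_exceptions("File:Tapinoma luridum longiceps lacm ent 181857 head 1.jpg")
```

With this approach the taxon for the example file comes out as "Adelomyrmex robustus" and "lacm ent 144606" is kept whole as the specimen id, so no renaming would be needed for prefixes on the list.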
Filename regex
"Valid" is defined as passing the regex below. In English: "File:Genus species subspecies (optional) specimen_id shot number.jpg", where the specimen id may contain alphanumeric characters, underscores, parentheses and hyphens, and the number is a single digit. "shot" must be one of the following: "head", "dorsal", "profile" or "label".
%r{ ^File: (?<genus>[A-Z][a-z]+) \s (?<species>[a-z]+) \s (?<subspecies>[a-z]+\s)? (?<specimen_id>[()\w-]+ ) \s (?<shot>head|dorsal|profile|label) \s (?<number>[0-9]) \.jpg$ }x
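As a usage sketch, the rule can be bound to a constant and applied to individual filenames. The block restates the pattern with the dot before `jpg` escaped so that it matches only a literal dot; the names `FILENAME_RE` and `parse_filename` are assumptions made for illustration.

```ruby
# The filename rule, spread over several lines (whitespace is ignored under /x).
FILENAME_RE = %r{
  ^File:
  (?<genus>[A-Z][a-z]+) \s
  (?<species>[a-z]+) \s
  (?<subspecies>[a-z]+\s)?
  (?<specimen_id>[()\w-]+) \s
  (?<shot>head|dorsal|profile|label) \s
  (?<number>[0-9])
  \.jpg$
}x

# Returns a hash of the named parts, or nil if the filename is not "valid".
def parse_filename(filename)
  match = FILENAME_RE.match(filename)
  return nil unless match

  parts = match.named_captures
  # The subspecies group captures its trailing space; strip it if present.
  parts["subspecies"] = parts["subspecies"]&.strip
  parts
end

p parse_filename("File:Strumigenys glycon blf0976(41)-19 head 1.jpg")
p parse_filename("File:Temnothorax interruptus casent0173186 1.jpg") # nil: shot missing
```

Stripping the subspecies capture is a design choice of this sketch; the pattern itself keeps the trailing space inside the group.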
List
- Category:Images from AntWeb (32,178 total)
- /valid (31,837 total; first 100 kB of 1.2 MB)
- /valid and subspecies (1,232 total)
- /invalid (341 total)
"Valid" only means that the filename matches the rule; many of these files are not valid in the taxonomic sense. Some taxa have been reclassified, and some were uploaded to AntWeb more than once under different names (examples of the latter include File:Monomorium aureorugosum casent0172919 head 1.jpg and File:Strumigenys glycon blf0976(41)-19 head 1.jpg, which were both uploaded in 2009).
All "valid and subspecies" files are already included among the valid ones; they are listed separately to make sure that specimen ids containing spaces are not parsed as subspecies names.
Causes of invalid filenames among bot-uploaded files:
- File:Temnothorax interruptus casent0173186 1.jpg - shot missing
- File:Tapinoma luridum longiceps lacm ent 181857 head 1.jpg - specimen id contains spaces
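The two causes above could be told apart automatically with a rough heuristic like the one below. It is a sketch meant only for filenames that have already failed the rule; the names `SHOTS` and `invalid_reason` are hypothetical, and any filename that fits neither cause is reported as "unknown" rather than guessed at.

```ruby
SHOTS = %w[head dorsal profile label].freeze

# Heuristic guess at why a filename fails the rule.
def invalid_reason(filename)
  tokens = filename.sub(/\AFile:/, "").sub(/\.jpg\z/, "").split
  # Look for the last token that names a shot.
  shot_index = tokens.rindex { |t| SHOTS.include?(t) }
  return "shot missing" if shot_index.nil?

  # Tokens between the species (index 1) and the shot should be at most
  # an optional subspecies plus a one-token specimen id.
  middle = tokens[2...shot_index] || []
  return "specimen id contains spaces" if middle.length > 2

  "unknown"
end

puts invalid_reason("File:Temnothorax interruptus casent0173186 1.jpg")
# prints: shot missing
puts invalid_reason("File:Tapinoma luridum longiceps lacm ent 181857 head 1.jpg")
# prints: specimen id contains spaces
```

A two-token specimen id without a subspecies has the same token count as a subspecies plus a one-token id, so it is not flagged here; such files would need a manual check.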