User:Fæ/Project list/NYPL
Introduction
In December 2015, the New York Public Library (NYPL) announced an improvement to how images were licensed for public reuse, and made 180,000 available with public domain licenses. This upload project mirrors these images on Commons and creates image text pages based on the metadata available from the NYPL MODS records.
Though the NYPL release is for 180,000 artifacts, the number of images uploaded to Commons may be larger: some documents have multiple pages, others have been uploaded in the past at lower resolutions under different names, and where full size jpeg versions are available as well as the TIFF originals, both are uploaded. As of October 2016, this project had uploaded 222,000 NYPL images, needing more than 8,000 gigabytes (8 terabytes) of file storage; see quarry report.
- Links
- A summary of the collections is at http://www.nypl.org/research/collections/digital-collections/public-domain.
- A visualization page can be found at http://publicdomain.nypl.org/pd-visualization.
- The NYPL runs an open API which is described at http://api.repo.nypl.org. Accessing it needs registration.
Collections batch uploaded
The numbers below are the latest counts of files in each category. In some cases files have been diffused into sub-categories since upload, and those files are not counted.
- 17,178 R NYPL American popular songs (the majority of files in this collection are rejected by the WMF API as having bad TIFF info; it is unclear why, so this upload has been restricted to full size NYPL jpegs. See Phab:T124662.)
- 11,461 R Emmet Collection of Manuscripts
- 41 R Detroit Publishing Co. (136 jpeg files previously existed in this category and 1,297 jpegs from the Library of Congress at Category:Photochrome pictures from Detroit Photographic Company, refer to a previous proposal at Commons:Batch_uploading#Detroit_Publishing_Company_at_LoC)
- 2,903 R R. H. Burnside Collection
- 78 R Description de l'Egypte (~200 versions pre-existing)
- Others
- 27,268 R Buttolph collection of menus
- 18 R Vinkhuijzen collection (total reduced as files are moved into subcategories)
- 9,011 R Theodorus Bailey Meyers Collection
- 62 R Samuel J. Tilden Collection
- 930 R Picturesque Palestine, Sinai, and Egypt (NYPL scan)
- 0 R Pacific pursuits postcards
- 471 R NYPL collection of Atlases, gazetteers, guidebooks and other books
- 437 R Birds of America, from drawings made in the United States and their territories
- 665 R Traité des arbres et arbustes que l'on cultive en France en pleine terre
- 988 R A collection of the dresses of different nations, antient and modern
Other collections are not listed here and are mostly under 500 images in size.
Initial uploads relied on {{information}} and later uploads apply the ingestion template {{nypl}}.
The uploads are picked from http://publicdomain.nypl.org/pd-visualization/
Numbering
Filename scheme is:
<title> (NYPL <collectionid>-<imageid>).tiff
Where the title is as provided by the NYPL but truncated to fewer than 215 characters. The collectionid is the NYPL b-number, or the OCLC reference if no b-number is available, or the Hades legacy number if neither is in the metadata. Jpeg versions are identical apart from the extension.
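The naming rule can be sketched in Python as below. This is an illustration, not the uploader's actual code: the function names are hypothetical, the "OCLC-" prefix is an assumption, while the "Hades-" prefix and the dropped trailing full stop follow the example filenames shown elsewhere on this page.

```python
MAX_TITLE_CHARS = 214  # "fewer than 215 characters"


def pick_collection_id(bnumber=None, oclc=None, hades=None):
    """Prefer the b-number, then the OCLC reference, then the Hades number."""
    if bnumber:
        return bnumber
    if oclc:
        return f"OCLC-{oclc}"  # prefix is an assumption, not confirmed
    if hades:
        return f"Hades-{hades}"
    raise ValueError("no usable identifier in the metadata")


def build_filename(title, image_id, ext="tiff", **ids):
    """Return '<title> (NYPL <collectionid>-<imageid>).<ext>'."""
    # Drop a trailing full stop, then truncate the title.
    clean = title.strip().rstrip(".").strip()[:MAX_TITLE_CHARS].rstrip()
    return f"{clean} (NYPL {pick_collection_id(**ids)}-{image_id}).{ext}"


print(build_filename("Italy. Kingdom of the Two Sicilies, 1817-1819.",
                     "1608945", bnumber="b14896507"))
```

The sketch does not remove characters that are invalid in Commons page titles; a real uploader would need to handle those as well.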
The standard credit template of {{NYPL-image-DigitalID}} is used, which adds a parent bucket category automatically.
Automated checks and validation
As of June 2016, Fæ's process for uploading new images involves several robust customized checks using pywikibot-core and additional modules available in Python. The order of the checks is based on the processor time each may take and the amount of data to be transferred over the internet. An image that fails one of these checks is skipped on the first failure, unless another action is specified.
- There is a text search for the intended Commons filename to ensure it has not been created previously.
- The metadata for the image page is extracted from the NYPL MODS data. This in turn provides links to any pre-existing high resolution jpeg versions of the TIFFs. The JSON structure of the MODS data from NYPL is inconsistent, so a heuristic approach has been taken. This may not capture all useful metadata: unexpected data structures that cause parsing errors are normally skipped, and the image text page is created without those elements.
- Topics included in the MODS metadata are checked for matching Commons category names. See #Categorization
- Commons is searched for text matches to the NYPL ID numbers. If there are any matches then the file has most likely been uploaded before, though possibly with a different set of EXIF data.
- Past deletions for the suggested file name are searched for. This avoids re-uploads of files that were deleted for being 'mostly blank' or similar; there is a good faith presumption that any past deletion is valid.
- Commons is searched for matches to the local SHA1 value. Unfortunately the NYPL metadata does not provide SHA1 values which could be used in advance of downloading a file.
- Immediately after upload the image text page is checked to see that it is longer than 300 characters; if not, the text page is regenerated. This guards against a rare, still outstanding upload bug which may be related to WMF server drop-outs or more general internet connection problems. Refer to Phab:T113878.
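Two pieces of the list above lend themselves to a short sketch: running checks in a fixed order and stopping at the first failure, and computing the local SHA1 once a file has been downloaded. The helper names are hypothetical and the check functions are placeholders; the real uploader's checks query Commons and the NYPL API.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Hex SHA1 of a local file, read in chunks to cope with large TIFFs."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_failed_check(item, checks):
    """Run (name, function) pairs in order; return the first failing name.

    Later checks are never called once one fails, so the checks that need
    a download (such as the SHA1 comparison) can be placed last.
    Returns None when every check passes.
    """
    for name, check in checks:
        if not check(item):
            return name
    return None
```

A caller would pass the checks in the order listed above and skip the image whenever the result is not None.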
Image page galleries
Post-upload, a routine is manually run on collections where galleries of associated files will add value. The "association" is worked out automatically by searching Commons for the artifact UUID. For example, File:Stone, Thomas (NYPL b11868620-5338054).jpg relates to a gallery on the NYPL source site with four images. As both the TIFF and jpeg files have been uploaded to Commons, with the jpeg versions providing useful cropped versions of the scans, there are eight files related to each other, so the example image page cross-links to the others using a gallery of seven thumbnails. The gallery makes it much easier to navigate between pages of a document while still making it easy for Wikipedia contributors to select a single page or plate to illustrate an article.
Blank pages
Blank page for an 1881 4-page menu of The Brunswick Hotel, in the Buttolph collection. Due to damage, unlikely to be auto-detected as blank. See search.
Page identified as not blank, as none of the central 4 zones is blank. The image would have been detected as 'mostly blank' without the bulls-eye test.
Some blank pages are a 'feature' of parts of the archives. For example, the scans of music scores are likely to include a blank page after the cover page, and the scans of stereograms include the backs; though some backs have interesting descriptions or indexes, many others are blank.
It is possible to attempt to auto-detect these and mark them for review. The same programme used for User:Fæ/Project_list/Internet_Archive#Blank_pages has been adapted to check the NYPL images. This can be run as a semi-automated housekeeping task on demand, depending on how significant an issue it seems for the community.
In addition to the "mostly blank" detection routine used for Internet Archive images, there is a "bulls-eye" test to cope with the number of very small portrait images in the centre of book plates: the middle four squares of the chessboard grid are checked, and at least two of the four must be blank for the image to be considered mostly blank.
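The bulls-eye test can be sketched as follows. This is not the programme used for the uploads: the 8x8 grid is inferred from "chessboard", and the blankness measure (low standard deviation of grey values), both thresholds and all function names are assumptions made for illustration. The image is a plain list of rows of grey values and must be at least 8x8 pixels.

```python
from statistics import pstdev

GRID = 8  # chessboard: 8 x 8 zones (assumed)


def blank_zones(image, max_stdev=8.0):
    """Return a GRID x GRID table of booleans, True where a zone looks blank.

    A zone counts as blank when its grey values barely vary
    (the max_stdev threshold is an assumption).
    """
    height, width = len(image), len(image[0])
    table = []
    for gr in range(GRID):
        r0, r1 = gr * height // GRID, (gr + 1) * height // GRID
        row = []
        for gc in range(GRID):
            c0, c1 = gc * width // GRID, (gc + 1) * width // GRID
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(pstdev(pixels) <= max_stdev)
        table.append(row)
    return table


def mostly_blank(image, min_blank_fraction=0.9, bullseye=True):
    """Mostly blank: enough blank zones and, with the bulls-eye test,
    at least two of the four central zones blank."""
    table = blank_zones(image)
    blank = sum(cell for row in table for cell in row)
    if blank / (GRID * GRID) < min_blank_fraction:
        return False
    if not bullseye:
        return True
    mid = GRID // 2
    centre = [table[mid - 1][mid - 1], table[mid - 1][mid],
              table[mid][mid - 1], table[mid][mid]]
    return sum(centre) >= 2
```

A small portrait filling the four central zones leaves most of the page blank, so the fraction test alone would flag it; the bulls-eye test is what keeps it.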
Jpegs
As well as the master TIFF, a jpeg version will be loaded if a full size derivative is identified in the metadata. For many collections the only jpegs available are sub-800px wide and not worth loading.
The advantage of having a jpeg is that they make better (sharper) thumbnails at the current time, and are easier to view within Commons, while the TIFF needs to be downloaded to view it losslessly (this may improve in the future). Some of the NYPL collections offer more value in the jpeg versions as they have been cropped and rotated by the curators.
Where crops are needed, such as trimming off redundant scanned background, blank page margins or colour swatches, the jpeg can be cropped using COM:CropTool. Best practice would be to leave the TIFF version untouched as the archive reference image, however cropped derivatives of the TIFF can be uploaded if there is no original high resolution jpeg from NYPL.
After an upgrade to Pywikibot-core in June 2016, made necessary by the WMF rejecting http API queries, it was found that the upload module did not seem to behave well in response to exceptions. This may result in jpeg versions being delayed for an indefinite time. Refer to #T138206.
Galleries
Many of the documents in the NYPL collections have multiple pages. The UUID link in the image description provides an easy way of finding all "live" pages and variations of the same document. In addition there is a post-upload housekeeping process to create gallery tables placed at the Other versions parameter in the information box for each image, see Help:Gallery tag. This cross-links each page with thumbnails for all the other pages with the same UUID, giving an easy visual way to navigate around a document. Where both TIFF and jpeg versions of the same page exist, these will be shown as duplicate images in the gallery, though as the filename is shown when the user floats their mouse over the thumbnail, the user has a choice over which type of file to jump to.
Images are currently only likely to be deleted when they are mostly blank pages with little educational value. When that happens, the gallery will show a blanked space for the link. The housekeeping routine can be re-run to update all galleries, including removing deleted files, if there are sufficient numbers for it to be worthwhile.
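A minimal sketch of the gallery step is given below. It only builds the wikitext for the Other versions parameter from a list of filenames that share a UUID; finding those files on Commons is left out. The function name and the sample filenames are hypothetical.

```python
def build_gallery(current, related):
    """Wikitext gallery of every related file except the current one.

    Filenames are given without the "File:" prefix. Returns an empty
    string when there is nothing to cross-link.
    """
    others = sorted(name for name in set(related) if name != current)
    if not others:
        return ""
    lines = ["<gallery>"]
    lines += [f"File:{name}" for name in others]
    lines.append("</gallery>")
    return "\n".join(lines)


# Hypothetical two-page document with TIFF and jpeg versions of each page.
pages = ["Doc p1.jpg", "Doc p1.tiff", "Doc p2.jpg", "Doc p2.tiff"]
print(build_gallery("Doc p1.jpg", pages))
```

Re-running the same function after a deletion, with the deleted file dropped from the list, would give the refreshed gallery.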
Categorization
The NYPL metadata includes subjects / topics, which are checked for matching Commons categories; a matching category is added unless it is marked with {{CatDiffuse}} or {{Categorise}}. The full list of topics is added to the image page text information.
As the NYPL metadata has an inconsistent JSON format, the use of topics cannot be relied on for any particular image or collection.
As of June 2016, the family tree of categories for the NYPL collections is not ideal. Images may appear in all of a specific collection category, a NYPL department category and the template based bucket category. It could be that the auto-generated department categories might be mass removed where specific collection categories exist, on the basis that a collection is always within a department.
From July 2016, the batch upload process started to recognize where categories use {{category redirect}}, and skip any category matching that condition. This avoids the situation where the images are moved automatically to a category already marked with diffusion templates and may cause a "flooding" problem.
Examples:
- Paraqvaria Vulgo Paragvay Cum adjacentibus (NYPL b14467885-1505059).jpg has a single NYPL topic of "history". The matching category was checked but uses the CatDiffuse template and so is ignored as a suggestion. The image page still lists history in the Topics field.
- Army and Navy Nights (NYPL b18358405-5143306).tiff has topics "Theater; New York (State); New York; Theater; Production and direction". Theatre and New York use the Categorise template and are skipped, while the other two topics have no category matches and are ignored.
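The category rules above can be sketched as a filter over topics. To keep the example self-contained, category pages are passed in as a dictionary of page name to wikitext instead of being fetched from Commons; the function names and the sample page contents are hypothetical, and template names are compared case-insensitively as a simplification.

```python
import re

# Templates that mark a category as unsuitable for direct use.
SKIP_TEMPLATES = {"catdiffuse", "categorise", "category redirect"}
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*[|}]")


def templates_in(wikitext):
    """Lower-cased names of the templates used in a page's wikitext."""
    return {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}


def suggest_categories(topics, category_pages):
    """Return topics that match an existing, unmarked Commons category."""
    suggestions = []
    for topic in topics:
        name = topic.strip()
        key = name[:1].upper() + name[1:]  # page titles start upper case
        text = category_pages.get(key)
        if text is None:
            continue  # no matching category: topic is ignored
        if templates_in(text) & SKIP_TEMPLATES:
            continue  # diffusion, categorise or redirect marker present
        if key not in suggestions:
            suggestions.append(key)
    return suggestions


# Hypothetical category page contents for illustration only.
pages = {"History": "{{CatDiffuse}}", "Military uniforms": "[[Category:Uniforms]]"}
print(suggest_categories(["history", "Military uniforms"], pages))
```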
Use of GLAMwiki toolset
The cost/benefit ratio of crafting XML output so that GWT could present similar image pages and categorization is not really worth the "calendar time" saving, as this would mean carefully reviewing each collection to handle JSON mapping oddities. Similarly, avoiding a "big bang" upload process means that the community has a more realistic period to discover the new uploads and provide feedback on quality issues.
The GWT was used for uploading some TIFFs over 100MB in filesize, but this became unnecessary after changing from Pywikibot compat to the core version, which supports chunked uploading and files up to 1GB in size. Based on past experience, home broadband may effectively limit uploads to 8GB per month, principally due to slow upload speeds, an order of magnitude slower than what is possible with GWT. However, as a slow upload process gives more time for feedback and corrections, this may end up being a key additional benefit.
The above method has been superseded by white-listing of url transfers for both TIFF and jpeg images from the nypl.org domain. This enables fast uploading of even the 100MB+ files as they are no longer reliant on my home broadband connection. There are still double checks on SHA1 matches and post-upload double checks on the image pages, but the process is an order of magnitude faster.
Multiprocessing
From 19 May 2016, some of the uploads started to use multiprocessing. This enables the uploading of several files in parallel. The Python code works as normal, going through the collection linearly, but then spawns processes to handle each upload.
After trials, the parallel uploading was limited to 2 files at the same time for the same collection. Though multiprocessing ensures that there is no gap between uploading files while parsing source metadata, there was no apparent speed benefit in having more than a couple of files uploading in parallel. The reasons for this are likely to be a combination of (a) there are automated throttling processes at the Wikimedia server end, (b) it is likely that automated throttles apply at the NYPL source, and (c) the files are reliant on a home broadband connection with a fixed upload bandwidth limit.
For the NYPL batch upload, the relevant multiprocessing calls use the Process class from the multiprocessing module and a custom function, processup, which handles the Commons upload call. The call below creates the parallel process and stores a reference to it in a threads list. That list is later used to track the pool of active processes, waiting for one to complete whenever the maximum desired number is reached before adding the next:
from multiprocessing import Process
# One process per upload; processup performs the Commons upload call.
t = Process(target=processup, args=(source, filename, d, acount, count, totalpages, perpage, filenameid, tname, comment + " " + tname, localsize, ), name=tname)
threads.append(t)  # kept so the pool of active processes can be tracked
t.start()
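The pool tracking described above, where the loop waits for a slot before starting the next upload, might look like the sketch below. It is an assumption about the surrounding loop rather than the code actually used: run_limited and its parameters are hypothetical names, and the worker class is a parameter so the same logic can be tried with threading.Thread, which shares the start, is_alive and join interface with Process.

```python
import time
from multiprocessing import Process


def run_limited(jobs, max_active=2, make_worker=Process, poll=0.05):
    """Start one worker per (target, args) job, at most max_active at once.

    Returns the number of workers started, after all have finished.
    """
    active = []
    started = 0
    for target, args in jobs:
        # Wait until a slot frees up before adding the next worker.
        while len(active) >= max_active:
            active = [w for w in active if w.is_alive()]
            if len(active) >= max_active:
                time.sleep(poll)
        worker = make_worker(target=target, args=args)
        worker.start()
        active.append(worker)
        started += 1
    for worker in active:
        worker.join()  # let the last uploads complete
    return started
```

With the default Process workers the calling script should keep its entry point under an `if __name__ == "__main__":` guard, since some platforms start child processes by re-importing the main module.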
Parsing the NYPL API (with Python)
The NYPL API service requires a login, which you set up by providing an account name and password on the website http://api.repo.nypl.org/. In response the site provides you with a token for API calls which is valid for a maximum of 10,000 requests in a day (if you hit this, you are probably using calls very inefficiently!). When pulling information from the API, Python needs to put an authorization code in the url request header. This looks like:
headers = { 'User-Agent' : 'Mozilla/5.0' , 'Authorization': 'Token token="abcdefghijklmn"'}
where the token is the long code that NYPL provided, and headers would be passed in calls like:
http.urlopen('GET', u, preload_content=False, headers=headers)
I happen to be using urllib3 to handle https, but that's not essential for the NYPL calls to work.
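As a sketch of the same idea using only the standard library, the request for a MODS record can be prepared as below. The function names are hypothetical, the token is the placeholder from the text, and the URL pattern is taken from the modslink shown in the parser output on this page.

```python
from urllib.request import Request

API_ROOT = "http://api.repo.nypl.org/api/v1"


def auth_headers(token, user_agent="Mozilla/5.0"):
    """Headers carrying the NYPL API token."""
    return {
        "User-Agent": user_agent,
        "Authorization": f'Token token="{token}"',
    }


def mods_request(uuid, token):
    """Prepare (but do not send) a request for an item's MODS record."""
    return Request(f"{API_ROOT}/items/mods/{uuid}", headers=auth_headers(token))


req = mods_request("510d47e3-f235-a3d9-e040-e00a18064a99", "abcdefghijklmn")
print(req.full_url)
```

Passing the prepared request to urllib.request.urlopen and decoding the body with json.load would then fetch the record, subject to the daily request limit.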
Here's a cut & paste from a terminal window where my parser is pulling out relevant information for an NYPL image and ends up creating Italy._Kingdom_of_the_Two_Sicilies,_1817-1819_(NYPL_b14896507-1608945).tiff as part of the Vinkhuijzen collection upload:
ID_Hades struc ID (legacy) 1098453
ID_uuid 98b9f420-c54a-012f-b4ed-58d385a7bc34
uuid 510d47e3-f235-a3d9-e040-e00a18064a99
title Italy. Kingdom of the Two Sicilies, 1817-1819.
url http://digitalcollections.nypl.org/items/510d47e3-f235-a3d9-e040-e00a18064a99
topics
ID_Hades Collection Guide ID (legacy) 206
filename Italy. Kingdom of the Two Sicilies, 1817-1819 (NYPL b14896507-1608945).tiff
imageID 1608945
note Italy. Kingdom of the Two Sicilies, 1817-1819.
source http://link.nypl.org/oCcX5CERRragkGJruYT_Ywr
ID_NYPL catalog ID (B-number) b14896507
date
ID_CATNYP ID (legacy) b6535738
modslink http://api.repo.nypl.org/api/v1/items/mods/510d47e3-f235-a3d9-e040-e00a18064a99
ID_RLIN/OCLC 45057766
The filename is generated following the naming convention: the NYPL title, the best NYPL unique identity for the artifact, and the imageID, which is unique at the page level within an artifact (i.e. document). Note that no date or topics have been found for this artifact. The modslink provides most of this information but requires JSON parsing. The parsing needs to be done defensively, because in practice the data scheme may vary between collections. For example, the 'note' fields that may contain a useful description can be absent, and in this example the note is replaced by the title text. The NYPL unprocessed source looks as below:
{"nyplAPI":{"request":{"uuid":{"$":"510d47e3-f235-a3d9-e040-e00a18064a99"}},"response":{"headers":{"status":{"$":"success"},"code":{"$":"200"},"message":{"$":"ok"}},"mods":{"version":"3.4","schemaLocation":"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd","titleInfo":{"supplied":"no","usage":"primary","title":{"$":"Italy. Kingdom of the Two Sicilies, 1817-1819."}},"name":{"authority":"","type":"personal","valueURI":"","namePart":{"$":"Vinkhuijzen, Hendrik Jacobus"},"role":{"roleTerm":[{"authority":"marcrelator","type":"code","valueURI":"http://id.loc.gov/vocabulary/relators/col","$":"col"},{"authority":"marcrelator","type":"text","valueURI":"http://id.loc.gov/vocabulary/relators/col","$":"Collector"}]}},"typeOfResource":{"$":"still image"},"language":{"objectPart":"","languageTerm":[{"authority":"iso639-2b","type":"code","valueURI":"http://id.loc.gov/vocabulary/iso639-2/eng","$":"eng"},{"authority":"iso639-2b","type":"text","valueURI":"http://id.loc.gov/vocabulary/iso639-2/eng","$":"English"}]},"subject":{"authority":"lcsh","valueURI":"http://id.loc.gov/authorities/subjects/sh2008107806","topic":[{"authority":"lcsh","valueURI":"http://id.loc.gov/authorities/subjects/sh85139693","$":"Military uniforms"},{"authority":"lcsh","valueURI":"http://id.loc.gov/authorities/subjects/sh85061212","$":"History"}]},"identifier":[{"displayLabel":"CATNYP ID (legacy)","type":"local_catnyp","$":"b6535738"},{"displayLabel":"RLIN/OCLC","type":"local_other","$":"45057766"},{"displayLabel":"NYPL catalog ID (B-number)","type":"local_bnumber","$":"b14896507"},{"displayLabel":"Hades Collection Guide ID (legacy)","type":"local_hades_collection","$":"206"},{"displayLabel":"Hades struc ID (legacy)","type":"local_hades","$":"1098453"},{"type":"uuid","$":"98b9f420-c54a-012f-b4ed-58d385a7bc34"}],"location":[{"physicalLocation":[{"authority":"marcorg","type":"repository","$":"nn"},{"type":"division","$":"General Research 
Division"},{"type":"division_short_name","$":"General Research Division"},{"type":"code","$":"GRD"}],"shelfLocator":{"$":"8-MMEH (Vinkhuijzen collection of military uniforms)"}},{"shelfLocator":{"$":"8-MMEH (Vinkhuijzen collection of military uniforms) vol. 401"}},{"physicalLocation":[{"type":"division","$":"General Research Division"},{"type":"division_short_name","$":"General Research Division"},{"type":"code","$":"GRD"}]}],"relatedItem":{"type":"host","titleInfo":{"title":{"$":"Italy. Kingdom of the Two Sicilies, 1817-1819."}},"identifier":[{"type":"uuid","$":"938f3a70-c54a-012f-08ac-58d385a7bc34"},{"type":"local_hades","$":"1097395"}],"relatedItem":{"type":"host","titleInfo":{"title":{"$":"Italy"}},"identifier":[{"type":"uuid","$":"bd07a6c0-c546-012f-6cf8-58d385a7bc34"},{"type":"local_hades","$":"773362"}],"relatedItem":{"type":"host","titleInfo":{"title":{"$":"The Vinkhuijzen collection of military uniforms"}},"identifier":[{"type":"uuid","$":"51894d20-c52f-012f-657d-58d385a7bc34"},{"type":"local_hades","$":"269277 local_catnyp"}]}}}},"rightsStatement":{"$":"We believe that this item has no known US copyright restrictions. The item may be subject to rights of privacy, rights of publicity and other restrictions. Though not required, if you want to credit us as the source, please use the following statement, \"From The New York Public Library.\" Doing so helps us track how our collection is used and helps justify freely releasing even more content in the future."}}}}
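Defensive parsing of a record like the one above could be sketched as follows. This is not the parser that produced the terminal output earlier, and its results may differ from it; the helper names are hypothetical. The idea is to tolerate a field being absent, a single object, or a list of objects, and to skip anything unexpected instead of raising an error.

```python
def as_list(value):
    """Treat a missing value, a single object or a list uniformly."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def dig(node, *keys):
    """Follow nested keys, returning None as soon as the path breaks."""
    for key in keys:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def text_of(node):
    """Text stored under the "$" key, or None if the shape is unexpected."""
    value = dig(node, "$")
    return value if isinstance(value, str) else None


def parse_mods(doc):
    """Extract title, identifiers and topics from an NYPL MODS response."""
    mods = dig(doc, "nyplAPI", "response", "mods") or {}
    titles = [text_of(dig(t, "title")) for t in as_list(dig(mods, "titleInfo"))]
    identifiers = {}
    for ident in as_list(dig(mods, "identifier")):
        kind, value = dig(ident, "type"), text_of(ident)
        if isinstance(kind, str) and value:
            identifiers[kind] = value
    topics = []
    for subject in as_list(dig(mods, "subject")):
        for topic in as_list(dig(subject, "topic")):
            value = text_of(topic)
            if value:
                topics.append(value)
    return {
        "title": next((t for t in titles if t), None),
        "identifiers": identifiers,
        "topics": topics,
    }
```

Called on the decoded JSON, the result's identifiers dictionary is keyed by the "type" values such as local_bnumber and local_hades, which is enough to build the filename.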
Phabricator tickets and unresolved failures
Miscellaneous tickets
- Phab:T137687 added the domain images.nypl.org to the Commons url whitelist. This enabled url uploading of jpeg versions of images in addition to the TIFFs which come from link.nypl.org.
- Phab:T140075 (July 2016) WMF ops requested a limit on uploads, as my account seemed to be responsible for unexpectedly large data spikes. Ops were planning on ~140GB/day but my NYPL uploads alone were hitting over 220GB/day. New hardware planned to go online this month will make higher data rates easier to accommodate and make this limit irrelevant.
Stashedfilenotfound
This is an error introduced during the transition to Pywikibot-core. It makes it impossible to use url transfers to upload large TIFFs where the jpeg version already exists (the error to be ignored is termed "exists-normalized" by the Commons API). At the current time there is no easy work-around. To get around the problem, jpeg versions are still uploaded from a local copy without using "chunking", but this would be an unrealistic solution for TIFFs, which may be of the order of 200MB in size. As a result of this problem, there may be NYPL files that have been uploaded in jpeg format for which the TIFF version has been skipped.
This bug was first raised on Pywikibot in March 2016. Refer to Phab:T129471 and Phab:T138206.
Multi-sheet problem (done)
Example of UUID with multiple images, front and back of a stereogram, in this case with a description on the back.
Example item query: http://api.repo.nypl.org/api/v1/items/449d4220-c58d-012f-dcd3-58d385a7bc34 First page uploaded: File:The daisy (NYPL Hades-446489-1152945).jpg
The default assumption is that a UUID matches a single image; however, some items (like the sheet music above) have multiple ImageIDs for one UUID. The group can be pulled out of the imageLink matches as a list to loop on.
Done. Though it would be nice if the images cross-linked using other_versions, clicking the UUID will show all images relating to the object on Commons. A large example can be found at Category:The Automobile Club of America Tour Book.
Colour profile problems
Some TIFFs have types of colour profile that result in badly presented images using Firefox, but may display correctly using Chrome. The colour profile can be stripped or replaced in the TIFF. As changing the file means that checksum (SHA1) duplicate searches will fail, it is best if the original image from the NYPL archives is uploaded and then overwritten.
If a profile detection method can be devised, it may be possible to automate this correction.
A sample of images have the first 500 characters from the colour profile included on the image text page as a hidden comment so that different types of profile can be searched out using source searching. For example matches to "_HapplscnrRGB Lab" display badly in Firefox v43 (the current version at the time of writing). TIFF colour profile data is not available through the Commons API, the displayed EXIF data, or any other methods.
Bad TIFF
Files for which an attempted upload makes the API return 'tiff_bad_file' have yet to be explained. It could be that for some NYPL files, the TIFF compression is in an unusual format. These files cannot be uploaded without being transcoded, so are skipped in this batch project. It is worth noting that for the example under 100MB, the custom uploader uses the Python Image Library to query tiffinfo and works successfully; it is the tiffinfo command on the WMF server side that falls over.
- Example
- http://digitalcollections.nypl.org/items/510d47dd-ea5c-a3d9-e040-e00a18064a99 All aboard for Podunk
{u'servedby': u'mw1125', u'error': {u'info': u"This file did not pass file verification: The uploaded file contains errors: tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1", u'*': u'See http://commons.wikimedia.org/w/api.php for API usage', u'code': u'verification-error', u'details': [u'tiff_bad_file', u"tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1"]}}
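Recognising this failure so that the file can be skipped only needs the error code and details from a response shaped like the one above. The sketch below uses hypothetical function names and assumes the response has already been decoded into a Python dictionary.

```python
def upload_error_details(response):
    """Return (code, details) from an API error response, or (None, [])."""
    error = response.get("error") if isinstance(response, dict) else None
    if not isinstance(error, dict):
        return None, []
    return error.get("code"), list(error.get("details") or [])


def is_bad_tiff(response):
    """True when the upload was rejected with 'tiff_bad_file'."""
    code, details = upload_error_details(response)
    return code == "verification-error" and "tiff_bad_file" in details
```

An uploader could call is_bad_tiff on a failed upload and, when it is True, fall back to the jpeg version or skip the file.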
Reports
These are reports created using the GLAM dashboard run by Faebot. The reports include the 80,000 lower resolution jpeg images successfully loaded from the NYPL in 2008 by Dcoetzee, which distorts the results:
Volunteers
Largest
- 381 MP (20492x18604 pixels)
- 352 MP (18446x19061 pixels)
- 289 MP (15621x18477 pixels)
- 284 MP (14811x19144 pixels)
- 278 MP (26667x10438 pixels)
- 258 MP (25835x9983 pixels)
- 245 MP (17907x13704 pixels)
- 236 MP (17954x13130 pixels)
- 230 MP (16453x14000 pixels)
- 221 MP (21478x10307 pixels)
- 221 MP (12935x17088 pixels)
- 220 MP (21363x10305 pixels)
- 220 MP (21389x10285 pixels)
- 215 MP (21349x10086 pixels)
- 211 MP (20612x10220 pixels)
- 205 MP (12352x16598 pixels)
- 205 MP (20441x10015 pixels)
- 199 MP (19679x10124 pixels)
- 197 MP (19169x10260 pixels)
- 189 MP (18613x10175 pixels)
- 189 MP (18444x10230 pixels)
- 187 MP (13465x13922 pixels)
- 186 MP (20000x9298 pixels)
Popular categories
- Robert N. Dennis collection of stereoscopic views (135,665)
- PD 1923 (91,050)
- PD Old (89,394)
- PD US (79,837)
- Artworks without Wikidata item (78,744)
- NYPL The Miriam and Ira D. Wallach Division of Art (67,790)
- PNGs with JPEG versions (36,096)
- Buttolph collection of menus (29,406)
- NYPL General Research Division (23,087)
- NYPL Rare Book Division (21,333)
- NYPL American popular songs (17,309)
- Emmet Collection of Manuscripts (14,569)
- NYPL Manuscripts and Archives Division (9,338)
- Theodorus Bailey Meyers Collection (9,011)
- Military uniforms of France by Vinkhuijzen collection (5,849)
- Military uniforms (5,841)
- Samuel Putnam Avery Collection (5,690)
- NYPL Henry W. and Albert A. Berg Collection of English and American Literature (5,147)
- Writings of Hawthorne (4,948)
- Media needing category review as of 7 July 2016 (4,443)
- Media needing category review as of 8 July 2016 (4,357)
- NYPL Stereoscopic views of Niagara Falls (4,228)
- Vinkhuijzen: Military uniforms of The Netherlands (4,171)
- Media needing category review as of 21 June 2016 (4,062)
- Media needing category review as of 18 June 2016 (3,927)
- Media needing category review as of 25 June 2016 (3,883)
- Detroit Publishing Co. (3,849)
- Media needing category review as of 26 June 2016 (3,761)
- Media needing category review as of 16 June 2016 (3,408)
- Media needing category review as of 22 June 2016 (3,361)
- Media needing category review as of 23 June 2016 (3,351)
- Media needing category review as of 24 June 2016 (3,340)
- Media needing category review as of 15 June 2016 (3,331)
- Media needing category review as of 9 July 2016 (3,259)
- James Madison papers (3,236)
- Media needing category review as of 14 July 2016 (3,207)
- Media needing category review as of 17 June 2016 (3,201)
- Jerome Robbins Dance Division (3,174)
- R. H. Burnside Collection (2,903)
- Media needing category review as of 10 July 2016 (2,853)
- Media needing category review as of 6 July 2016 (2,853)
- Media needing category review as of 12 July 2016 (2,834)
- Media needing category review as of 11 July 2016 (2,818)
- Media needing category review as of 13 July 2016 (2,808)
- Media needing category review as of 13 June 2016 (2,805)
- Media needing category review as of 4 July 2016 (2,804)
- Media needing category review as of 15 July 2016 (2,776)
- Media needing category review as of 20 June 2016 (2,742)
- Media needing category review as of 5 July 2016 (2,719)
- NYPL The Miriam and Ira D. Wallach Division of Art (2,565)
- Canyons (2,543)
- Media needing category review as of 14 June 2016 (2,516)
- Media needing category review as of 3 July 2016 (2,388)
- Media needing category review as of 27 June 2016 (2,359)
- Lawrence H. Slaughter Collection of English maps (2,258)
- Media needing category review as of 1 July 2016 (2,157)
- A. G. Spalding Baseball Collection (2,125)
- The Automobile Club of America Tour Book (2,044)
- Media needing category review as of 29 June 2016 (2,009)
- Betrothal in popular 19th and early 20th century American music (1,971)
- Media needing category review as of 2 July 2016 (1,898)
- Media needing category review as of 19 June 2016 (1,893)
- Stereo cards of Boston (1,887)
- African American culture in popular 19th and early 20th century American music (1,854)
- Media needing category review as of 16 July 2016 (1,834)
- Military uniforms of Spain by Vinkhuijzen collection (1,790)
- Vinkhuijzen: Military uniforms of Italy (1,785)
- Monuments from Egypt and Ethiopia to the drawings of the kings of Prussia (1,742)
- Media needing category review as of 28 October 2016 (1,618)
- Media needing category review as of 19 October 2016 (1,594)
- Vinkhuijzen: Military uniforms of Germany (1,479)
- Love in popular 19th and early 20th century American music (1,469)
- Media needing category review as of 30 October 2016 (1,453)
- Media needing category review as of 30 June 2016 (1,446)
- Vinkhuijzen:Austria (1,396)
- Media lacking a description (1,394)
- Vinkhuijzen: Military uniforms of Germany (1,390)
- Media needing category review as of 25 October 2016 (1,389)
- Media needing category review as of 15 October 2016 (1,367)
- NYPL Irma and Paul Milstein Division of United States History (1,349)
- Media needing category review as of 26 October 2016 (1,332)
- Media needing category review as of 1 November 2016 (1,329)
- Media needing category review as of 18 October 2016 (1,314)
- Death in popular 19th and early 20th century American music (1,310)
- New York Public Library Visual Materials (1,233)
- Picturesque Palestine, Sinai, and Egypt (1,226)
- NYPL Lionel Pincus and Princess Firyal Map Division (1,212)
- Media needing category review as of 27 October 2016 (1,208)
- Media needing category review as of 31 October 2016 (1,184)
- Vinkhuijzen:Great Britain (1,111)
- Media needing category review as of 16 February 2016 (1,110)
- NYPL Science (1,102)
- Military uniforms of Sweden by Vinkhuijzen collection (1,072)
- Pacific pursuits postcards (1,057)
- Media needing category review as of 17 February 2016 (1,055)
- Media needing category review as of 16 October 2016 (1,052)
- NYPL Stereoscopic views of Saratoga Springs (1,040)
- NYPL Centennial Photographic Co (1,017)
- Media needing category review as of 1 February 2016 (1,002)
- Anniversary menus (1,001)
Improvement
Ten randomly selected files with a single mainspace use on Wikimedia projects:
Up to ten randomly selected files with the lowest category counts in the project: