User:Dispenser/Absurd overhead
Do some images seem to have an unreasonably large file size for their low resolution? Did you know you can easily hide Zip files in JPEGs, because Zip readers tolerate data placed in front of the archive (the same mechanism SFX archives rely on)? This program estimates the size of a hypothetical uncompressed equivalent of each image and flags the file if its actual size is larger. It then downloads all flagged images, analyzes them by looking for various magic numbers, extracts metadata, and attempts to optimize each image with OptiPNG, pngout, and jpegtran.
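The magic-number check can be illustrated with a minimal Python sketch. This is not the program's actual code: the function name `find_embedded_signatures` and the choice of signatures are assumptions made for illustration. It searches a file's bytes for well-known archive and document signatures anywhere past the file's own header.

```python
# Hypothetical helper, not the report's real implementation.
# Well-known file signatures ("magic numbers") and a label for each.
SIGNATURES = {
    b"PK\x03\x04": "zip",          # Zip local file header
    b"Rar!\x1a\x07": "rar",        # RAR archive
    b"7z\xbc\xaf\x27\x1c": "7z",   # 7-Zip archive
    b"%PDF-": "pdf",               # PDF document
}


def find_embedded_signatures(data: bytes, start: int = 1):
    """Return sorted (offset, label) pairs for signatures found at or after `start`.

    Starting at offset 1 skips the file's own leading signature.
    """
    hits = []
    for magic, label in SIGNATURES.items():
        pos = data.find(magic, start)
        while pos != -1:
            hits.append((pos, label))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Short signatures can also occur by chance inside compressed image data, so a hit is only a hint that the file deserves a closer look, not proof of embedded content.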
We need help identifying what's in these files, deleting malicious content, and categorizing files with extra data (e.g. Category:Fireworks PNG files). Columns with percentages show the change in file size after each of the following operations (negative values are reductions):
BMP: size if only the image data were kept, uncompressed: 3*img_width*img_height.
Zip: gzip -9. If the reduction is high, the file is "zero padded".
Trim: jpegtran -copy all for JPEGs, pngout -ks -s4 for PNGs.
Opti: jpegtran -copy all -optimize for JPEGs, optipng -fix then pngout for PNGs.
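As a rough illustration of how the BMP and Zip columns could be computed, here is a minimal Python sketch. The function names are hypothetical, and Python's `gzip` module stands in for the `gzip -9` command; the Trim and Opti columns would need the external tools and are not covered.

```python
import gzip


def percent_change(old_size: int, new_size: int) -> float:
    """Signed size change in percent; negative means the result is smaller."""
    return 100.0 * (new_size - old_size) / old_size


def bmp_size(width: int, height: int) -> int:
    """Size of the raw image data at 3 bytes per pixel, as in the BMP column."""
    return 3 * width * height


def gzip_size(data: bytes) -> int:
    """Size after gzip at the highest compression level (stand-in for gzip -9)."""
    return len(gzip.compress(data, compresslevel=9))
```

For example, `percent_change(1000, 560)` gives `-44.0`. Already-compressed JPEG or PNG data barely shrinks under gzip, so a large negative Zip value suggests long runs of padding bytes.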
Abuse filters: JPEG, PNG, GIF, and large newbie uploads; See also: GIF check, User:Embedded Data Bot, and Dec 2016 discussion
/* Absurd Overhead (approximate)
* Author: Dispenser
* License: Public Domain
* Run time: 40 minutes <SLOW_OK>
*/
SELECT img_name, img_size, img_timestamp, img_user_text, img_sha1,
IF(img_minor_mime="jpeg", 3, /* Workaround for [[phab:T132986]] */
IF(img_metadata LIKE '%s:16:"truecolour-alpha"%', 4,
IF(img_bits<8 OR img_metadata LIKE '%s:14:"index-coloured"%' OR img_metadata LIKE '%s:9:"greyscale"%', 1, 3)
) * img_bits / 8
) * img_width * img_height + IFNULL(fo_size, 50*1024) AS est_size
FROM image
JOIN page ON page_namespace=6 AND page_title=img_name
LEFT JOIN u2815__.file_overhead ON fo_page=page_id /* magic overhead */
LEFT JOIN categorylinks ON cl_from=page_id AND cl_to IN ("Animated_PNG", "Fireworks_PNG_files", "Picture_It!_files")
WHERE img_width>0 AND img_height>0
AND img_media_type="BITMAP" AND img_major_mime="image"
AND img_minor_mime IN ("jpeg", "png")
AND img_size > 250 * 1024
AND cl_from IS NULL
HAVING img_size > est_size
/* Worst offender by absolute size; works better than percentages */
ORDER BY cast(img_size as signed) - est_size DESC;
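The size estimate in the query above can be mirrored outside the database. The following Python sketch follows the same branching; the function and parameter names are hypothetical, and `colour_type` stands for the colour type string that the query matches inside `img_metadata`.

```python
DEFAULT_OVERHEAD = 50 * 1024   # fallback used when no per-file overhead is known
MIN_SIZE = 250 * 1024          # the query only considers files above this size


def est_size(minor_mime, bits, colour_type, width, height, overhead=None):
    """Estimated uncompressed size in bytes, following the query's IF() chain."""
    if minor_mime == "jpeg":
        bytes_per_pixel = 3                   # JPEG: always 3 bytes per pixel
    elif colour_type == "truecolour-alpha":
        bytes_per_pixel = 4 * bits / 8        # four channels
    elif bits < 8 or colour_type in ("index-coloured", "greyscale"):
        bytes_per_pixel = 1 * bits / 8        # one channel
    else:
        bytes_per_pixel = 3 * bits / 8        # three channels
    if overhead is None:                      # same role as IFNULL(fo_size, 50*1024)
        overhead = DEFAULT_OVERHEAD
    return bytes_per_pixel * width * height + overhead


def is_flagged(img_size, *args, **kwargs):
    """True if the file is large enough to check and exceeds its estimate."""
    return img_size > MIN_SIZE and img_size > est_size(*args, **kwargs)
```

A 100×100 JPEG with no recorded overhead gets an estimate of 3 × 10,000 + 51,200 = 81,200 bytes, so only files well above that size would be reported.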
Name | Date | Size (KB) | BMP | Zip | Trim | Opti | Links | Usage | Uploader | Notes |
---|---|---|---|---|---|---|---|---|---|---|
File:Gołębie w Łazienkach.jpg | 2016-08-30 | 31,426 | -44% | -1% | -100% | -100% | ImgOps, Exif, Google | 0 | Korzoneczek (talk, 48 edits) | C115XXU1BNF1 |
File:Gołębie w Łazienkach Królewskich 2.jpg | 2016-08-30 | 28,764 | -39% | -1% | -100% | -100% | ImgOps, Exif, Google | 0 | Korzoneczek (talk, 48 edits) | C115XXU1BNF1 |
File:Dům u Zvonu 2002 - panoramio - avu-edm (2).jpg | 2016-11-12 | 8,224 | -72% | -49% | -97% | -97% | ImgOps, Exif, Google | 0 | Panoramio upload bot (talk) | |
File:No words ... - panoramio.jpg | 2017-02-26 | 13,305 | -31% | -0% | -97% | -97% | ImgOps, Exif, Google | 0 | Panoramio upload bot (talk) | Picasa |
File:Cardiff Woods Park (31875786093).jpg | 2017-03-23 | 4,422 | -51% | -0% | -89% | -90% | ImgOps, Exif, Google | 0 | Fæ (talk) | G930VVRU4API3, MP4 |
File:Derrière prés Faure4 - panoramio.jpg | 2017-03-31 | 2,254 | -57% | -0% | -96% | -96% | ImgOps, Exif, Google | 0 | Panoramio upload bot (talk) | |
File:Малевич Супрематизм.jpg | 2017-05-11 | 2,595 | -46% | -0% | -98% | -99% | ImgOps, Exif, Google | 1 | Shakko (talk) | |
File:Gallant Company in a Park by K.Malevich (1908, Stedelijk).jpg | 2017-05-11 | 2,958 | -29% | +0% | -94% | -94% | ImgOps, Exif, Google | 0 | Shakko (talk) | |
File:Konkan - Songiri Village - Ratnagiri District - Part 1 20151218 (23914630611).jpg | 2017-04-11 | 3,522 | -23% | -1% | -89% | -89% | ImgOps, Exif, Google | 0 | Fæ (talk) | G925FXXU3QOJ4, MP4 |
File:Lopegrav og kanonstilling ved urskog fort.png | 2009-10-16 | 3,880 | -20% | -26% | -30% | -43% | ImgOps, Exif, Google | 0 | TommyG (talk) | CRC error in chunk zTXt (computed 5da00715, expected a920ff85), 375 KB metadata |
File:Englishman in Moscow.jpg | 2017-05-11 | 2,136 | -36% | -1% | -96% | -96% | ImgOps, Exif, Google | 37 | Shakko (talk) | |
File:Großflugvoliere.png | 2010-12-04 | 17,106 | +29% | +0% | +0% | -3% | ImgOps, Exif, Google | 1 | File Upload Bot (Magnus Manske) (talk) | chunk iCCP at offset 0x00032, length 2420: not allowed with sRGB |
File:Talven ihmeet.jpg | 2017-01-21 | 870 | -56% | -0% | -95% | -95% | ImgOps, Exif, Google | 0 | Sasuki 888 (talk, 51 edits) | |
File:TCL5.png | 2015-09-23 | 696 | -65% | -65% | -72% | -87% | ImgOps, Exif, Google | 0 | Tclcommir (talk, 33 edits) | 40 KB metadata |
File:Bangs.Kapelle.Votivbild.Franzosenkriege.um 1800.Reprod.Johann Wanner.png | 2008-09-04 | 1,024 | -23% | -47% | -23% | -65% | ImgOps, Exif, Google | 1 | Sums (talk) | MIME detected: Unknown at 543.0 KB (556441 bytes), additional data after IEND chunk |
File:Shelomo-Israel-Cherezli.jpg | 2014-07-25 | 1,114 | -35% | -0% | -96% | -96% | ImgOps, Exif, Google | 2 | Ben tetuan (talk) | |
File:DJ Hi Sko.png | 2015-01-01 | 546 | -70% | -79% | -70% | -86% | ImgOps, Exif, Google | 0 | Djhiskoing (talk, 1 edits) | MIME detected: Unknown at 114.0 KB (116908 bytes), additional data after IEND chunk |
File:Maui banyan - panoramio.jpg | 2016-12-05 | 5,574 | -7% | -1% | -65% | -67% | ImgOps, Exif, Google | 0 | Panoramio upload bot (talk) | |
File:Empress XiaoZhe.PNG | 2007-09-09 | 1,792 | -19% | +0% | -19% | -47% | ImgOps, Exif, Google | 8 | Highshines~commonswiki (talk, BLOCKED) | MIME detected: Unknown at 1.0 MB (1095835 bytes), additional data after IEND chunk |
File:Porky's Pooch Title Card.jpg | 2016-08-18 | 560 | -59% | -11% | -96% | -96% | ImgOps, Exif, Google | 0 | Kyleman88789 (talk, 2 edits) | |
File:Groningen 1559-1608.jpg | 2009-08-18 | 568 | -56% | -0% | -96% | -96% | ImgOps, Exif, Google | 0 | File Upload Bot (Magnus Manske) (talk) | |
File:Bonn Langer Eugen (5).jpg | 2006-06-12 | 1,522 | -20% | -0% | -97% | -97% | ImgOps, Exif, Google | 0 | Leit (talk) | |
File:Kanon ved urskog fort.png | 2009-10-16 | 2,910 | -9% | -25% | -18% | -37% | ImgOps, Exif, Google | 1 | TommyG (talk) | CRC error in chunk zTXt (computed ce50c0c8, expected aa6af66e), 254 KB metadata |
File:Igor Andreev.jpg | 2007-04-26 | 473 | -40% | -0% | -80% | -81% | ImgOps, Exif, Google | 16 | Spyder Monkey (talk) | |
File:Złota brama w Gdańsku 1687 r.jpg | 2007-01-25 | 1,983 | -9% | -0% | -94% | -94% | ImgOps, Exif, Google | 11 | MARCIN N (talk) | |
File:Korona Boleslawa Chrobrego.jpg | 2007-12-29 | 420 | -43% | -0% | -95% | -95% | ImgOps, Exif, Google | 14 | Martimar (talk) | |
File:'99-'02 Chevrolet Astro Cargo.JPG | 2007-11-18 | 1,444 | -11% | -0% | -96% | -96% | ImgOps, Exif, Google | 0 | Bull-Doser (talk) | |
File:Ossolinski Kazanowski Palace.jpg | 2007-09-07 | 408 | -37% | -2% | -94% | -94% | ImgOps, Exif, Google | 17 | Polaco77~commonswiki (talk) | |
File:Josef Ritter von Bergmann 1854.png | 2014-12-25 | 1,024 | -13% | -49% | -13% | -63% | ImgOps, Exif, Google | 0 | AndreasPraefcke (talk) | MIME detected: Unknown at 518.0 KB (531236 bytes), additional data after IEND chunk |
File:Civic Sedan 1992-95.jpg | 2012-01-14 | 895 | -13% | -0% | -96% | -96% | ImgOps, Exif, Google | 0 | BotMultichillT (talk) | |
File:Elisabeth Aspe.jpg | 2010-01-07 | 2,691 | -4% | -19% | -96% | -96% | ImgOps, Exif, Google | 9 | WikedKentaur (talk) | |
File:Painting by the Empress Dowager Cixi 01.jpg | 2007-03-23 | 3,124 | -3% | -3% | -94% | -94% | ImgOps, Exif, Google | 1 | Highshines~commonswiki (talk, BLOCKED) | |
File:Józef Dowbor-Muśnicki.PNG | 2016-05-25 | 674 | -10% | -1% | -10% | -67% | ImgOps, Exif, Google | 8 | Andros64 (talk) | MIME detected: Unknown at 284.0 KB (291324 bytes), additional data after IEND chunk |
File:Lan Ping666.jpg | 2007-05-15 | 499 | -13% | -0% | -94% | -94% | ImgOps, Exif, Google | 0 | Highshines~commonswiki (talk, BLOCKED) | |
File:Blason Rogéville.JPG | 2007-01-14 | 1,472 | -4% | -55% | -96% | -96% | ImgOps, Exif, Google | 0 | Bauer3~commonswiki (talk, 2 edits) | |
File:Astudillo.JPG | 2008-05-04 | 598 | -9% | -0% | -92% | -92% | ImgOps, Exif, Google | 3 | Lbleye (talk, 81 edits) | |
File:Burgtheater-Plakat.jpg | 2011-02-12 | 1,530 | -3% | -0% | -94% | -95% | ImgOps, Exif, Google | 3 | File Upload Bot (Magnus Manske) (talk) | |
File:Villa Scheidgen Aufriss Straßenfront 1906.jpg | 2012-12-09 | 1,458 | -3% | -5% | -90% | -91% | ImgOps, Exif, Google | 1 | Leit (talk) | |
File:Thermotoga maritima encapsulin.png | 2015-11-07 | 1,098 | -4% | -54% | -4% | -55% | ImgOps, Exif, Google | 1 | Rob Hurt (talk, 56 edits) | additional data after IEND chunk |
File:S-Series Wagon.jpg | 2012-01-14 | 704 | -5% | -0% | -96% | -96% | ImgOps, Exif, Google | 0 | BotMultichillT (talk) | |
File:Buffalo River cabin 3.png | 2008-11-30 | 277 | +26% | +0% | +25% | -19% | ImgOps, Exif, Google | 3 | Acroterion (talk) | chunk tEXt at offset 0x0003a, length 9: text contains NULL character(s) |
File:Yusuf Agah Efendi.png | 2016-08-16 | 2,706 | -1% | -47% | -0% | -48% | ImgOps, Exif, Google | 1 | Pangaea W (talk, 74 edits) | chunk cHRM at offset 0x046f9, length 32: invalid green point 0.1596 0.84039 |
File:Kozbekçi Mustafa Ağa.png | 2016-08-16 | 3,078 | -0% | -46% | -0% | -47% | ImgOps, Exif, Google | 1 | Pangaea W (talk, 74 edits) | chunk cHRM at offset 0x046f9, length 32: invalid green point 0.1596 0.84039 |
File:Mehmed Said Efend.png | 2016-08-16 | 3,046 | -0% | -45% | -0% | -47% | ImgOps, Exif, Google | 0 | Pangaea W (talk, 74 edits) | chunk cHRM at offset 0x046f9, length 32: invalid green point 0.1596 0.84039 |
File:Gebze’de Çoban Mustafa Paşa Külliyesi.png | 2016-08-16 | 2,872 | -0% | -30% | -0% | -33% | ImgOps, Exif, Google | 2 | Pangaea W (talk, 74 edits) | chunk cHRM at offset 0x046f9, length 32: invalid green point 0.1596 0.84039 |
File:Sohbet (Tablo).png | 2016-08-16 | 2,881 | -0% | -40% | -0% | -42% | ImgOps, Exif, Google | 1 | Pangaea W (talk, 74 edits) | chunk cHRM at offset 0x0453d, length 32: invalid green point 0.1596 0.84039 |
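
Several notes in the table mention "additional data after IEND chunk" or a CRC error in a chunk. Both conditions can be found by walking the PNG chunk list. The Python sketch below shows one way to do that; it is not the tool that produced the notes above, and the names `scan_png` and `make_chunk` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC (used here for testing)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def scan_png(data: bytes):
    """Return (bytes after IEND, list of (chunk type, computed CRC, expected CRC))."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    crc_errors = []
    while pos + 12 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        if pos + 12 + length > len(data):
            raise ValueError("truncated chunk")
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (expected,) = struct.unpack_from(">I", data, pos + 8 + length)
        computed = zlib.crc32(ctype + body) & 0xFFFFFFFF
        if computed != expected:
            crc_errors.append((ctype.decode("latin-1"), computed, expected))
        pos += 12 + length
        if ctype == b"IEND":
            # Anything past this point is not part of the image.
            return len(data) - pos, crc_errors
    raise ValueError("no IEND chunk found")
```

A non-zero first value means the file carries extra bytes after the image ends, which is where appended archives or leftover editor data would sit.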