User:AbdealiJK/file-metadata/GSoC2016
This page is the final report for the GSoC 2016 project: "Port catimages.py to pywikibot-core".
- Student: User:AbdealiJK
- Mentors: User:Jayvdb, User:DrTrigon
- Original project description: phab:T66838
- Project proposal submitted: phab:T129611
- Weekly reports of the project: phab:T133762
- Source Code: github:file-metadata, pypi:file-metadata
Aim
The aim of the project was to port the functionality of the catimages.py script from pywikibot-compat to pywikibot-core. catimages.py used various methods to identify the categories an image belongs to, including reading metadata such as EXIF and using computer vision to detect faces.
While doing this, some of the key aims were:
- To make the code more stable, as the earlier catimages was considered more of a proof of concept
- To use the latest frameworks and libraries when possible rather than the older ones used in the earlier script
- To ensure that all dependencies were well supported with unit tests and continuous integration and had an active maintainer
- To clean up the code of catimages and add CI and unit tests so that other contributors can extend it easily
- To make catimages user friendly with an easy installation procedure to get it up and running
Work done during project
As part of the project, we created a new GitHub repository that contains all the code of the project. This GitHub project is going to be the official location of the catimages script.
- Contributions to catimages: v0.2.0 of the script was released by the end of the project (github release, pypi release). Statistics of contributions during the project can be seen at github:graphs:contributors
- Documentation for running the code, and for people who wish to contribute, can be found at https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata
- Contributions to other open source projects: During the project, we also contributed to and helped out in other projects that we were considering as dependencies (through commits or discussions on issues).
Final Results
The script was run on seven consecutive days (20 to 26 Jul 2016), and the categories it found are documented here. These statistics were gathered by running the log-bot binary provided by file-metadata on ToolLabs.
The detailed logs can be seen at:
- 20 Jul 2016 - On uncategorized files, On uploaded files
- 21 Jul 2016 - On uncategorized files, On uploaded files
- 22 Jul 2016 - On uncategorized files, On uploaded files
- 23 Jul 2016 - On uncategorized files, On uploaded files
- 24 Jul 2016 - On uncategorized files, On uploaded files
- 25 Jul 2016 - On uncategorized files, On uploaded files
- 26 Jul 2016 - On uncategorized files, On uploaded files
Notes:
- The script ran on every 5th file in the case of uploaded files, because the total number of files is too large (~10000 per day)
- To understand the quality of the categorization, the categories found were grouped into 3 buckets:
  - Type: Categories that describe what type of image it is, along with other metadata about the image (for example, JPEG file, Graphic, Taken with <camera model>)
  - Content: Categories that describe what is seen in the image (for example, Faces)
  - Location: Categories related to the location where the image was taken (using GPS)
- The tables rendered below may not be optimal for small displays
- The uncategorized images were analyzed on 19 Aug 2016. Some of the images may not be uncategorized now.
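The sampling and bucketing described in the notes above can be illustrated with a minimal sketch. This is not the code used by log-bot: the function names, the choice to start sampling at the first file, and the prefix rules (including the "Located in " prefix) are all assumptions made for illustration.

```python
from itertools import islice


def every_nth(items, n=5):
    """Return an iterator over every n-th item, starting with the first.

    Assumption: the starting offset used by the real script is not
    documented; this sketch starts at index 0.
    """
    return islice(items, 0, None, n)


# Hypothetical prefix rules for the three buckets. The real rules used
# for the statistics are not documented on this page.
BUCKET_PREFIXES = {
    "Type": ("JPEG files", "Graphics", "Taken with "),
    "Content": ("Faces",),
    "Location": ("Located in ",),
}


def bucket_for(category):
    """Return the bucket name for a category, or None if no rule matches."""
    for bucket, prefixes in BUCKET_PREFIXES.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if category.startswith(prefixes):
            return bucket
    return None


if __name__ == "__main__":
    uploads = ["File %d.jpg" % i for i in range(12)]
    print(list(every_nth(uploads)))
    print(bucket_for("Taken with Canon EOS 5D"))
```

Sampling with a fixed stride keeps the run cheap while still covering uploads spread across the whole day, which fits the stated reason for skipping files (around 10000 uploads per day).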
[Chart with two panels, "All uploaded images" and "Uncategorized images"; vertical axis from 10 to 100, horizontal axis from 20 Jul to 26 Jul]
[Chart with two panels, "All uploaded images" (vertical axis from 500 to 3,000) and "Uncategorized images" (vertical axis from 250 to 1,500); horizontal axis from 20 Jul to 26 Jul]
Examples
Here are a few examples of the results obtained by the script:
Faces
Football kits