Commons:WMF support for Commons/Upload Wizard Improvements/Logo detection

From Wikimedia Commons, the free media repository

As part of our work to help with media moderation, we developed a tool that automatically detects logos uploaded to Commons, to make it easier for the community to evaluate them. The need for machine-detection tools came up in several past discussions and user interviews with the community, and logos are the second most common reason for media deletion.

Integration of the tool will take place between July and September 2024 as part of the OKR work for Fiscal Year 2024-2025, under key result WE2.3 (“Guide contributors to add images and references that comply with project guidelines and increase trust in content, for example, by flagging potential issues during their upload/addition”).

Current workflow

The logo detection tool will automatically determine whether a file uploaded by a user is likely to be a logo and will flag it after upload to ease moderation review. Two options are currently under consideration: adding a tag, or automatically applying a template that categorizes the file as a logo to be verified (with the ability to filter by whether or not the file is declared as own work). This is a first step towards an automated process for estimating the likelihood that an uploaded image will be deleted for any reason.
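
A minimal sketch of the post-upload tagging option, assuming the tool returns a logo probability between 0 and 1 and that a fixed threshold decides whether to tag. The threshold value, the `Upload` record, the `logo_tag` helper, and the template name `Possible logo` with its `own_work` parameter are all hypothetical placeholders, not decisions taken by the project.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: the actual threshold has not been decided.
LOGO_THRESHOLD = 0.9


@dataclass
class Upload:
    filename: str
    logo_score: float  # model's logo probability, between 0 and 1
    own_work: bool     # whether the uploader declared the file as own work


def logo_tag(upload: Upload, threshold: float = LOGO_THRESHOLD) -> Optional[str]:
    """Return wikitext for a hypothetical review template, or None if no tag is needed."""
    if upload.logo_score < threshold:
        return None
    # The own_work parameter lets reviewers filter own work vs. not own work.
    own = "yes" if upload.own_work else "no"
    return "{{Possible logo|own_work=" + own + "}}"
```

For example, `logo_tag(Upload("Example.png", 0.97, True))` returns `"{{Possible logo|own_work=yes}}"`, while a file scoring 0.2 is left untagged. Carrying the own-work flag as a template parameter, rather than in a separate template, is one simple way to support the filtering described above.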

In addition, we are assessing whether to warn uploaders that the file they are uploading might be a logo and that, if it does not meet the guidelines, it might be deleted. For multiple uploads, the warning will be shown next to the affected image. The warning may be limited to certain classes of users (i.e. users without autoconfirmed status), pending community consensus.
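
The warning logic can be sketched as follows, assuming the restriction to users without autoconfirmed status is adopted and that each file in a batch carries its own logo score. The `files_to_warn` helper and the threshold value are hypothetical; `autoconfirmed` is the standard MediaWiki user group name.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical cut-off for showing the warning.
WARN_THRESHOLD = 0.9


def files_to_warn(
    user_groups: Set[str],
    uploads: Iterable[Tuple[str, float]],
    threshold: float = WARN_THRESHOLD,
) -> List[str]:
    """Return the filenames that should show a possible-logo warning.

    Assumes warnings are restricted to users who are not autoconfirmed,
    one of the options still pending community consensus.
    """
    if "autoconfirmed" in user_groups:
        return []
    # In a batch upload, only the affected files get a warning next to them.
    return [name for name, score in uploads if score >= threshold]
```

Returning a list of filenames, rather than a single yes/no answer, matches the idea of placing each warning next to the affected image in a multi-file upload.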

Evaluation

Given an image file as input, we detect whether it is a logo using an EfficientNetV2 classifier trained to predict two probability scores: one for logo and one for out-of-domain. Overall, the experiment showed promising results for eventual integration into the ecosystem.
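
The page does not describe the classifier's output layer. One common design, shown here purely as an assumption, is a two-unit head with an independent sigmoid per unit, so that the logo and out-of-domain scores are separate probabilities rather than competing ones. The sketch starts from hypothetical raw logits and leaves out the EfficientNetV2 backbone; `head_scores` is an illustrative name.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def head_scores(logits: Sequence[float]) -> Dict[str, float]:
    """Map a hypothetical two-unit output [logo, out_of_domain] to probabilities.

    Each unit gets its own sigmoid, so the two scores need not sum to 1.
    """
    logo_logit, ood_logit = logits
    return {"logo": sigmoid(logo_logit), "out_of_domain": sigmoid(ood_logit)}
```

With independent sigmoids, an image can score high on both outputs at once; a softmax head would instead force the two scores to sum to 1.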

Below are the results of the best-performing model on the test dataset; its accuracy is high enough to perform the task reliably.

  • source: available Commons images
  • number of images: 47,976 (half from Category:Logos, half random)
  • accuracy: 96.9%
  • precision-recall AUC: 98.8
  • ROC AUC: 99
  • loss: 10.2
  • best training epoch: 8
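
To make the accuracy and ROC AUC figures concrete, here is a sketch of how these metrics are typically computed from per-image logo scores and binary labels (1 for an image from Category:Logos, 0 for a random image). The 0.5 decision threshold is an assumption, since the page does not say which threshold produced the reported accuracy, and the values listed above appear to be on a 0-100 scale. The precision-recall AUC is not shown.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of images whose thresholded score matches the label (1 = logo)."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random logo outscores a random non-logo.

    Ties count as half a win. This pairwise O(P*N) version is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, accuracy is 0.5 while ROC AUC is 0.75: AUC measures how well the scores rank logos above non-logos, independent of any threshold.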

Metrics

How likely is this a logo?

The gallery below displays 50 images and their logo probability scores. The images were randomly sampled from the test dataset described above.

Reports on logos detected by the tool

This section lists monthly reports by the Structured Content team on Commons uploads detected as logos. We publish two datasets: one of uploads still available and one of uploads deleted as of the publication date.