User:LennardHofmann/GSoC 2022/Report 5
[Sample infobox for Wikidata item Q1324301: "annual program that offers open-source software projects to post-secondary student developers". Rows: Instance of, Has part(s), Organizer, Founded by, Start time, Inception, official website.]
Over the last three months, working closely with my mentor Mike Peel, I have been rewriting the Wikidata Infobox in Lua. The Infobox is shown on over 4 million category pages on Wikimedia Commons, a free and multilingual media repository, to inform readers about the topic of a category and help them browse the category system. A sample infobox, which displays information from Wikidata item Q1324301 and adapts to the language you have selected, can be seen on this page.
Now that Google Summer of Code 2022 is coming to an end, I would like to recap my work.
Where to find the code
Most of the time was spent removing code from Template:Wikidata Infobox/core, rewriting it in Lua, and adding it to Module:Wikidata Infobox/sandbox. Before the rewrite, that module was just a collection of helper functions—now, it produces the entire infobox. My changes to WikidataIB, the Infobox's biggest dependency, are listed here.
Recap
I documented my journey in four blog posts:
- Report 1: researching and experimenting
- Report 2: technical challenges and advice
- Report 3: Wikidata performance research
- Report 4: debugging puzzle
My work has not always been as exciting as the reports may suggest: I spent hours trying to make sense of complicated Wikitext expressions like this one:
{{#if:{{#property:P1950 | from={{{qid|}}}}} | {{#if:{{#property:P735 | from={{{qid|}}}}} | {{#invoke:WikidataIB |getValue |rank=best |P1950 |qid={{{qid|}}} | name=P1950 | |fwd={{{fwd|ALL}}} |osd={{{osd|no}}} |noicon=yes | linked=n | spf={{{spf|}}} | prefix="[""[Category:" |postfix=" (surname){{!}}{{#invoke:Wikidata Infobox|stripDiacrits|{{#invoke:WikidataIB |getValue |rank=best |P735 |qid={{{qid|}}} | name=P735 | |fwd={{{fwd|ALL}}} |osd={{{osd|no}}} |noicon=yes | linked=n | spf={{{spf|}}} | lang=en | sep=" "}}}}]]" | lang=en | sep=" "}} | {{#invoke:WikidataIB |getValue |rank=best |P1950 |qid={{{qid|}}} | name=P1950 | |fwd={{{fwd|ALL}}} |osd={{{osd|no}}} |noicon=yes | linked=n | spf={{{spf|}}} | prefix="[""[Category:" |postfix=" (surname)]]" | lang=en | sep=" "}} }} | }}
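Read from the inside out, the expression above appears to do the following: if the item has a P1950 value (second family name in Spanish name), it emits a link to the matching "(surname)" category, and if the item also has P735 values (given name), it uses the given names with diacritics removed as the category sort key. The sketch below restates that logic in Python purely for illustration; it is not the code from Module:Wikidata Infobox. The function names are hypothetical, the Unicode-based diacritic stripping is an assumption about what `stripDiacrits` does, and applying the prefix and postfix to each value separately is an assumption about how WikidataIB's `getValue` behaves.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. 'José' -> 'Jose'.

    Assumption: the real stripDiacrits helper may use a different mapping.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def surname_categories(second_surnames, given_names):
    """Build category links for P1950 values, sorted by given name if known.

    second_surnames: English labels of the item's P1950 values
    given_names:     English labels of the item's P735 values
    """
    if not second_surnames:
        # Outer #if: no P1950 claim, so nothing is emitted.
        return ""

    # Inner #if: a sort key is only added when P735 has a value.
    sort_key = strip_diacritics(" ".join(given_names)) if given_names else ""

    links = []
    for surname in second_surnames:
        target = f"Category:{surname} (surname)"
        if sort_key:
            links.append(f"[[{target}|{sort_key}]]")
        else:
            links.append(f"[[{target}]]")
    # sep=" " in the wikitext: values are joined with a single space.
    return " ".join(links)


print(surname_categories(["García"], ["José", "María"]))
```

Written this way, the three branches of the nested `#if` become two early decisions and one loop, which is the kind of simplification the Lua rewrite made possible.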
Recently, I added documentation on how to copy the Infobox to other wikis.
I have learned a lot about writing Scribunto modules, optimizing Lua code, and the edge cases of fetching data from Wikidata. Some of these edge cases are tracked at Wikidata infobox maintenance to help users improve Wikidata; see, for example, items with no claims.
This was my biggest coding project so far in terms of user base and size (the Lua module has over 1800 carefully crafted lines), and I'm really happy with how it turned out.
Results
The main purpose of the rewrite was to improve the Infobox's performance: before, it took around 3.5 seconds to render a small category page like South Pole Telescope. Thanks to the rewrite, it now takes only half a second, and there are just a handful of category pages left that take longer than 3 seconds to load. Also, categories with very big Wikidata items like COVID-19 pandemic in Colombia can finally be rendered within MediaWiki's Lua memory limit.
What remains to be done
As Wikidata and Commons evolve, work on the Infobox will never be finished. Open tasks and feature requests can be found on the template's talk page. However, most of these tasks are either difficult to implement (date formatting) or need more discussion.
I will stick around and continue to fix bugs reported on the talk page.
Previous post: Report 4