Commons:File description page regular expressions
Shortcut: COM:REGEX
This is a list of regular expressions that bots can use for localisation and general fixes on file description pages. Some of them are fairly trivial and should be combined with other tasks; regexes marked [Minor] should not be run alone. If you use a regex that you would like to share, please add it below.
All expressions are case-insensitive unless specified otherwise, and they should be executed from top to bottom. If any of them causes problems, please report it on the talk page. They are reasonably well tested, but there are no guarantees.
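The conventions above (case-insensitive by default, top-to-bottom order, [Minor] rules not run alone) could be wired together roughly as in the following Python sketch. The names `Rule`, `to_python_repl` and `apply_rules` are hypothetical, and reading "should not be run alone" as "do not save an edit that consists only of minor changes" is an assumption. The replacement strings on this page use `$1` / `${1}` notation, which Python's `re` module writes as `\g<1>`.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    """One table row. Hypothetical container, not part of any bot framework."""
    find: str
    replace: str                 # written in the page's $1 / ${1} notation
    minor: bool = False
    flags: int = re.IGNORECASE   # page default: case-insensitive


_DOLLAR_REF = re.compile(r"\$(?:\{(\d+)\}|(\d+))")


def to_python_repl(repl: str) -> str:
    """Translate $1 and ${1} group references into Python's \\g<1> form."""
    return _DOLLAR_REF.sub(
        lambda m: "\\g<%s>" % (m.group(1) or m.group(2)), repl
    )


def apply_rules(text: str, rules: list) -> tuple:
    """Apply rules in list order; report whether a non-minor rule changed the text."""
    major_change = False
    for rule in rules:
        new_text = re.sub(rule.find, to_python_repl(rule.replace), text,
                          flags=rule.flags)
        if new_text != text and not rule.minor:
            major_change = True
        text = new_text
    return text, major_change


# Two rules whose patterns appear in full in the tables further down.
RULES = [
    Rule(r"__ *NOTOC *__", "", minor=True, flags=0),   # [Case sensitive] [Minor]
    Rule(r"</?br( )?(/)?\\?>", "<br$1$2>"),
]
```

A caller would save the page only when the second return value is true, for example `new_text, worth_saving = apply_rules(old_text, RULES)`.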
Localization/Internationalization
Headings
Task | Find | Replace | Notes |
---|---|---|---|
"Summary" heading | Add "== {{int:filedesc}} ==" to file pages where it is missing | [Minor], ideally done after all regex changes | |
"Summary" heading | (?:Краткое[ _]+)?описание|Beschreibung\,[ _]+Quelle|Quelle|Beschreibung|वर्णन|sumario|descri(ption|pción|ção do arquivo)|achoimriú)( */ *(?:summary|(?:Краткое[ _]+)?описание|Beschreibung\,[ _]+Quelle|Quelle|Beschreibung|वर्णन|sumario|descri(ption|pción|ção do arquivo)|achoimriú))? *\:? *\1</source> | $1 {{int:filedesc}} $1 | [MultiLine] |
"Licensing" heading | )?(za(?: +d\'uso)?|Лицензирование|li[zcs]en[zcs](e|ing|ia)?(?:\s+information)?( */ *(za(?: +d\'uso)?|Лицензирование|li[zcs]en[zcs](e|ing|ia)?(?:\s+information)?))?|\{\{\s*int:license\s*\}\})(\]\])? *\:? *\1</source> | $1 {{int:license-header}} $1 | [MultiLine] |
"Original upload log" headings | history)|file ?history|ursprüngliche bild-versionen) *\:? *\1</source> | $1 {{original upload log}} $1 | [MultiLine] |
Remove duplicate headings | <syntaxhighlight lang="text" enclose="none">^ *(\=+) *(.*?) *\=+ *[\r\n]+\=+ *\2 *\1 *$</syntaxhighlight> | $1 $2 $1 | [MultiLine]; Run multiple times |
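As an illustration of the [MultiLine] and "Run multiple times" notes, here is a minimal Python sketch of the duplicate-heading rule. The function name `remove_duplicate_headings` is hypothetical; the pattern is the one from the table, with `$1` written as `\1`. Each pass merges only non-overlapping pairs, so three identical headings in a row need two passes, which is why the substitution is repeated until the text stops changing.

```python
import re

# Pattern from the "Remove duplicate headings" row; ^ and $ must match at
# line boundaries, hence re.MULTILINE.
DUPLICATE_HEADING = re.compile(
    r"^ *(\=+) *(.*?) *\=+ *[\r\n]+\=+ *\2 *\1 *$",
    re.IGNORECASE | re.MULTILINE,
)


def remove_duplicate_headings(text: str) -> str:
    """Collapse consecutive identical headings, repeating until stable."""
    while True:
        new_text = DUPLICATE_HEADING.sub(r"\1 \2 \1", text)
        if new_text == text:
            return text
        text = new_text
```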
Multilingual tags
Task | Find | Replace | Notes | |
---|---|---|---|---|
{{Unknown}} | \s*(?:author|artist)\s*=\s*)(?:unknown?|\{\{\s*unknown\s*\}\}|\?+|unkown|unidentified|αγνωστος|sconosciuto|ignoto|desconocido|inconnu|inconnue|not given|not known|desconhecido|unbekannt|неизвестно|Не известен|neznana|nieznany|непознат|okänd|sconossùo|未知|ukjent|onbekend|nich kennt|ലഭ്യമല്ല|непознат|نهناسرا|descoñecido|不明|ignoto|óþekktur|tak diketahui|ismeretlen|nepoznat|לא ידוע|ûnbekend|tuntematon|نامعلوم|teadmata|nekonata|άγνωστος|ukendt|neznámý|desconegut|Неизвестен|ned bekannt|غير معروف|невідомий)\s*?\;?\.?\s*?(\ | \r|\n)</source> | $1{{unknown|author}}$2 | |
{{Own}} (part 1) | \s*source\s*=\s*)(?:own work)?\s*(?:-|;|</?br *[/\\]?>)?\s*(?:own(?: work(?: by uploader)?)?|(?:œuvre |travail )?personnel(?:le)?|self[- ]made|création perso|selbst fotografiert|obra pr[òo]pia|trabajo propr?io)\s*?(?:\(own work\))?\.? *(\ | \r|\n)</source> | $1{{own}}$2 | |
{{Own}} (part 2) | $1{{own}}$2 | |||
{{Own}} (part 3) | \s*source\s*=\s*)(?:own[^a-z]*work|opera[^a-z]*propria|trabajo[^a-z]*propio|travail[^a-z]*personnel|eigenes[^a-z]*werk|eigen[^a-z]*werk|собственная[^a-z]*работа|投稿者自身による作品|自己的作品|praca[^a-z]*pw[łl]asna|Obra(?:[^a-z]*do)?[^a-z]*pr[oó]prio|Treball[^a-z]*propi|Собствена[^a-z]*творба|Vlastní[^a-z]*dílo|Eget[^a-z]*arbejde|Propra[^a-z]*verko|Norberak[^a-z]*egina|عمل[^a-z]*شخصي|اثر[^a-z]*شخصی|자작|अपना[^a-z]*काम|נוצר[^a-z]*על[^a-z]*ידי[^a-z]*מעלה[^a-z]*היצירה|Karya[^a-z]*sendiri|Vlastito[^a-z]*djelo[^a-z]*postavljača|Mano[^a-z]*darbas|A[^a-z]*feltöltő[^a-z]*saját[^a-z]*munkája|Karya[^a-z]*sendiri|Eget[^a-z]*verk|Oper[aă][^a-z]*proprie|Vlastné[^a-z]*dielo|Lastno[^a-z]*delo|Сопствено[^a-z]*дело|Oma[^a-z]*teos|Eget[^a-z]*arbete|Yükleyenin[^a-z]*kendi[^a-z]*çalışması|Власна[^a-z]*робота|Sariling[^a-z]*gawa|eie[^a-z]*werk|сопствено[^a-z]*дело|Eige[^a-z]*arbeid|პირადი[^a-z]*ნამუშევარი)\;?\.? *(\ | \r|\n)</source> | $1{{own}}$2 | |
{{Own}} (part 4) | \s*source\s*=\s*)(((?:\'\'+)?)([\"\']?)(?:selbst\W*erstellte?s?|selbst\W*gezeichnete?s?|self\W*made|eigene?s?)\W*?(?:arbeit|aufnahme|(?:ph|f)oto(?:gra(?:ph|f)ie)?)?\.?\4\3) *(\ | \r|\n)</source> | $1{{own}}$5 | |
{{Self-photographed}} | \s*source\s*=\s*)(?:self[^a-z]*photographed|selbst[^a-z]*(?:aufgenommen|(?:f|ph)otogra(?:f|ph)iert?)|投稿者撮影|投稿者の撮影)\s*?\.? *(\ | \r|\n)</source> | $1{{self-photographed}}$2 | |
{{Anonymous}} | \s*author\s*=\s*)(?:anonym(?:e|ous)?|anonyymi|anoniem|an[oòóô]n[yi]mo?|ismeretlen|不明(匿名)|미상|ανώνυμος|аноним(?:ен|ный художник)|neznámy|nieznany|مجهول|Ананім|Anonymní|Ezezaguna|Anonüümne|אלמוני|អនាមិក|Anonimas|അജ്ഞാതം|Анонимный автор|佚名)\s*?\.?\;?\s*?(\ | \r|\n)</source> | $1{{anonymous}}$2 | |
{{Unknown photographer}} | \s*author\s*=\s*)(?:unknown\s*photographer|photographer\s*unknown)\s*?\;?\.?\s*?(\ | \r|\n)</source> | $1{{unknown photographer}}$2 | |
{{Private collection}} | \s*gallery\s*=\s*)private(?: collection)? *(\ | \r|\n)</source> | $1{{private collection}}$2 | |
{{See below}} | \s*permission\s*=\s*)(?:see\s*below|див\.?\s*нижче|дивись\s*нижче)\s*?\;?\.?\s*?(\ | \r|\n)</source> | $1{{see below}}$2 |
Task | Find | Replace | Notes |
---|---|---|---|
{{Original description page}} I | is|was) \[(?:https?:)?\/\/(?:www\.)?((?:[a-z\-]+\.)?wik[a-z]+(?:\-old)?)\.org\/w((?:\/shared)?)\/index\.php\?title\=(?:[a-z]+)(?:\:|%3A)([^\[\]\|}{]+?) +here(?:\]\.?|\.?\])(\s+All following user names refer to (?:\1(?:\.org)?\2|(?:wts|shared)\.oldwikivoyage)\.?)?</source> | {{original description page|$1$2|$3}} | |
{{Original description page}} II | %3A)([\w\%\-\.\~\:\/\?\#\[\]\@\!\$\&\'\(\)\*\+\,\;\=]+?)(?:| [^\]\n]*)\](?:\s*\,?\s*before it was transferr?ed to commons)?\.?</source> | {{original description page|$1|$2}} | |
{{Original description page}} III | \s*([a-z\-]+\.w[a-z]+)\s*\|\s*[^}\|\[{]+\}\})\s*using\s*\[\[\:en\:WP\:FTCG\|FtCG\]\]\.?</source> | $1{{transferred from|$3||[[:en:WP:FTCG|FtCG]]}} $2 |
Technique translations
These mainly apply to paintings and other artistic works.
Task | Find | Replace | Notes | |
---|---|---|---|---|
Oil on canvas | \s*technique\s*=\s*)(?:\{\{\s*(?:en|de) *\|)? *(?:oil[ -]on[ -]canvas|öl[ -]auf[ -]leinwand) *(?:\}\})?(\ | \r|\n)</source> | $1{{technique|oil|canvas}}$2 | |
Oil on wood | \s*technique\s*=\s*)\{\{\s*de *\|\s*öl[ -]auf[ -]holz\s*\}\}(\ | \r|\n)</source> | $1{{technique|oil|wood}}$2 | |
Oil on oak | \s*technique\s*=\s*)\{\{\s*de *\|\s*öl[ -]auf[ -]eichenholz\s*\}\}(\ | \r|\n)</source> | $1{{technique|oil|panel|wood=oak}}$2 | |
Oil on panel | \s*technique\s*=\s*)(?:\{\{\s*en *\|)? *oil[ -]on[ -]panel *(?:\}\})?(\ | \r|\n)</source> | $1{{technique|oil|panel}}$2 | |
Watercolor | \s*technique\s*=\s*)\{\{\s*de *\|\s*aquarell\s*\}\}(\ | \r|\n)</source> | $1{{technique|watercolor}}$2 | |
Fresco | \s*technique\s*=\s*)\{\{\s*de *\|\s*fresko\s*\}\}(\ | \r|\n)</source> | $1{{technique|fresco}}$2 |
{{Information}} fields
Task | Find | Replace | Notes | |
---|---|---|---|---|
"Description" cleanup | \s*description\s*=)\s*(?:\{\{\s*description missing\s*\}\}|\s*description missing\s*?|(?:\{\{\s*en *\|) *(?:)?no original description(?:)? *(?:\}\})|(?:)?no original description(?:)? *) *(\ | \r|\n)</source> | $1$2 | |
"Permission" cleanup 1 | \s*permission\s*=)\s*((?:\'\')?)(?:-|—|下記を参照|see(?: licens(?:e|ing|e +section))?(?: bell?ow)?|yes|oui)\s*?\,?\.?;?\s*?\2\s*?(\ | \r|\n)</source> | $1$3 | |
"Permission" cleanup 2 | \s*permission\s*=)\s*\{\{(?:en\|)?\s*?see\sbell?ow\s*?\}\}\s*?(\ | \r|\n)</source> | $1$2 | |
"Other versions" cleanup | \s*other[_ ]versions\s*=)\s*(?:)?(?:-|—|no|none?(?: known)?|nein|yes|keine|\-+)\.?(?:)? *(\ | \r|\n)</source> | $1$2 | |
"Source" cleanup | \s*source\s*\=\s*[^*]+?)\n?\*\s*uploaded\s+by\s+\[\[user\:[^\]]+]](\ | \r|\n)</source> | $1$2 | File Upload Bot (Magnus Manske) was adding these but they can already be found in the filehistory of each uploaded file. |
Dates
Most plausible years
Most digital photos are dated after 2000, so the most plausible years are those matched by <syntaxhighlight lang="text" enclose="none">(200[0-9]|201[0-9])</syntaxhighlight>, that is, 2000 to 2019. For example, 19082006 is converted to 2006-08-19.
Task | Find | Replace | Notes | |
---|---|---|---|---|
Conversion (yyyy[ -/.]mm[ -/.]dd) | \s*date\s*=\s*)(?:created|made|taken)? *(200[0-9]|201[0-9])(-| |/|\.|)(0[1-9]|1[0-2])\3(1[3-9]|2[0-9]|3[01])(\ | \r|\n)</source> | $1$2-$4-$5$6 | |
Conversion (yyyy[ -/.]dd[ -/.]mm) | \s*date\s*=\s*)(?:created|made|taken)? *(200[0-9]|201[0-9])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(0[1-9]|1[0-2])(\ | \r|\n)</source> | $1$2-$5-$4$6 | |
Conversion (mm[ -/.]dd[ -/.]yyyy) | \s*date\s*=\s*)(?:created|made|taken)? *(0[1-9]|1[0-2])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(200[0-9]|201[0-9])(\ | \r|\n)</source> | $1$5-$2-$4$6 | |
Conversion (dd[ -/.]mm[ -/.]yyyy) | \s*date\s*=\s*)(?:created|made|taken)? *(1[3-9]|2[0-9]|3[01])(-| |/|\.|)(0[1-9]|1[0-2])\3(200[0-9]|201[0-9])(\ | \r|\n)</source> | $1$5-$4-$2$6 |
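The four conversions above can be sketched in Python as follows. The opening of each pattern and its trailing delimiter group are not fully legible in the table, so `PREFIX` and `TAIL` below are assumptions, and `normalize_date` is a hypothetical name. The day group only accepts 13 to 31, so an ambiguous date such as 05062006 is deliberately left alone, and the backreference `\3` requires the same separator in both positions.

```python
import re

# Assumed pieces: the start of each pattern and the trailing delimiter.
PREFIX = r"(\|\s*date\s*=\s*)(?:created|made|taken)? *"   # group 1
TAIL = r"(\||\r|\n)"                                       # group 6

YEAR = r"(200[0-9]|201[0-9])"
MONTH = r"(0[1-9]|1[0-2])"
DAY = r"(1[3-9]|2[0-9]|3[01])"   # 13-31 only: cannot be mistaken for a month
SEP = r"(-| |/|\.|)"             # group 3, possibly empty

# (pattern, replacement) in the same order as the table rows.
DATE_RULES = [
    (PREFIX + YEAR + SEP + MONTH + r"\3" + DAY + TAIL, r"\1\2-\4-\5\6"),
    (PREFIX + YEAR + SEP + DAY + r"\3" + MONTH + TAIL, r"\1\2-\5-\4\6"),
    (PREFIX + MONTH + SEP + DAY + r"\3" + YEAR + TAIL, r"\1\5-\2-\4\6"),
    (PREFIX + DAY + SEP + MONTH + r"\3" + YEAR + TAIL, r"\1\5-\4-\2\6"),
]


def normalize_date(text: str) -> str:
    """Rewrite unambiguous 2000-2019 dates in a date field to yyyy-mm-dd."""
    for pattern, repl in DATE_RULES:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text
```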
Other plausible years
Apply these only after the expressions above. For example, 19781706 is converted to 1978-06-17.
Task | Find | Replace | Notes | |
---|---|---|---|---|
Conversion (yyyy[ -/.]mm[ -/.]dd) | \s*date\s*=\s*)(?:created|made|taken)? *(1[89][0-9]{2})(-| |/|\.|)(0[1-9]|1[0-2])\3(1[3-9]|2[0-9]|3[01])(\ | \r|\n)</source> | $1$2-$4-$5$6 | |
Conversion (yyyy[ -/.]dd[ -/.]mm) | \s*date\s*=\s*)(?:created|made|taken)? *(1[89][0-9]{2})(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(0[1-9]|1[0-2])(\ | \r|\n)</source> | $1$2-$5-$4$6 | |
Conversion (mm[ -/.]dd[ -/.]yyyy) | \s*date\s*=\s*)(?:created|made|taken)? *(0[1-9]|1[0-2])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(1[89][0-9]{2})(\ | \r|\n)</source> | $1$5-$2-$4$6 | |
Conversion (dd[ -/.]mm[ -/.]yyyy) | \s*date\s*=\s*)(?:created|made|taken)? *(1[3-9]|2[0-9]|3[01])(-| |/|\.|)(0[1-9]|1[0-2])\3(1[89][0-9]{2})(\ | \r|\n)</source> | $1$5-$4-$2$6 |
Task | Find | Replace | Notes | |
---|---|---|---|---|
Conversion ({{date|yyyy|mm|dd}}) | \s*date\s*=\s*)(?:created|made|taken)? *\{\{\s*date\|([0-9]{4})\|(0[1-9]|1[012])\|(0?[1-9]|1[0-9]|2[0-9]|3[01])\}\}(\ | \r|\n)</source> | $1$2-$3-$4$5 | {{Date}} function is built-in |
Unknown date | \s*(?:date|year)\s*=\s*)(?:unknown?(?:\s*date)?|\?|unbekannte?s?(\s*datum)?)</source> | $1{{unknown|date}} | ||
{{other date|century}} | \s*(?:date|year)\s*=\s*)(\d\d?)(?:st|nd|rd|th) *century *(\ | \r|\n)</source> | $1{{other date|century|$2}}$3 | |
{{other date|~}} | \s*(?:date|year)\s*=\s*)(?:cir)?ca?\.? *\s?(1\d{2})[\-\?] *(\ | \r|\n)</source> | $1{{other date|~|${2}0|${2}9}}$3 | |
{{other date|~}} | \s*(?:date|year)\s*=\s*)(?:cir)?ca?\.? *(\d{4}) *(\ | \r|\n)</source> | $1{{other date|~|$2}}$3 | |
{{other date|?}} | \s*(?:date|year)\s*=\s*)(?:unknown|\?+)\.? *(\ | \r|\n)</source> | $1{{other date|?}}$2 | |
{{Original upload date}} | \d{4}\-\d{2}\-\d{2}\}\})\s*(?:\(original\s*upload\s*date\)|\(\s*first\s*version\s*\);?\s*\{\{\s*original upload date\|\d{4}\-\d{2}\-\d{2}\}\}\s*\(\s*last\s*version\s*\))</source> | $1 | ||
{{Original upload date}} & {{According to EXIF data}} | \s*date\s*=\s*)(?:\{\{\s*date\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\}\}|(\d{4})\-(\d{2})\-(\d{2}))\s*\(\s*(original upload date|according to EXIF data)\s*\)\s*?(\ | \r|\n)</source> | $1{{$8|$2$5-$3$6-$4$7}}$9 | |
{{Original upload date}} I | \s*date\s*=\s*)\{\{\s*date\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\}\}\s*\(\s*first\s*version\s*\)\;?\s*\{\{\s*date\s*\|\s*\d+\s*\|\s*\d+\s*\|\s*\d+\s*\}\}\s*\(\s*last\s*version\s*\)</source> | $1{{original upload date|$2-$3-$4}} | ||
{{Original upload date}} II | \s*date\s*=\s*)(\d{4})\-(\d{2})\-(\d{2})\s*\(\s*first\s*version\s*\)\;?\s*(\d{4})\-(\d{2})\-(\d{2})\s*\(\s*last\s*version\s*\)</source> | $1{{original upload date|$2-$3-$4}} | ||
{{Original upload date}} III | \s*date\s*=\s*\(?\s*)(?:Uploaded\s*on\s*Commons\s*at\s*[\d\-]*\s*[\d:]*\s*\(?UTC\)?\s*\/?\s*)?Original(?:ly)?\s*uploaded\s*at\s*([\d\-]*)\s*[\d:]*</source> | $1{{original upload date|$2}} | ||
{{other date|s}} | \s*date\s*=\s*)(\d{1,3}0)\s*s</source> | $1{{other date|s|$2}} | ||
{{other date|after}} | \s*date\s*=\s*)(?:after|post|بعد|desprès|po|nach|efter|μετά από|después de|pärast|پس از|après|despois do|לאחר|nakon|dopo il|по|na|após|după|после)\s*(\d{4})</source> | $1{{other date|after|$2}} | ||
{{other date|before}} | \s*date\s*=\s*)(?:before|vor|pre|до|vör|voor|prior to|ante|antes de|قبل|Преди|abans|před|før|πριν από|enne|پیش از|ennen|avant|antes do|לפני|prije|prima del|пред|przed|înainte de|ранее|pred|före)[\s\-]*(\d{4})</source> | $1{{other date|before|$2}} | ||
{{other date|or}} | \s*date\s*=\s*)(\d{4})\s*(?:or|أو|o|nebo|eller|oder|ή|ó|või|یا|tai|ou|או|vagy|または|или|അഥവാ|of|lub|ou|sau|или|ali|หรือ|和)\s*?(\d{4})</source> | $1{{other date|or|$2|$3}} | ||
{{other date|between}} | \s*date\s*=\s*)(?:sometime\s*)?(?:between)\s*(\d{4})\s*(?:and|\-)?\s*?(\d{4})</source> | $1{{other date|between|$2|$3}} | ||
{{other date|spring}} | \s*date\s*=\s*)(?:primavera(?:\s*de)?|jaro|forår|frühling|spring|printempo|Kevät|printemps|пролет|Vörjohr|früh[ \-]?jahr|voorjaar|wiosna|primăvara(?:\s*lui)?|весна|pomlad|våren|spring)\s*(\d{4})</source> | $1{{other date|spring|$2}} | ||
{{other date|summer}} | \s*date\s*=\s*)(?:estiu|léto|somero|verano|Kesä|été|verán|estate|лето|zomer|lato|verão(?:\s*de)?|vara(?:\s*lui)?|poletje|sommaren|sommer|summer)\s*(\d{4})</source> | $1{{other date|summer|$2}} | ||
{{other date|fall}} | \s*date\s*=\s*)(?:fall|autumn|tardor|podzim|Efterår|Herbst|aŭtuno|otoño|Syksy|outono(?:\s*de)?|automne|outono|autunno|есен|Harvst|herfst|jesień|toamna(?:\s*lui)?|осень|jesen|hösten)\s*(\d{4})</source> | $1{{other date|fall|$2}} | ||
{{other date|winter}} | \s*date\s*=\s*)(?:winter|hivern|zima|Vinter|vintro|invierno|Talvi|hiver|inverno(?:\s*de)?|зима|iarna(?:\s*lui)?|зима|zima|vintern)\s*(\d{4})</source> | $1{{other date|winter|$2}} | ||
{{other date|circa}} | \s*date\s*=\s*)(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)\s*(\d{3,4})(?:\s*\-\s*(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)?\s*(\d{3,4}))?</source> | $1{{other date|circa|$2|$3}} | ||
empty argument fix | circa\|\d+)\|\}\}</source> | $1}} | ||
{{other date|circa}} | \s*date\s*=\s*)(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)\s*(\d{3,4})</source> | $1{{other date|circa|$2}} | ||
(from metadata) | \s*date\s*=\s*)\{\{\s*ISOdate\s*\|\s*([\d\-]+)\s*\}\}\s*\(\s*from\s*metadata\s*\)</source> | $1{{according to EXIF|$2}} |
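The order of the rows matters here: the circa rule with an optional second year always writes a third argument, and the "empty argument fix" then removes it when only one year was given. A minimal Python sketch of that two-step sequence follows. The opening of both patterns (`PREFIX` and the start of `EMPTY_ARGUMENT`) is assumed, the list of circa words is shortened for illustration, and `tag_circa` is a hypothetical name.

```python
import re

PREFIX = r"(\|\s*date\s*=\s*)"   # assumed opening of the pattern
# Shortened alternation; the table row covers many more languages.
CIRCA = r"(?:[zc]ir[kc]a|about|around|c[\:\. ]?a?[\:\. ]?)"

CIRCA_RANGE = re.compile(
    PREFIX + CIRCA + r"\s*(\d{3,4})(?:\s*\-\s*" + CIRCA + r"?\s*(\d{3,4}))?",
    re.IGNORECASE,
)
# Assumed opening; the visible tail of the row is  circa\|\d+)\|\}\}
EMPTY_ARGUMENT = re.compile(
    r"(\{\{\s*other date\s*\|\s*circa\|\d+)\|\}\}", re.IGNORECASE
)


def tag_circa(text: str) -> str:
    """Wrap 'circa' dates in {{other date}}, then drop an empty last argument."""
    # Group 3 is unmatched for a single year; Python substitutes "" for it.
    text = CIRCA_RANGE.sub(r"\1{{other date|circa|\2|\3}}", text)
    return EMPTY_ARGUMENT.sub(r"\1}}", text)
```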
Junk cleanup
Task | Find | Replace | Notes | ||
---|---|---|---|---|---|
{{ImageUpload}} removal | <syntaxhighlight lang="text" enclose="none">\s*\n?</source> | [Minor] | |||
Uncategorized comment | <syntaxhighlight lang="text" enclose="none"> * *</source> | [Minor]; Usually left behind after categorizing | |||
"Categories" comment | <syntaxhighlight lang="text" enclose="none"> * *\n?</source> | [Minor] | |||
"move approved by" | \n)*?)(?:This image was moved from *\[\[:?(?:File|image):?[^\]\[{}]*\]\]\.?)?</source> | $1 | |||
Useless templates (if they take no parameters) | Art\.|bots|football[ _]+kit|template[ _]+other|s|tl|tlxs|template|template[ _]+link|temp|tls|tlx|tl1|tlp|tlsx|tlsp|mbox|tmbox(?:\/core)?|lan|jULIANDAY|file[ _]+title|nowrap|plural|time[ _]+ago|time[ _]+ago\/core|toolbar|red|green|sp|other date|max|max\/2|str[ _]+left|str[ _]+right|music|date|cite[ _]+book|citation\/core|citation\/make[ _]+link|citation\/identifier|citation|cite|cite[ _]+book|citation\/authors|citation\/make[ _]+link|cite[ _]+journal|cite[ _]+patent|cite[ _]+web|hide in print|only in print|parmPart|error|crediti|fontcolor|transclude|trim|navbox|navbar|section[ _]+link|yesno|center|unused|•|infobox\/row)\s*\}\}</source> | ||||
Useless full URL | \s*(?:https?:)?\/\/ticket\.wikimedia\.org\/otrs\/index\.pl\?Action\s*\=\s*AgentTicketZoom&(?:amp;)?TicketNumber\=(\d+)\s*\}\}</source> | {{PermissionOTRS|id=$1}} | |||
Unnecessary __NOTOC__ | <syntaxhighlight lang="text" enclose="none">__ *NOTOC *__</syntaxhighlight> | [Case sensitive] [Minor]; Common.css prevents file pages from showing TOCs | |||
Remove empty lang templates | ab|ace|af|ak|als|am|an|ang|ar|arc|arz|as|ast|av|ay|az|ba|bar|bcl|be|bg|bh|bi|bjn|bm|bn|bo|bpy|br|bs|bug|bxr|ca|cbk-zam|cdo|ce|ceb|ch|cho|chr|chy|ckb|co|cr|crh|cs|csb|cu|cv|cy|da|de|diq|dsb|dv|dz|ee|el|eml|en|eo|es|et|eu|ext|fa|ff|fi|fiu-vro|fj|fo|fr|frp|frr|fur|fy|ga|gag|gan|gd|gl|glk|gn|got|gu|gv|ha|hak|haw|he|hi|hif|ho|hr|hsb|ht|hu|hy|hz|ia|id|ie|ig|ii|map-bms|ik|ilo|io|is|it|iu|ja|jbo|jv|ka|kaa|kab|kbd|kg|ki|kj|kk|kl|km|kn|ko|kr|krc|ks|ksh|ku|kv|kw|ky|la|lad|lb|lbe|lez|lg|li|lij|roa-rup|lmo|ln|lo|lt|ltg|lv|mdf|mg|mh|mhr|mi|mk|ml|mn|mo|mr|mrj|ms|mt|mus|mwl|my|myv|mzn|na|nah|nap|nds|nds-nl|ne|new|ng|nl|nn|no|nov|nrm|nso|nv|ny|oc|om|or|os|pa|pag|pam|pap|pcd|pdc|pfl|pi|pih|pl|pms|pnb|pnt|ps|pt|qu|rm|rmy|rn|ro|roa-tara|ru|rue|rw|sa|sah|sc|scn|sco|sd|se|sg|sh|si|sk|sl|sm|sn|so|sq|sr|srn|ss|st|stq|su|sv|sw|szl|ta|te|tet|tg|th|ti|tk|tn|to|zh-hans|tpi|tr|ts|tt|tum|tw|ty|tyv|udm|ug|uk|ur|uz|ve|vec|vep|vi|vls|vo|wa|war|wo|wuu|xal|xh|xmf|yi|yo|za|zea|zh|zh-hant|zh-hk|zh-min-nan|zh-sg|zu)\s*(?:|\ | \s*1=)?\s*\}\} *(\ | \r|\n)</source> | $1 | Ignores those followed by text (incorrect usage but still indicates the language) |
Remove void parameter (wrong syntax) | (\s*\ | \}\})</source> | $1$2 |
Links
Task | Find | Replace | Notes |
---|---|---|---|
External to interwiki (part 1) | (wikt)ionary|wiki(n)ews|wiki(b)ooks|wiki(q)uote|wiki(s)ource|wiki(v)ersity|wiki(voy)age)\.(?:com|net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> | [[$2$3$4$5$6$7$8:$1:$9|$10]] | Make sure not to touch credit lines which require a link to the file page. (Effectively a self-link which results in bold text after this regex) |
External to interwiki (part 2) | (incubator)|(quality))\.wikimedia\.(?:com|net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> | [[$1$2$3:$4|$5]] | See above |
External to wikilink (local) | net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> | [[:$1|$2]] | See above |
Interlanguage | sv|nl|de|fr|ru|it|es|ceb|vi|war|pl|ja|pt|zh|uk|ca|no|fa|fi|id|ar|cs|ko|ms|hu|ro|zh-yue|sr|tr|min|sh|kk|eo|eu|sk|da|lt|bg|he|hr|sl|hy|uz|et|vo|nn|gl|bat-smg|simple|hi|la|el|az|th|oc|ka|mk|be|new|tt|pms|tl|ta|te|cy|lv|ce|be-x-old|ht|ur|bs|sq|br|jv|mg|lb|mr|is|ml|pnb|ba|af|my|bn|ga|lmo|yo|fy|an|cv|tg|ky|nds-nl|sw|ne|io|gu|sco|bpy|scn|nds|ku|ast|qu|su|als|gd|kn|am|ckb|ia|nap|bug|wa|mn|pa|arz|mzn|si|zh-min-nan|yi|fo|sah|vec|sa|bar|nah|os|or|pam|hsb|se|li|mrj|mi|ilo|co|hif|bcl|gan|frr|bo|rue|mhr|glk|fiu-vro|ps|tk|pag|vls|gv|xmf|diq|km|kv|zea|csb|crh|hak|vep|sc|ay|dv|map-bms|so|nrm|rm|udm|koi|kw|ug|stq|bh|lad|wuu|lij|eml|fur|mt|szl|gn|pi|as|pcd|gag|cbk-zam|ksh|nov|ang|ie|nv|ace|ext|frp|mwl|ln|lez|sn|dsb|pfl|krc|haw|pdc|kab|xal|rw|myv|to|arc|kl|roa-tara|bjn|kbd|lo|ha|pap|av|tpi|mdf|lbe|jbo|na|wo|bxr|ty|srn|kaa|ig|nso|tet|kg|ab|ltg|roa-rup|zu|za|cdo|tyv|chy|tw|rmy|om|cu|tn|chr|bi|got|pih|sm|rn|bm|ss|mo|iu|sd|pnt|ki|xh|ts|zh-classical|ee|ak|ti|fj|lg|ks|ff|sg|ny|ve|cr|st|dz|ik|tum|ch|ng|ii|cho|mh|aa|kj|ho|mus|kr|hz):([^\]\[\|\}\{]+)\]\]</source> | [[:$1:$2]] | Interlanguage links in the File namespace do not make sense, categories should be used instead. Thus, convert to normal link and leave for manual cleanup. |
Categories
These are mainly to improve machine-readability when performing other category work.
Task | Find | Replace | Notes |
---|---|---|---|
Normalize categories | [^]]*)?\]\] *</source> | [[Category:$1$2]] | Run this before the other category fixes |
Remove empty [[Category:]] | <syntaxhighlight lang="text" enclose="none">\[\[category: *\]\](?:\n( *\[\[category:))?</syntaxhighlight> | $1 | |
Remove double [[Category:[[Category:...]]]] | <syntaxhighlight lang="text" enclose="none">\[\[category:(\[\[category:[^]]*\]\])[ ]*\]\]</syntaxhighlight> | $1 | |
One category per line | <syntaxhighlight lang="text" enclose="none">\[\[category:([^]]+)\]\] *\[\[category:([^]]+)\]\]</syntaxhighlight> | [[Category:$1]]\n[[Category:$2]] | Run multiple times |
Remove duplicates | <syntaxhighlight lang="text" enclose="none">(\[\[[Cc]ategory:)([^]]+\]\])(.*?)\1\2\n?</syntaxhighlight> | $1$2$3 | Run multiple times, case sensitive |
Remove blank lines between categories | <syntaxhighlight lang="text" enclose="none">(\[\[category:[^]]+\]\]\n)\n+(\[\[category:)</syntaxhighlight> | $1$2 | [Minor] |
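The category fixes whose patterns are given in full above can be chained as in this Python sketch. The helper names `until_stable` and `clean_categories` are hypothetical, and the "Normalize categories" step is omitted. Using `re.DOTALL` for the duplicate check, so that `.` can span the lines between two copies of a category, is an assumption about how the rule is meant to run.

```python
import re


def until_stable(pattern: str, repl: str, text: str, flags: int = 0) -> str:
    """Re-apply one substitution until nothing changes ("Run multiple times")."""
    while True:
        new_text = re.sub(pattern, repl, text, flags=flags)
        if new_text == text:
            return text
        text = new_text


def clean_categories(text: str) -> str:
    ci = re.IGNORECASE
    # Remove empty [[Category:]]
    text = re.sub(r"\[\[category: *\]\](?:\n( *\[\[category:))?", r"\1",
                  text, flags=ci)
    # Remove double [[Category:[[Category:...]]]]
    text = re.sub(r"\[\[category:(\[\[category:[^]]*\]\])[ ]*\]\]", r"\1",
                  text, flags=ci)
    # One category per line
    text = until_stable(r"\[\[category:([^]]+)\]\] *\[\[category:([^]]+)\]\]",
                        r"[[Category:\1]]\n[[Category:\2]]", text, ci)
    # Remove duplicates (case sensitive; DOTALL is an assumption)
    text = until_stable(r"(\[\[[Cc]ategory:)([^]]+\]\])(.*?)\1\2\n?",
                        r"\1\2\3", text, re.DOTALL)
    # Remove blank lines between categories
    text = re.sub(r"(\[\[category:[^]]+\]\]\n)\n+(\[\[category:)", r"\1\2",
                  text, flags=ci)
    return text
```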
Formatting
Task | Find | Replace | Notes |
---|---|---|---|
Delete surplus lines | <syntaxhighlight lang="text" enclose="none">\n{3,}</syntaxhighlight> | \n\n | [Minor] |
Fix incorrect line break syntax | <syntaxhighlight lang="text" enclose="none"></?br( )?(/)?\\?></syntaxhighlight> | <br$1$2> | This fixes only incorrect syntax (so <br>, <br/>, and <br /> are preserved) |
Remove {{}}, [[]], <gallery></gallery>, etc. | \[\[\]\]|<gallery>\s*</gallery>|\[\[:?File *: *\]\])</source> |
See also
- User:Magog the Ogre/cleanup.js
- Commons:Tools/pywiki file description cleanup (outdated)
- For your convenience a snippet of XML that can be pasted into your AWB configuration file can be found here. (outdated)