Module talk:TagQS
require('strict')
{{Edit protected}}
As per the new Lua feature mentioned at m:Tech/News/2022/42, could require('Module:No globals') be replaced with require('strict')? -- WOSlinker (talk) 17:21, 25 October 2022 (UTC)
tags are NOT invisible
Currently they are generated using "div" elements to contain them. Even if they have the CSS style "display:none" (meaning that they won't be rendered in HTML), they are still visible to MediaWiki, and that breaks existing paragraphs (or list items in numbered or bulleted lists).
- Demo: "This is in English
". (Notice that there is NO line break in that list item, from "Demo" up to the end of this comment, and there should be none: the result should be fully inline within the 1st list item. If you see the closing double quote at the start of this paragraph, this is not a typo but a defect caused by QS tags, which broke the quotation in the 1st list item. "bdi" is the only HTML element supporting that transparency in MediaWiki, not "span" elements, which don't allow "mixed" contents for the value; and "bdi" is anyway shorter than "span": we want QS tags to be as compact as possible in the generated HTML.)
- This should be the 2nd item of the same uninterrupted list.
Please change "div" elements into "bdi", and DON'T insert any space or newline before or after the opening/closing tags, to avoid their being moved outside the element by the "HTML Tidy" step of MediaWiki's conversion to HTML (there may be spaces, but only in the middle of the embedded text).
In summary the generated tags must be like:
<bdi style="display:none">label QS:Llang,"some value"</bdi>
or even better as one of:
<bdi data-label="QS:Llang,some value"></bdi>
<i data-label="QS:Llang,some value"></i>
<u data-label="QS:Llang,some value"></u>
<u data-label-QS="Llang,some value"></u>
where those last forms do not even need any CSS style and are guaranteed to be invisible, without even breaking words if they occur in the middle of one. The value of the label is inside a "data" attribute, which is always invisible in HTML, and the element is preserved by MediaWiki because it has a valued attribute and is always intended to be processed by a machine. The data attribute has no restriction on its value in HTML, except that it cannot contain newlines in MediaWiki and cannot contain double quotes that are not HTML-encoded. Note the presence of delimiting quotes, notably just after the end of the value and before the closing HTML tag, so that the value may still contain whitespace; it may possibly contain line breaks, but this should not occur for translated labels intended to be used inline and not as full paragraphs.
The interest of using "bdi" instead of other inline elements (like "i" or "u") is also that if tags are used between block elements ("div"), MediaWiki will not generate a paragraph to contain them (such an insertion would generate visible empty paragraphs). So "bdi" is still the best (and only) choice in MediaWiki.
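As a rough illustration of the proposed output, here is a minimal Python sketch (the module itself is written in Lua). The function name `make_qs_tag` and its parameters are hypothetical, not part of Module:TagQS; it only shows the two constraints described above: no surrounding whitespace, and HTML-encoding of double quotes inside the attribute value.

```python
from html import escape


def make_qs_tag(key: str, value: str) -> str:
    """Build an empty <bdi> element carrying a QS label in a data attribute.

    `key` and `value` are illustrative names (an assumption of this sketch).
    """
    if "\n" in value:
        # Raw newlines are not allowed in the attribute; escape them first.
        raise ValueError("newlines must be escaped before embedding")
    payload = f"QS:{key},{value}"
    # escape(..., quote=True) encodes &, <, > and both quote characters,
    # so the payload cannot terminate the attribute early.
    # No space or newline is emitted before or after the element.
    return f'<bdi data-label="{escape(payload, quote=True)}"></bdi>'


if __name__ == "__main__":
    print(make_qs_tag("Len", "some value"))
```

Because the element has no text content, nothing is left to hide with CSS.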
You can see the effect of using an empty "u" element with a data attribute in the middle of the following word (yes there's one, you cannot detect where, even by copy-pasting from the HTML browser window to an external plain-text editor or input field!):
extremely_lengthy
But below, the use of a "u" element between two blocks is still visible as an empty paragraph:
Blue Block
Green Block
This last quirk does not occur with "bdi" in standard HTML (but MediaWiki still inserts an undesired dummy paragraph with a computed content height of 0 but with top and bottom margins; note that the two colored blocks have zero margins and should be touching each other; this did not occur in the past with a former MediaWiki parser).
Blue Block
Green Block
This is a minor bug in most cases: templates can be designed to include those template-generated QStags explicitly inside a "div" (but NEVER implicitly within this template):
Blue BlockGreen Block
Note that "data" attributes are standard in HTML and can be used on any HTML element. In addition, they can have any suffix we want after the hyphen; that suffix just has to be a valid HTML identifier (so data attributes remain usable for efficient DOM searches by selectors).
In my opinion, "machine-readable" labels should not even use any HTML element but could just generate HTML comments if values contain any newline (however, the HTML Tidy step of MediaWiki will strip these HTML comments). Otherwise, those newlines should be escaped (e.g. as "\n", which a machine can interpret); in that case any backslash in the value should also be escaped (as "\\"), as well as tabs (as "\t") and repeated spaces (as "\x20"), to avoid their compaction/trimming by HTML Tidy inside the embedded HTML text elements, independently of CSS styles. Escaping should not be done using HTML character entities (these won't be differentiated by the final machine) but using a classic format with backslashes, which don't require any HTML entity and are transparent in the final HTML document; backslashes are easy to interpret by machines (the intended target of these QS tags). With such escaping of newlines (and some other whitespace or control characters we want to preserve), there is no longer any problem embedding any text value within such data attributes of empty tags.
verdy_p (talk) 01:42, 10 December 2022 (UTC)
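The backslash escaping described above can be sketched as follows, in Python for illustration only. The function names `escape_value` and `unescape_value` are hypothetical, and treating "repeated spaces" as "every space that directly follows another space" is an assumption of this sketch.

```python
import re


def escape_value(text: str) -> str:
    """Escape backslashes, newlines, tabs and repeated spaces."""
    # Backslashes go first so the escapes added below are not doubled.
    text = text.replace("\\", "\\\\")
    text = text.replace("\n", "\\n").replace("\t", "\\t")
    # Escape each space that directly follows another space, so a run of
    # spaces survives whitespace compaction.
    return re.sub(r"(?<= ) ", lambda m: "\\x20", text)


_UNESCAPES = {"\\\\": "\\", "\\n": "\n", "\\t": "\t", "\\x20": " "}


def unescape_value(text: str) -> str:
    """Reverse escape_value in a single left-to-right pass."""
    return re.sub(r"\\(\\|n|t|x20)", lambda m: _UNESCAPES[m.group(0)], text)


if __name__ == "__main__":
    original = "line one\nline  two\twith \\ backslash"
    encoded = escape_value(original)
    print(encoded)
    print(unescape_value(encoded) == original)
```

Decoding in one pass matters: an escaped backslash followed by the letter "n" must come back as a backslash and an "n", not as a newline.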
- @Lucas Werkmeister and Tacsipacsi: Can one of you provide a second opinion on this edit request? I am not very good at the nuances of HTML, CSS, etc., so I am trying to avoid unforeseen consequences of messing with them. Module:TagQS formalizes passing data between templates, like from {{Creator}} to {{Artwork}}, which is done in a language-independent, machine-readable way. It is done by wrapping QuickStatement-inspired code in <div style="display: none;">....</div>. User:Verdy p is proposing alternative wrappers. I am OK with switching if it improves things. Do you guys have an opinion on the proposed solution? --Jarekt (talk) 03:32, 19 June 2024 (UTC)
- The edit request sounds reasonable enough to me… my main question would be whether these tags are really only internal to Commons or whether anything else might use them as well and be broken by this change. Lucas Werkmeister (talk) 19:33, 23 June 2024 (UTC)
- Lucas Werkmeister As far as I know modules on Commons (that I wrote) are the only tools using them. However in the past I was wrong about similar assumption with License_template_tag template. So it is possible we will find other tools that use it. Unfortunately, I do not think there is any way to find out tools that might be using it ahead of time. --Jarekt (talk) 22:23, 4 July 2024 (UTC)
- @Jarekt: Can't we just search like: all: contentmodel:Scribunto insource:/readTag/? There are only ten exports from the module and only one is really designed to be used via {{#invoke:TagQS|CreateTag}} from wikitext. If we assume no one is attempting to generate or parse these manually (i.e., outside of Module:TagQS), we should be able to search for such fairly easily. --Uzume (talk) 15:56, 14 October 2024 (UTC)
- In my opinion, machine-readable tags (microtagging) should not use CSS. The "data-*" attributes are made for exactly this and are standardized in HTML (and already used by MediaWiki and in many modules, including by the MediaWiki editors themselves). That leaves the choice of the HTML element to carry them. "div" is bad (as demonstrated above, it breaks lists and inline content), and "bdi" is perfect in all cases: it is the only HTML element supported by MediaWiki and HTML that can be used so universally, whereas "span" or other inline tags imply the additional implicit generation of a container paragraph, either in HTML or in MediaWiki, in places where it is undesired, so we don't really have a choice. That's why I asked to change microtags to use the "bdi" element (by default, instead of "div") and "data-*=" attributes (instead of CSS hacks, which also cause accessibility problems, notably for Braille readers and for plain-text analyzers and indexers that still see the microtags as plain-text content; the same occurs when copy-pasting plain text from the rendered HTML, where the microtag appears "randomly" in the middle of that plain text). All justifications and examples are shown above with their effect. verdy_p (talk) 05:26, 19 June 2024 (UTC)
- I am not really sure why this module is generating hidden wikitext at all. Why not just remove the data after you read it so it is never in the final rendered wikitext anyway (it seems like we already have a removeTag)? Then it does not matter if it is hidden or not. A quick search tells me that so far readTag and readTags are only ever referenced (besides here of course) in Module:Artwork and Module:Artwork/core, and in each of the nine cases it appears to be doing so over a module or parent template argument. Nothing says we have to render any such argument as is. We can skim through them and use the parts we want any way we want, so why not just remove the data after we read and stash these values yielded by subtemplates (that call createTag)? If you really want to hide such, just in case, why not just pack the values into arguments to another template like {{void}} (or a parser function like #if:) which is guaranteed to do nothing with its arguments (so it is never even seen by a web browser)? I recommend this module stops exporting removeTag, readTag and readTags and instead makes something like an extractTags that returns the original input minus any tags plus a list of tags extracted from the input. Then callers can use such to filter out the tags while still being able to use the filtered out tag data. --Uzume (talk) 15:26, 14 October 2024 (UTC)
- Uzume, that is a good idea and {{Artwork}}, {{Information}} and other infoboxes can be stripping those tags from all the input fields after reading them. I think I did not do it because I was thinking they were not visible anyway and I used them for debugging. I am still working on changing the format so they are less visible, but this can be done more easily in addition to changing the format. The tags will still show up if the templates adding the tags are called not as fields to infoboxes, but that should be a minority. --Jarekt (talk) 17:02, 14 October 2024 (UTC)
- Oh and packing them as arguments to {{void}} or a parser function like #if:, I think, will make them disappear before I can read them in Module:Artwork. --Jarekt (talk) 17:06, 14 October 2024 (UTC)
- @Jarekt: You might be right about that. The parser does delay expanding wikitext as long as possible but expansions do occur at arg access time in Scribunto modules (this is why pairs(args) can be particularly expensive; I have often wished the frame provided a list of parameters in their given order allowing Scribunto to emulate any parser function but that requires MW core changes). Although, I am not so sure. Remember there is an order to the parser expansions. As {{#invoke:Artwork|function}} is expanded/executed, each accessed parameter argument is expanded (the parameters themselves, the part before =, are always expanded earlier). Typically a wikitext expansion will expand until that piece of wikitext is completely known and can expand no further (in the case of {{#invoke:Artwork|function}} that means as you access each argument its expansion is providing you the tags you want along with the other content you want there). However, that is not true if the expansion yields more unexpanded wikitext. Scribunto modules can easily generate such. These will have to be expanded in a later extra expansion call (which is why you often see calls to preprocess() near the final return in such code). The only case where that is not true is in substitutions (instead of transclusions), which happen during pre-save transforms (which generate page source) instead of during page render. Sadly, I am not sure this solves any of the problems here unless this module packs them in unexpanded wikitext and subsequently forces (another) expansion of that wikitext at access time. To me it seems overkill to consider using such a mechanism when we could use our own without depending on the wikitext parser to be a part of that (in the old days when we only had wikitext it was not uncommon to use such things for "database" accesses, e.g., the old species taxonomy). --Uzume (talk) 20:26, 14 October 2024 (UTC)
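For reference, the extractTags idea suggested earlier in this thread (return the input minus any tags, plus the list of tags found) might look roughly like the Python sketch below; the real implementation would be Lua inside Module:TagQS. The function name `extract_tags`, the exact wrapper matched by the regular expression, and the sample payload are all assumptions based on the `<div style="display: none;">` wrapper described above.

```python
import re
from typing import List, Tuple

# Assumed wrapper format; the actual markup emitted by the module may differ.
TAG_RE = re.compile(r'<div style="display: none;">(.*?)</div>', re.DOTALL)


def extract_tags(text: str) -> Tuple[str, List[str]]:
    """Return (text without tags, list of tag payloads)."""
    tags = TAG_RE.findall(text)    # payloads, in order of appearance
    cleaned = TAG_RE.sub("", text)  # the same input with every tag removed
    return cleaned, tags


if __name__ == "__main__":
    field = 'Rembrandt<div style="display: none;">QS:P170,Q5598</div> (1606-1669)'
    cleaned, tags = extract_tags(field)
    print(cleaned)
    print(tags)
```

A caller such as an infobox could then render only the cleaned text and keep the payloads for its own use, so the question of how well the tags are hidden would only matter when the tag-producing templates are used outside such a caller.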