Commons talk:Project scope/Allowable file types
Policy template
I'm wondering whether the Policy template should be added to the page, or whether it was left off on purpose. -- Hamilton Abreu (talk) 12:58, 29 June 2011 (UTC)
- Guessing: Technical, administrative, and "political" are not identical. Disclaimer, I haven't tried to bypass the technical limitations, e.g., OGA with VP6 video for some ogg kind of FLV ;-) –Be..anyone (talk) 04:49, 22 January 2014 (UTC)
Disallowing ODT
Hello, if i'm raising this issue in the wrong place, please let me know where i should do that. I think that the idea of disabling Open Document on the basis that something can potentially be hidden in it is akin to disabling JPEG because there are steganographic ways to encode information into it. Most files can be made to carry information which is disallowed by MediaWiki, with no automatic way to prevent that. As i see it, Commons (as a community) should look at the type of the file, see if it is useful (and free), and allow its use if it is; not take a container and think up ways that somebody could potentially, sometimes, create something outside of the scope in that container.
Take another example - Ogg, one could potentially embed copyrighted text in the Kate subtitle stream. And it would be very difficult to automatically detect that. In fact i think it will be significantly more difficult than automatically unzipping the ODT and considering every image an upload in its own right (with the number of people watching the uploads somebody will spot the misuse of the system). Beta M (talk) 04:44, 25 October 2011 (UTC)
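A rough sketch of the "unzip the ODT and look at every image" idea above, assuming an ODF package is an ordinary ZIP archive and that embedded images can be recognised by file extension. The helper names `list_embedded_images` and `build_sample_odt`, and the extension list, are made up for illustration; this is not how MediaWiki handles uploads.

```python
import io
import zipfile

# Assumption: a short, illustrative list of image extensions to look for.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def list_embedded_images(odt_file):
    """Return the archive member names that look like embedded images."""
    with zipfile.ZipFile(odt_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(IMAGE_SUFFIXES)
        )


def build_sample_odt():
    """Build a tiny in-memory ZIP that mimics an ODT layout (hypothetical data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("Pictures/diagram.png", b"placeholder bytes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_embedded_images(build_sample_odt()):
        print(name)
```

Each name returned could then be extracted and reviewed like any other upload; the review itself would still have to be done by people.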
- The traditional place is Commons_talk:File_types, but that's not very active. See also https://bugzilla.wikimedia.org/show_bug.cgi?id=2089 . Anyway, I think that there's been some general reluctance to allow file formats which do not have a specific concrete visual or audio realization... AnonMoos (talk) 05:53, 25 October 2011 (UTC)
- The problem is that ODT files are editable documents, not real media, so anyone can change their contents completely; that puts us in the domain of Wikipedia. --Foroa (talk) 06:29, 25 October 2011 (UTC)
- Editability of ODT is, if anything, a plus. This is exactly why PDFs are allowed but carefully watched: they discourage derivation. In fact i can see the argument for completely banning PDF (it's a proprietary container which even has specs urging software developers to disallow not only altering, but even copying from, the document). Beta M (talk) 06:57, 25 October 2011 (UTC)
- PDF has a concrete specific visual realization, and so has a leg up over ODT right there... AnonMoos (talk) 15:03, 25 October 2011 (UTC)
What i see right now is people looking for differences between ODT and formats which are allowed and saying "because of these differences we can't allow them", but are these differences really important? Is the fact that there is no "specific visual realization" (sic) enough to make a whole datatype banned? Keep in mind we are talking about a subsection of 'Project scope', and ask yourselves do you honestly want to say that nothing without specific visual representation could fall within the project scope of Wikimedia Commons? If that is what you are trying to say, then what is your logic for saying that? Audio has no specific visual realisation, and for MIDI even audio depends on the synthesizer which is used to play it (some synthesizers will do substitutions of instruments just like a word processor would substitute fonts; thus it's non-specific). Beta M (talk) 19:07, 25 October 2011 (UTC)
- I said "concrete visual or audio realization" in my first comment above. I think you also somewhat misunderstand Foroa's objection -- if something is basically textual in nature, and not fixed in one specific concrete visual realization (i.e. PDF or Djvu), then why isn't it on Wikipedia or Wikisource (as appropriate), where it can be edited by anybody with a web-browser (even if they don't have an office suite installed)? AnonMoos (talk) 09:15, 26 October 2011 (UTC)
- It's a fair point about text-only documents, but it can't really be applied to ods, odp, etc. Even odt is not really "basically textual in nature"; just look up the difference between a text editor and a word processor. Consider something like Free Software Magazine, which for a while made its issues available for download in ODT format. They can't be imported directly into Wikipedia (original research) but could be placed here... except for the format. Of course, one could easily convert ODT into PDF, but that is like converting it into a single image: copying from it is possible, but not simple. So to answer the question "why isn't it on Wikipedia" i can say: because it's out of scope there but in scope here. Beta M (talk) 16:07, 26 October 2011 (UTC)
Well, I have this solution for uploading music scores, which I think can be applied to other kinds of documents:
- Single page music scores: I have the scorewriter (MuseScore or LilyPond/Frescobaldi) export to SVG, open the result in LibreOffice Draw to crop and center the piece (when it's only one page), and optimize the SVG. Then I add the steps used to produce the image and, finally, copy the MuseScore, MusicXML or LilyPond source (they are all text based, but a single LilyPond score can be made of multiple files) to the uploaded file's description page.
- Multiple page music scores: I export to PDF and copy the contents of the source document to the uploaded document's description page.
So if I ever need to upload a multipage PDF for whatever reason:
- Textual PDFs: I will make the text processor / presentation program (LibreOffice Writer / LibreOffice Impress or another compatible with ODF and Flat ODF) export to PDF, then I save the original document as Flat ODF (pure single text file instead of zipped collection of texts), then I copy the contents of the .fodt/.fodp file to the uploaded file's description page. I would also note which fonts were used.
- Textual PDFs containing images: I wouldn't embed the images inside the original document (embedded images increase in size when encoded inside a text file instead of saved inside a zip file such as the ODF container). I would save the images in the same folder as the original document and link to them from the document, so I could upload each one separately (I could even link to the picture hosted on Wikimedia Commons instead of my local filesystem), and then I would add the source document to the description page. I would also note which fonts were used and the addresses of the images hosted on Wikimedia Commons.
So the disallowing of .odt files isn't so much of a nuisance once we learn that we can easily get the source as Flat ODF (.fodt/.fodp) and save it on the uploaded file's description page. Joaopaulo1511 (talk) 20:32, 5 August 2018 (UTC)
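Before pasting a source document into a description page, it may help to confirm that it really was saved as Flat ODF (a single XML text file) rather than as the usual zipped package. A minimal sketch, assuming a simple check on the leading bytes is good enough; `classify_odf` is a hypothetical helper, not part of LibreOffice or MediaWiki.

```python
import io
import zipfile


def classify_odf(data: bytes) -> str:
    """Guess whether raw document bytes are a zipped ODF package or Flat ODF XML."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zipped"
    head = data.lstrip()
    # Assumption: a Flat ODF file starts with an XML declaration or the
    # <office:document> root element.
    if head.startswith(b"<?xml") or head.startswith(b"<office:document"):
        return "flat"
    return "unknown"


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?><office:document/>'
    print(classify_odf(sample))
```

Only the "flat" case is plain text that can sensibly be copied into a wiki page.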
Handling the numerous requests for other file types
There is a long list of requests for file type support, taken from the list of "unsupported file types":
- Any format for 3D (COLLADA, X3D, .blend - bug discussion suggests X3D) − bugzilla:1790 (previous discussion)
- Any format for data (CSV, ODF ODB, etc.) - bugzilla:43151
- Raw image format (DNG) − bugzilla:19153
- Any format for HDRI (e.g. OpenEXR) − bugzilla:17505
- JPEG 2000 − bugzilla:11871 & bugzilla:18803
- KML − bugzilla:26059
- Chemical Markup Language − bugzilla:16491
- Protein Data Bank − see Extension:PDBHandler and pdbhandler.wmflabs.org
- Scribus − bugzilla:18845
- OpenDocument − bugzilla:2089
- EPUB − bugzilla:17858
- Opus − bugzilla:40193
- SWF — could be considered free as of 2009?
- Nonfree file formats requested at least once, via automatic conversion of these formats to a free format on upload:
- Sound formats: MP3, WMA, RA - bugzilla:43149
- Video formats: MPEG, WMV, RM, FLV - bugzilla:43150
- Microsoft Office formats: DOC, XLS, PPT - bugzilla:43154
Most of the above issues are tracked as "Multimedia and file format support" issues in bugzilla:42725.
- This is a pity. Most of the above filetypes should be supported and allowed; even if their description pages come with an extra warning. Some of these are among the top file types used by anyone on the internet for that class of knowledge (EPUB, SWF, OD*). And the lack of any data format is unworthy of our scope and academic context. --SJ+ 22:57, 20 June 2013 (UTC)
- This site is really mainly for visual, audio, and audio-visual media files. Abstract data which is not fixed into a particular concrete visual, audio, or audio-visual form falls outside the basic scope of this site. That's part of why none of the word-processing or spreadsheet formats is supported... AnonMoos (talk) 04:46, 21 June 2013 (UTC)
- The article doesn't cover Matroska (.mkv, .mka, or .mks) at all, which is mildly confusing given the presence of WebM. Or maybe it's only me. –Be..anyone (talk) 04:59, 22 January 2014 (UTC)
- What is the advantage of matroska? --McZusatz (talk) 15:20, 22 January 2014 (UTC)
- I'm not sure about advantages, I just miss it in the list. WebM uses this container, and it's FOSS. Maybe .mka could permit ALAC audio for the purposes of commons, maybe .mks subtitles are great, I can't judge it. Otherwise we should add it to the unsupported list. –Be..anyone (talk) 16:04, 22 January 2014 (UTC)
Currently disabled
Should we add EPUB here? It's the same "based on ZIP" problem. And could somebody who has read RFC 5334 please check my DEnglish .oga, .ogg, .ogv, .ogx blurb? –Be..anyone (talk) 04:40, 22 January 2014 (UTC)
Meanwhile I figured out that FLAC outside of .ogx exists and even works for me, unlike .oga. I think (might be wrong) that this SHOULD be RFC 5334 .ogx instead of RFC 3534 .ogg. –Be..anyone (talk) 16:18, 22 January 2014 (UTC)
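For anyone checking the blurb, here is a small sketch of the extension-to-media-type mapping as I understand RFC 5334 (which obsoleted RFC 3534). The dictionary and the `media_type_for` helper are illustrative names only, and the table says nothing about which of these Commons actually accepts.

```python
import os

# Mapping of Ogg file extensions to media types, per RFC 5334.
RFC5334_MEDIA_TYPES = {
    ".ogx": "application/ogg",  # generic or multiplexed Ogg content
    ".ogv": "video/ogg",        # Ogg with video (possibly plus audio)
    ".oga": "audio/ogg",        # Ogg with audio only
    ".ogg": "audio/ogg",        # kept for Vorbis audio, for compatibility
    ".spx": "audio/ogg",        # Speex audio
}


def media_type_for(filename):
    """Return the RFC 5334 media type for a filename, or None if not listed."""
    extension = os.path.splitext(filename)[1].lower()
    return RFC5334_MEDIA_TYPES.get(extension)


if __name__ == "__main__":
    for name in ("speech.oga", "clip.OGV", "bundle.ogx", "movie.mkv"):
        print(name, media_type_for(name))
```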
Multipage files
Are multipage TIFFs supported? 31.131.194.249 15:55, 5 July 2022 (UTC)