Commons:Timed Text
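Because the names are similar or identical, you may be looking for Commons:File captions.
Timed Text (TimedText) is a custom namespace on Wikimedia Commons that holds the text of closed captions or translated subtitles so that it can be associated with other media, such as audio or video files. This page explains the concept and how to use the feature.
Closed captioning (CC) and subtitling are both processes of displaying text on a television, video screen, or other visual display to provide additional or explanatory information. Both are usually a transcription (either verbatim or in edited form) of the audio portion of a program as it occurs, and sometimes include descriptions of non-speech elements. This helps deaf and hard-of-hearing people, and gives non-native speakers a way to understand the content of a multimedia file.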
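Usage
See also Commons:Video#Subtitles and closed captioning.
Thumbnails of video and audio clips that have closed captions show a CC icon. When you open the player, subtitles in your language are enabled automatically. An icon in the player controls lets you switch languages, turn subtitles on or off, or change the subtitle format.
Timed Text can be used with any media that is presented in time order:
- Audio files
- Silent video (pantomime)
- Video with spoken language
- Animations that demonstrate a concept or how something works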
Examples
- Commons:Timed Text Demo Page, a page that highlights several Timed Text examples.
- TimedText:Krazy Kat Bugologist 1916 silent.ogv.de.srt, German subtitles
- TimedText:Krazy Kat Bugologist 1916 silent.ogv.en.srt, English subtitles
- TimedText:Wikipedia_Edit2014.ogv.pl.srt redirects to TimedText:Wikipedia Edit 2014.webm.pl.srt
Discovery
- Search with the TimedText: prefix and add the text after it (e.g. TimedText:Elephants_Dream.ogv).
- Use a full page name (e.g. TimedText:Elephants_Dream.ogv.en.srt) to create a TimedText page; see Commons:Timed Text.
- {{Allpages|102}} lists all pages in namespace 102.
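Commons needs a way to find Timed Text files for a specific language. The searches below are subject to the limits of the search function (for example, not all matches are shown, non-matching items are included, and regular expression support is required). Searches for Timed Text .srt files in several languages:
English • German • French • Portuguese • Russian • Swedish • Ukrainian • Polish • Indonesian
Other ways to help users find Timed Text: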
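- {{Closed captions}} shows links to all available closed caption files for a file. It can be placed on a media page and on its talk page.
- {{special|Prefixindex/TimedText:{{PAGENAME}}.|stripprefix|1|subtitles}} generates a link to all related Timed Text files.
- Commons:Timed Text/search by lang shows search links for all Timed Text files in a given language; it is useful on Commons pages, category pages, and talk pages.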
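The prefix-index link in the list above can also be built outside the wiki. The following is a minimal sketch: the base URL `https://commons.wikimedia.org/wiki/` and the helper name `timedtext_prefix_url` are assumptions for illustration, not part of any template or tool described on this page.

```python
from urllib.parse import quote

# Assumed base URL of Wikimedia Commons pages (not stated on this page).
COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def timedtext_prefix_url(file_name: str) -> str:
    """Return a Special:PrefixIndex URL listing the TimedText pages of one file.

    Hypothetical helper: it mirrors the prefix used by the {{special|...}}
    call above, "TimedText:<file name>." (note the trailing dot).
    """
    # MediaWiki page titles use underscores in place of spaces.
    title = file_name.strip().replace(" ", "_")
    # The trailing dot keeps files with a longer, similar name out of the list.
    prefix = f"TimedText:{title}."
    return COMMONS_WIKI + "Special:PrefixIndex/" + quote(prefix, safe=":/")


print(timedtext_prefix_url("Elephants Dream.ogv"))
```

Opening the resulting URL should list one page per language for which a subtitle page exists.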
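Tagging and finding videos that need subtitles
The {{Captions requested}} template can be used to mark a video as needing subtitles. The template adds the file to the category Videos needing subtitles, so that anyone can see for which videos a user or author has requested a transcript.
This template and category are within the scope of Commons:WikiProject Deaf and its sister projects on Meta-Wiki (元維基:維基失聰人) and Wikipedia:WikiProject Deaf.
Finding videos that need subtitles
One way to find such videos is to open a subcategory of Category:Files with closed captioning for your preferred source language, then use Help:FastCCI (at the top right of the page) to list the videos that lack subtitles in the desired target language.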
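Example
- To find videos with English subtitles to translate, go to Category:Files with closed captioning in English.
- Then click the FastCCI arrow to open the submenu and select "In this category but not in...".
- In the text box, enter the category for your preferred target language:
- For German, enter
Files with closed captioning in German
- For French, enter
Files with closed captioning in French
- For Russian, enter
Files with closed captioning in Russian
and so on.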
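Timed Text talk
This namespace is used to discuss the respective Timed Text pages, but it can also be used to link to and categorize Timed Text pages.
Maintenance
Uploading
To upload a subtitle file you have already created, open the file on your computer in a text editor (such as Notepad), then copy the text into a new page in the TimedText namespace. The page name should match the file name of the video plus the language code.
Creating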
Commons uses the SubRip (.srt) file format for closed captioning and subtitles. You can create these files in multiple ways.
Create subtitles page for existing Commons files
Option 1: in the Commons page of the file (recommended)
You can use the "TimedText" link at the top of any suitable multimedia file on Commons.
Option 2: directly in the media player
By using the CC button in the toolbar of the Wikimedia HTML5 media player, you can select subtitles if they are available, or open the Subtitles editor to create subtitles for the video.
Option 3: creating a blank page (for advanced users)
You can always create the page directly on Commons using the naming pattern TimedText:[Common_File_Name.extension].[language].srt, where [Common_File_Name.extension] is the name of the file and [language] is the ISO code for the language.
Example: to add subtitles to Elephants_Dream.ogg, you can create the page TimedText:Elephants_Dream.ogg.en.srt for English subtitles, or TimedText:Elephants_Dream.ogg.fr.srt for French subtitles.
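The naming pattern can be expressed as a small function. This is only a sketch of the convention described above; the function name `timedtext_title` and its handling of a leading "File:" prefix are illustrative assumptions.

```python
def timedtext_title(file_name: str, language: str) -> str:
    """Build a TimedText page title from a Commons file name and a language code.

    Hypothetical helper following the pattern
    TimedText:[Common_File_Name.extension].[language].srt
    """
    file_name = file_name.strip()
    language = language.strip().lower()
    if not file_name or not language:
        raise ValueError("file name and language code are both required")
    # Assumption: accept names copied with the "File:" namespace prefix.
    if file_name.startswith("File:"):
        file_name = file_name[len("File:"):]
    return f"TimedText:{file_name}.{language}.srt"


print(timedtext_title("Elephants_Dream.ogg", "en"))
print(timedtext_title("Elephants_Dream.ogg", "fr"))
```

The two calls reproduce the English and French page names from the example above.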
Extracting existing subtitles to import them
Create Subtitles from DVD
To copy existing subtitles from a DVD you can use software such as SubRip. You can then copy and paste them into the subtitle page on Commons.
Create Subtitles with YouTube
YouTube lets users with a YouTube account create subtitles for any file they upload. Keep in mind that the speech recognition is automated and can produce unexpected results. Uploading a transcript of the file to YouTube is preferable and gives a much better result. You can then copy and paste the subtitles into the subtitle page on Commons.
Steps to create the subtitles (a video tutorial of the steps can be found here):
- Upload the file. (The multimedia file must also include a video track, but you are free to choose a blank one or any other.)
- While uploading, set the video language for your file to the appropriate language under the "Show more" menu.
- Or, after uploading, select "Subtitles" in the specific video's Details or in the YouTube Studio navigation.
- Click on "Add" or "Add language".
- You can add subtitles in one of three ways:
- Upload a transcript in the proper format.
- Copy and paste the transcript.
- Type manually while watching the video.
- The captions are then integrated into the video.
- Download the .sbv file from the Subtitles menu under the three-dot menu while in the "Edit Timings" view.
- Convert the contents of the .sbv file into an .srt file. There are various online tools to help with this step.
- ffmpeg is one open-source option (directions).
- Upload the .srt file to the corresponding page of the video on Wikimedia Commons.
Downloading subtitles from YouTube
You can download subtitles from a video on YouTube (and probably several other video websites) as follows:
- Install yt-dlp.
- Run
yt-dlp --list-subs url
(replace url with the YouTube URL) to see which subtitle languages and formats are available.
- Run e.g.
yt-dlp --write-subs --sub-langs en --sub-format vtt url
(replace url with the YouTube URL; add --skip-download if you do not need the video itself).
- SRT subtitles may be available too, in which case you should use that format instead of vtt, or you can download all of them at once.
- Convert the vtt subtitles (or whichever format you have) to srt subtitles using a tool or a web UI such as this one.
- You can then paste these into the TimedText page of the video on Wikimedia Commons.
If you use the tool video2commons, you can check "Import subtitles", but that does not work for vtt subtitles (phab:T368298), so for those videos you also need to follow the steps above to import subtitles.
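The vtt-to-srt conversion step can be sketched in a few lines of Python. This is a simplified illustration, not a full WebVTT parser: it assumes cues are separated by blank lines, drops header, NOTE, and STYLE blocks, and removes inline tags other than `<b>`, `<i>`, and `<u>`. The function name `vtt_to_srt` is hypothetical.

```python
import re

# A WebVTT timing line; the hours part is optional, cue settings may follow.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
# Any tag except <b>, <i>, <u> (for example <c>, <v Name>, inline timestamps).
# Caveat: plain text such as "a < b > c" would also be treated as a tag.
UNSUPPORTED_TAG = re.compile(r"<(?!/?[biu]>)[^>]+>")


def _srt_time(hours, minutes, seconds, millis):
    # SRT always writes hours and uses a comma before the milliseconds.
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_to_srt(vtt_text: str) -> str:
    """Convert simple WebVTT text to SubRip text (sketch, see assumptions above)."""
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                break
        else:
            continue  # no timing line: header, NOTE, STYLE, or REGION block
        start = _srt_time(*match.group(1, 2, 3, 4))
        end = _srt_time(*match.group(5, 6, 7, 8))
        cleaned = [UNSUPPORTED_TAG.sub("", text).strip() for text in lines[i + 1:]]
        text = "\n".join(part for part in cleaned if part)
        if text:
            cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


sample = "WEBVTT\n\n00:01.000 --> 00:04.000 align:start\n<c>Hello</c> <i>world</i>\n"
print(vtt_to_srt(sample))
```

Automatically generated YouTube captions often repeat lines across cues, so the output should still be checked against the video before it is pasted into a TimedText page.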
Machine transcription
You can use the open-source tool SoniTranslate to generate machine-transcribed subtitles more easily and quickly. Check the results, especially if you also use the tool for machine translation into other languages: for example, it may write years out as long text instead of numbers, or get people's names wrong. How to use this tool is described in Help:AI video dubbing. If there are no existing subtitles to import, this is likely the fastest way to add TimedTexts. Transcription usually takes only a few seconds, depending on the length of the video, even if you don't have a GPU.
The timings are produced so that they are well suited to dubbing videos into other languages, which is often not the case for manually made subtitles. You can edit the subtitles, save them as an srt file, and use that as input to the tool to create audio or subtitles in another language.
Creating subtitles with whisper.cpp
As of 2024, the Whisper AI models are the most advanced speech transcription models available and can be run locally, either using Python or whisper.cpp. Unlike the earlier Vosk models, they will also produce punctuation, bringing their output much closer to a high-quality human transcription. All the same, you should check AI-generated subtitles against the video and correct mistakes, add punctuation, check correct spelling of people and place names, check facts and figures, etc. AI subtitles are very useful as a first draft, but often also contain some silly mistakes a human transcriber would not have made.
An advantage of whisper.cpp is that it is particularly optimized for running on the CPU rather than the GPU (so it is especially useful if you have an AMD graphics card and therefore no CUDA). But CUDA and Metal (on a Mac) are also supported, therefore it can easily adapt to different hardware configurations. Another advantage is that it does not require installing any external dependencies, i.e. no Python or PyTorch, since it is written in C++, making it a much smaller download than a Python machine learning environment.
Some video editing and closed captioning GUI software now features built-in Whisper functionality: Open source examples include the video editor Kdenlive (since version 23.04; requires Python) and Subtitle Edit (either Python or C++ can be used to run Whisper models).
But running the command-line version of whisper.cpp directly to create an SRT file is not too difficult either, provided your operating system has a C compiler, make, etc. to compile it with:
First, use e.g. ffmpeg to extract a video's audio track and convert it to 16 kHz sample rate:
ffmpeg -i some_video.ogv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
Next, compile whisper.cpp and download a model (the base model optimized for English content is about 140 MB; "medium" can also handle other languages and is about 1.5 GB) and then start the conversion with e.g.:
./main -m models/ggml-base.en.bin -f audio.wav -t 8 -pc -osrt
This will use 8 CPU threads and create an SRT file called audio.wav.srt in the same directory. During recognition, words will be color-coded by confidence (green = very certain, red = very uncertain), so you can quickly see if the model is having trouble. If a smaller model delivers unusable output, you can try a larger model, e.g. medium, which will be slower but produce better results.
You can also translate from other languages, e.g. adding "-l fr -tr" to the options will translate French audio to English.
Convert YouTube Subtitles to Timed Text format
SBV Subtitles
If you export the SBV format from YouTube subtitles, you can use ffmpeg to convert the subtitle file to the SRT (SubRip) format used by Commons. This option also solves the overlap issue that is common when converting YouTube subtitles for Commons.
ffmpeg -fix_sub_duration -i input.sbv output.srt
XML Subtitles
This section describes how to convert YouTube XML subtitles to the SubRip (srt) format, which is the TimedText subtitle format used on Wikimedia Commons.
If
- the YouTube video has subtitles in some language (e.g. I created this YouTube video with subtitles in English, in Russian and in Livvi-Karelian languages),
- this video was uploaded to Wikimedia Commons (e.g. this file),
- you want to copy YouTube subtitles to the same video at Commons.
Then:
- Download the subtitles as XML by putting the ID of the YouTube video at the end of this URL: http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
- Install Ruby.
- Download a Ruby program that converts video subtitles from YouTube's XML format to the SubRip format.
- Run this program to convert the XML file to an .SRT file.
- Copy and paste the contents of the .SRT file into the corresponding page of the video on Wikimedia Commons.
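As an alternative to the Ruby program, the conversion step can be sketched in Python. The XML layout used here is an assumption: a root element containing `<text start="..." dur="...">` elements with times in seconds and caption text that may be HTML-escaped a second time. The function name `youtube_xml_to_srt` is hypothetical; check the layout of your downloaded file before relying on this.

```python
import html
import xml.etree.ElementTree as ET


def _srt_stamp(seconds: float) -> str:
    total_ms = round(seconds * 1000)
    secs, millis = divmod(total_ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d},{millis:03d}"


def youtube_xml_to_srt(xml_text: str) -> str:
    """Convert timed-text XML to SubRip text (sketch, assumed XML layout)."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        duration = float(node.get("dur", "0"))
        # The XML parser decodes entities once; unescape again in case the
        # caption text itself was HTML-escaped (assumption).
        text = html.unescape("".join(node.itertext())).strip()
        if not text:
            continue
        cues.append(
            f"{len(cues) + 1}\n{_srt_stamp(start)} --> {_srt_stamp(start + duration)}\n{text}"
        )
    return "\n\n".join(cues) + "\n"


sample = '<transcript><text start="0.5" dur="2.25">Hello &amp;amp; welcome</text></transcript>'
print(youtube_xml_to_srt(sample))
```

The output can then be pasted into the TimedText page in the same way as the output of the Ruby program.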
General tips
Noise, etc.
Remember to describe non-speech sounds and enclose them in round brackets, e.g.
1
00:00:20,000 --> 00:00:24,400
(engine sound)
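Each cue in an SRT file follows the shape shown above: a sequence number, a timing line in the form hours:minutes:seconds,milliseconds, and the caption text. A minimal sketch of producing such a cue programmatically follows; the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    secs, millis = divmod(total_ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d},{millis:03d}"


def format_cue(index: int, start: float, end: float, text: str) -> str:
    """Return one SRT cue: number, timing line, caption text."""
    if end <= start:
        raise ValueError("a cue must end after it starts")
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Reproduces the example cue above.
print(format_cue(1, 20.0, 24.4, "(engine sound)"))
```

When several cues are written to one file, they are separated by a blank line.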
Music
Music should be surrounded by the ♪ character (Unicode U+266A, hexadecimal 266A, decimal 9834). You can also use ♫ (Unicode U+266B, hexadecimal 266B, decimal 9835), e.g.
1
00:00:20,000 --> 00:00:24,400
♪ rock music playing in the jukebox ♪
♫ she's singing ♫
Markup
The only supported markup of the SRT format is
- Bold – <b> ... </b>
- Italic – <i> ... </i>
- Underline – <u> ... </u>
REMINDER: Wikicode formatting is not supported.
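Before saving a TimedText page, it can help to scan the caption text for markup outside this list. The following sketch flags HTML tags other than the three above and a few common wikicode constructs; the patterns are illustrative assumptions and not a complete wikitext grammar.

```python
import re

# Each label maps to a pattern for markup that the list above does not allow.
PATTERNS = {
    "HTML tag other than <b>, <i>, <u>": re.compile(r"<(?!/?[biu]>)[^<>]+>"),
    "wiki bold/italic quotes": re.compile(r"'{2,}"),
    "wiki link": re.compile(r"\[\[.*?\]\]"),
    "wiki template": re.compile(r"\{\{.*?\}\}"),
}


def find_unsupported_markup(caption_text: str) -> list:
    """Return the labels of all unsupported markup kinds found in the text."""
    return [
        label for label, pattern in PATTERNS.items() if pattern.search(caption_text)
    ]


print(find_unsupported_markup("<b>Hello</b> <i>there</i>"))
print(find_unsupported_markup("''Hello'' [[world]]"))
```

An empty list means that none of the checked patterns were found; it does not prove that the text is free of every possible problem.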
Internationalization
After the subtitles have been transcribed in the original language of the video onto a Timed Text file, they can be translated into other languages as follows:
- Open the Timed Text file in the original language (say English, for example TimedText:Elephants Dream.ogv.en.srt) in edit mode and copy the whole page.
- In the address bar, replace "en" with the language code of your choice, say "fr", then paste the original text into the new page.
- View the original video, then translate the text into your language.
- After saving the new page, the video with the subtitles should load onto the page; you can view it to check the timing of the subtitles.
- Add a category link to the talk page [[Category:Timed Text in Language Name|Language Name]]. For example, see TimedText talk:Elephants Dream.ogv.fr.srt.
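The page names and the category link used in these steps follow a fixed pattern, which can be sketched as below. The helper names are hypothetical; the formats are taken from the examples in the steps above.

```python
def translated_title(source_title: str, new_language: str) -> str:
    """Swap the language code in a TimedText page title."""
    if not source_title.endswith(".srt"):
        raise ValueError("expected a title ending in .<language>.srt")
    # Split off the last two dot-separated parts: language code and "srt".
    stem, _old_language, extension = source_title.rsplit(".", 2)
    return f"{stem}.{new_language}.{extension}"


def talk_title(title: str) -> str:
    """Return the talk page title for a TimedText page."""
    if not title.startswith("TimedText:"):
        raise ValueError("expected a title in the TimedText namespace")
    return "TimedText talk:" + title[len("TimedText:"):]


def talk_category_link(language_name: str) -> str:
    """Return the category link to place on the talk page."""
    return f"[[Category:Timed Text in {language_name}|{language_name}]]"


french = translated_title("TimedText:Elephants Dream.ogv.en.srt", "fr")
print(french)
print(talk_title(french))
print(talk_category_link("French"))
```

The second printed line matches the talk page named in the last step.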
Wikipedia articles about the topics of Timed Text or subtitles
These are articles about either Q844253: Timed text, or Q204028: subtitle.
- Dansk: Undertekster
- Deutsch: Untertitelung
- Ελληνικά: Υπότιτλοι
- English: Timed Text is also termed subtitles, closed captioning and closed caption text. See also Subtitle (captioning).
- Esperanto: Subtekstoj
- Español: Subtítulo
- Français : sous-titrage
- Interlingua: Subtitulos
- Italiano: Sottotitolo
- 日本語: 字幕
- 한국어: 자막
- Македонски: Толкување
- Nederlands: Ondertiteling
- Norsk bokmål: Undertekster
- Português: Legenda
- Русский: Субтитры
- Slovenščina: podnaslovi
- Svenska: Textning
- Українська: Субтитри
- 粵語: 字幕
- 中文:字幕
- Bahasa Indonesia: Teks Berwaktu
Linking
This section needs expansion.
How to associate closed captions with multimedia files?
- Redirect to avoid duplicated content, for example TimedText:Elephants Dream (high quality).ogv.pt.srt redirects to the existing TimedText:Elephants Dream.ogv.pt.srt. This ensures the closed captions template displays the correct file name of the caption files (this could be important with movie clips).
- {{Closed captions}}'s parameter is an alternative
- more support is needed for the Timed Text function;
- Categorizing: it is not possible to categorize the Timed Text page itself, but the Timed Text talk page can be categorized.
A possible categorization scheme is:
[[:Category:File formats]] + [[:Category:Media types]] | [[:Category:Timed Text]] + [[:Category:Legend in German]] | [[:Category:Timed Text in German]] + [[:Category:Legend in French]] | [[:Category:Timed Text in French]] + [[:Category:Legend in English]] | [[:Category:Timed Text in English]]
Related categories: Category:Files with closed captioning
See also
- {{Captions requested}}
- SubRip
- Help:Namespaces lists all Commons namespaces
- Category:Video, base category for media about video
- Category:Videos, base category for video files
- National Captioning Institute (NCI).
- The W3C Timed Text homepage
- Captions For Deaf and Hard-of-Hearing Viewers, National Institute on Deafness and Other Communication Disorders (NIDCD).