video encoding&cleaning

Subtitle generation with Subtitle Edit and Whisper

Images to Text

DVB transmissions often carry graphic (bitmap) subtitles that many video playback devices cannot decode. Subtitle Edit can convert them to text (.srt) files using a choice of OCR engines, automatically and almost error-free. Just record the DVB stream including subtitles and let Subtitle Edit do its job, then multiplex the .srt files into your video file using, e.g., MKVToolNix. Note that Subtitle Edit has many more features that are worth discovering.

Speech to Text

Subtitle Edit also offers audio-to-subtitle conversion based on Vosk, and also on Whisper.
Vosk is pretty fast (about 4x real time on a good i5 CPU) but is not perfect; it gives good results only with clearly spoken commentary.
Whisper, developed by OpenAI, is a lot slower on the CPU, but it also runs on a GPU. With a decent GPU, e.g. a GTX 980, it can achieve up to 20x real-time speed using the "small" language model.

Whisper in Subtitle Edit 

(status September 2023; this refers to Subtitle Edit 4.0.1)

As mentioned above, Whisper can be used with Subtitle Edit. So far, however, the only engine that really works with the GPU appears to be Const-me.

Most astonishing, with Const-me even the large model works on a 4 GB GPU, and the conversion is very fast: 20 times real-time speed on a GTX 980 with the small model, and still 4 times with the large model. All models even run on a comparatively tiny GT 730 GPU, the small model at about 2 times real-time speed.

Subtitle Edit automatically corrects Whisper's timing errors. This correction is not yet available as a separate option in Subtitle Edit's batch processor, and it does not currently fix text that sometimes appears much too early and then stays on screen for very long.

There is also a necessary tweak for reliability:
The latest beta of Subtitle Edit has an "Advanced" button for entering additional parameters for Const-me, which is badly needed:

Enter --max-context 1 here.
Without it, conversions often go off the rails, suddenly repeating one subtitle line for several minutes and subsequently failing to deliver good subtitles. You may experiment with values above 1, but I always had some glitches with that. I guess it's the same issue that requires --condition_on_previous_text False with the Python/Git version of Whisper.

You may also test Const-me from a command prompt (see "How to get a Command prompt" below), which displays the generated text in real time and also lists all available parameters. To do this, open a command prompt in C:\Users\(your user name)\AppData\Roaming\Subtitle Edit\Whisper\Const-me.

The quality gain of the medium and large models over the small one is not always obvious, but the large model knows a lot more, which eliminates many spelling errors and sometimes produces results so good that it's almost uncanny. Sometimes it gets weird, though, e.g. when a line saying "Copyright xxx" appears where nothing like that is spoken and there is no connection to xxx whatsoever.

I would guess that the larger models are also better for rare languages.

Post-processing Whisper's srt files:

You should post-process srt files generated by Whisper on its own, for two crucial purposes:
splitting long lines (Whisper makes many very long ones), and ensuring the right character encoding by saving the files as UTF-8 with BOM. This step is not necessary if the srt files were generated from within Subtitle Edit with the right options.

Subtitle Edit's batch function serves this purpose. The following images show some options to use for better line splitting, and some options useful for batch processing.

srt split options

srt batch options

Note: "Auto balance lines" may fail quite often, so for reliability it may be better to leave it out.
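For illustration only, the two post-processing goals named above (shorter lines, UTF-8 with BOM) can be sketched in a few lines of Python. This is not what Subtitle Edit does internally and not a replacement for its batch function: it merely re-wraps the text inside each cue and leaves numbering and timing untouched. The 42-character limit and the function names are assumptions chosen for this sketch.

```python
import re
import textwrap
from pathlib import Path

MAX_LINE_LENGTH = 42  # assumed limit; choose what your player displays well


def rewrap_srt(text: str, width: int = MAX_LINE_LENGTH) -> str:
    """Re-wrap the text lines of every cue so that no line exceeds `width`."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues in an srt file are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3:
            cues.append(block)  # cue without text lines: keep as it is
            continue
        number, timing = lines[0], lines[1]
        body = " ".join(line.strip() for line in lines[2:])
        wrapped = textwrap.wrap(body, width=width) or [""]
        cues.append("\n".join([number, timing, *wrapped]))
    return "\n\n".join(cues) + "\n"


def convert_file(source: Path, target: Path) -> None:
    """Read an srt file and save the re-wrapped text as UTF-8 with BOM."""
    # "utf-8-sig" reads files with or without BOM and always writes a BOM.
    text = source.read_text(encoding="utf-8-sig")
    with open(target, "w", encoding="utf-8-sig", newline="\r\n") as handle:
        handle.write(rewrap_srt(text))
```

Calling convert_file(Path("test.srt"), Path("test_fixed.srt")) would write a copy with wrapped lines; the file names here are only examples.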

You may also want to have the subtitles overlaid on the video in the preview window, like when playing them back on TV. For this, just download the mpv lib and then set your font size.

Joining the subtitles with the videos

Finally, we want to merge the subtitles with the videos.
Manually, this is done with MKVToolNix. It's quite self-explanatory, so I won't go into more detail here.
But you may also want to use the batch processing tricks described below, and in this case you can use the mkvmerge.exe program that comes with MKVToolNix and is found in its program folder. Run it in a command window and you'll get its help text describing the available parameters.
I recommend adding a path entry for the MKVToolNix program folder so you can use the tools from anywhere in a command window.

How to get a Command prompt:

Just open a folder, type "cmd" into its address bar, and press Enter.
Now you are in a command prompt window in the same folder.

Up to Windows 8, a command prompt could also be obtained by pressing Shift and right-clicking on a folder, then selecting 'open command prompt here'. In Windows 10/11, only PowerShell is offered. There, an equivalent of a command prompt can be obtained by selecting 'open PowerShell window' and, within the PowerShell window, entering "cmd".

A genuine command prompt can also be opened from the Start Menu under Windows System (right-click for options such as Run as Administrator). Then navigate to specific folders with cd <directory_name>. Entering cd.. gets you one level up.

A more convenient way may be to install OpenCommandPromptHere from 4dots-Software, which lets you choose whether you want the prompt in normal or admin mode, or FileMenuTools from Lopesoft, which comes with many more right-click tools.

If you want to build yourself a genuine command window option for any directory in Windows 10/11, like it was in Windows 8, see here and here.
Or, in short (beware, only try this if you know what you're doing!):
Now you can get the command prompt here option by right-clicking on a folder or on the background of an opened folder.

Preparing cmd batches:

Sometimes it may be convenient to do batch operations in the command window.
If you are subtitling many files at once, it may be especially useful to merge subtitles into large numbers of video files automatically.
Here are some tricks for batch subtitle generation:

Why all that window copy/pasting? It avoids problems with special characters in file names.
These techniques also allow for making batch files processing through entire directory trees.
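As an alternative illustration of the same idea (not the cmd technique referred to above), here is a minimal Python sketch that walks a directory tree and prepares one mkvmerge call per video that has an .srt file of the same name next to it. It assumes mkvmerge is reachable via the path entry; the "_subs" output suffix and the function names are hypothetical choices for this sketch. Passing each file name as a separate list item means no shell is involved, so special characters in file names need no quoting.

```python
import subprocess
from pathlib import Path


def build_merge_jobs(root: Path, video_ext: str = ".mkv") -> list:
    """Return one mkvmerge argument list per video with a same-named .srt."""
    jobs = []
    for video in sorted(root.rglob(f"*{video_ext}")):
        subtitle = video.with_suffix(".srt")
        if not subtitle.exists():
            continue  # no subtitles for this video: skip it
        # Hypothetical naming scheme: film.mkv -> film_subs.mkv
        output = video.with_name(video.stem + "_subs" + video_ext)
        jobs.append(["mkvmerge", "-o", str(output), str(video), str(subtitle)])
    return jobs


def run_jobs(jobs: list) -> None:
    """Run the prepared mkvmerge calls one after another."""
    for args in jobs:
        # mkvmerge exits with 0 on success, 1 on warnings and 2 on errors,
        # so the return code is reported instead of raising on non-zero.
        result = subprocess.run(args, check=False)
        print(result.returncode, args[2])
```

A call such as run_jobs(build_merge_jobs(Path("D:/recordings"))) would process the whole tree; the folder name is only an example. Inspecting the list returned by build_merge_jobs first is a cheap dry run.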

That's about all. The following paragraphs are left here for back reference only:

Whisper with Python and Git (not recommended anymore)

Whisper can be used as a stand-alone app with Python and Git. Yet this is not my recommendation anymore, as Const-me, which comes with Subtitle Edit, works much faster and needs no cumbersome installation. Yet if you want to waste some time, do this:

Now you may test in a command window. Suppose you have a file named test.wav and have opened a command window in the same directory (see the hint on getting a command prompt above):

This will download the small language model, then generate text and subtitle files for test.wav or whatever file you specified.

Some texts are difficult enough to send Whisper into error loops; you will notice this when some output lines are repeated many times instead of the subsequent text being generated. In this case, the option
--condition_on_previous_text False
will help. It may also speed up the process in these cases, and it reduces memory usage (e.g. only 2.4 GB of GPU memory instead of 3).
It is, however, not always better and may sometimes result in phrases from one line being repeated in the next one.
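A minimal Python sketch of how such a call could be assembled, assuming the openai-whisper package is installed and provides the whisper command on the path. The function names are made up for this sketch; only the --model and --condition_on_previous_text options are used.

```python
import subprocess


def whisper_command(media_file: str, model: str = "small",
                    condition_on_previous_text: bool = True) -> list:
    """Build the argument list for the whisper command-line tool."""
    args = ["whisper", media_file, "--model", model]
    if not condition_on_previous_text:
        # True is the default, so the option is only added when switching it off.
        args += ["--condition_on_previous_text", "False"]
    return args


def run_whisper(media_file: str, **options) -> int:
    """Run whisper on one file and return its exit code."""
    return subprocess.run(whisper_command(media_file, **options), check=False).returncode
```

For example, run_whisper("test.wav", condition_on_previous_text=False) would process test.wav with the small model and the loop-avoiding option.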

Let's now explore the specific requirements and results for CPU and GPU processing.

Whisper model sizes, speed, memory usage
This compares an i5 CPU (4 cores at 4 GHz) and a GTX 980 graphics card.

model    file size   RAM (GB)   RAM (GB)        RAM (GB)   RAM (GB)        speed factor   speed factor
         (MB)        CPU        CPU, c.o.p.t.   GPU        GPU, c.o.p.t.   CPU            GPU
small    472         1.8        1.8             2.5        3.4             0.45           5
medium   1492        4.7        4.5             6.5?       8.5?            0.18           2?
large    3015        8.8        9.4             12?        18?             0.1            1?

c.o.p.t. means --condition_on_previous_text True (the default setting) 
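To make the speed factors concrete: taking them as multiples of real time, the processing time is simply the playing time divided by the speed factor. The small sketch below uses only the two values for the small model from the table; the 90-minute running time is an arbitrary example, and the function name is made up. A 90-minute recording then needs roughly 200 minutes on the CPU and 18 minutes on the GPU.

```python
def processing_minutes(video_minutes: float, speed_factor: float) -> float:
    """Estimate the conversion time from the playing time and the speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed factor must be positive")
    return video_minutes / speed_factor


# Speed factors of the small model, taken from the table above.
SMALL_MODEL_SPEED = {"CPU": 0.45, "GPU": 5.0}

for device, factor in SMALL_MODEL_SPEED.items():
    minutes = processing_minutes(90, factor)  # 90 minutes is an example length
    print(f"small model on {device}: about {minutes:.0f} minutes")
```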

Note that in addition to the above numbers, approx. 3 GB of extra CPU memory may be necessary with Windows 10, provided no other memory-intensive apps are running.

With a GPU, memory is absolutely critical: the process stops if even a little is missing. So most currently installed graphics cards may not be able to run the large model.

Note that the Const-me option in Subtitle Edit is not only 4 times faster, but also makes it possible to use even the large model efficiently with only 4 GB of graphics RAM!

Copyright (C) 2023; all rights reserved. All materials in these pages are presented for scientific evaluation of video technologies only. They may not be copied from here and used for entertainment or commercial activities of any kind.
We do not have any relation to and do not take any responsibility for any software and links mentioned on this site. This website does not contain any illegal software for download. If we, at all, take up any 3rd party software here, it's with the explicit permission of the author(s) and regarding all possible licensing and copyright issues, as to our best knowledge. All external download links go to the legal providers of the software concerned, as to our best knowledge.
Any trademarks mentioned here are the property of their owners. To our knowledge no trademark or patent infringement exists in these documents; any such infringement would be purely unintentional.
If you have any questions or objections about materials posted here, please e-mail us immediately.
You may use the information presented herein at your own risk and responsibility only. We do also not guarantee the correctness of any information on this site or others and do not encourage or recommend any use of it.
One further remark: These pages are covering only some aspects of PC video and are not intended to be a complete overview or an introduction for beginners.