this post was submitted on 27 May 2025
448 points (97.5% liked)

Technology

[–] SpaceScotsman@startrek.website 22 points 3 days ago (2 children)

I'm surprised VLC fares that badly with CCs encoded this way; usually it's pretty good. I'm also now wondering whether ffmpeg shares the same problem.

The top YouTube comment, by Ridley Combs, explains it pretty well:

FFmpeg maintainer here, and the details behind the caption decoding issues you're seeing in VLC are complex and horrific. They largely stem from how the EIA-608 caption format expects text to be laid out in a monospace grid onscreen, which isn't really how the text rendering stacks used for modern subtitling work (this is probably why changing the font caused problems on those Sony players); beyond that, the behavior can just end up pretty complex, and there's no convenient public-domain corpus of sample files for open-source software developers to test against. These kinds of issues also affect the Japanese (ARIB) and European (Teletext) formats to varying extents. These days, a lot of the focus ends up being on converting the text into modern Unicode text formats, styled using modern techniques, so direct rendering of the legacy formats hasn't had as much attention lately.
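
To make the "monospace grid" point concrete: EIA-608 models the screen as a fixed 15-row by 32-column character grid, and preamble codes pick a row and indent for the text that follows. The sketch below is purely illustrative, with made-up names, and is not FFmpeg's or VLC's actual decoder; it just shows why a renderer built around proportional fonts has to fake this layout, while a monospace font gets it for free.

```python
# Minimal, illustrative model of the EIA-608 caption layout:
# a fixed 15-row x 32-column monospace grid. Toy sketch only,
# not how FFmpeg or VLC actually implement it.

ROWS, COLS = 15, 32

class CaptionGrid:
    def __init__(self):
        # Every cell starts out blank.
        self.cells = [[" "] * COLS for _ in range(ROWS)]
        self.row, self.col = ROWS - 1, 0  # default pen position: bottom row

    def preamble(self, row, indent):
        """Roughly what a Preamble Address Code does: pick a row and an
        indent for the text that follows."""
        self.row = max(0, min(ROWS - 1, row))
        self.col = max(0, min(COLS - 1, indent))

    def put_char(self, ch):
        """Place one character at the pen position and advance."""
        if self.col < COLS:
            self.cells[self.row][self.col] = ch
            self.col += 1

    def render(self):
        # A renderer built for proportional fonts has to emulate this grid;
        # with a monospace font the alignment falls out naturally.
        return "\n".join("".join(r).rstrip()
                         for r in self.cells if "".join(r).strip())


grid = CaptionGrid()
grid.preamble(row=13, indent=4)
for ch in "HELLO, WORLD":
    grid.put_char(ch)
print(grid.render())
```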

[–] LorIps@lemmy.world 3 points 3 days ago (2 children)

Because of the way those captions are stored, VLC has to use OCR to convert the .SRT file (which basically stores low-resolution b/w images, I assume to more easily allow for different alphabets) to normal text. I don't know why the open-source solutions are so bad at this (especially considering how good the proprietary solutions seem to be), but I had similar problems ripping a DVD. I would assume that if he had turned off the special font VLC uses for the subtitles and instead just looked at the raw data, there wouldn't have been a problem. Why VLC doesn't enable this by default (or even offer it) I don't know.
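
For what it's worth, the OCR step described here is a real workflow for DVD-style bitmap subtitles (VobSub), even though, as the replies below point out, it isn't what's happening with these captions. A rough, hypothetical sketch of that kind of OCR pass, assuming a subtitle frame has already been dumped to a PNG by a ripping tool and that Pillow and pytesseract are installed:

```python
# Hypothetical sketch: OCR one extracted DVD/VobSub subtitle bitmap into text.
# Assumes the bitmap frame already exists as a PNG and that Pillow and
# pytesseract (Tesseract OCR) are available.
from PIL import Image
import pytesseract

def subtitle_bitmap_to_text(png_path: str) -> str:
    img = Image.open(png_path).convert("L")           # grayscale
    img = img.point(lambda p: 255 if p > 128 else 0)  # hard threshold the text
    return pytesseract.image_to_string(img).strip()

# print(subtitle_bitmap_to_text("frame_0001.png"))
```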

[–] kaknife@lemmy.world 13 points 3 days ago

This is not about DVD subtitles, which are images as you say. This is about "Line 21" closed captioning, i.e. the text data that is embedded in an analog TV signal. There should be no OCR needed.
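
Roughly why no OCR is needed: each video field carries two caption bytes, each a 7-bit value plus an odd-parity bit, and the basic character set is close to ASCII. A deliberately simplified toy decoder (ignoring control codes, the second caption channel, and the handful of non-ASCII substitutions in the real 608 character map) might look like this:

```python
# Toy decoder for Line 21 / EIA-608 byte pairs: strip the odd-parity bit
# and map the remaining 7 bits through (approximately) ASCII.
# Simplified sketch only.

def odd_parity_ok(byte: int) -> bool:
    return bin(byte).count("1") % 2 == 1

def decode_608_pairs(pairs):
    out = []
    for b1, b2 in pairs:
        for b in (b1, b2):
            if not odd_parity_ok(b):
                continue               # drop bytes that fail parity
            ch = b & 0x7F              # strip the parity bit
            if 0x20 <= ch <= 0x7E:     # printable range of the basic set
                out.append(chr(ch))
    return "".join(out)

# Two example byte pairs spelling "HI!" (parity bits already applied).
print(decode_608_pairs([(0xC8, 0x49), (0xA1, 0x80)]))
```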

[–] GnuLinuxDude@lemmy.ml 6 points 3 days ago

There is no .srt in this case. This is also not about bitmap DVD VobSubs.