[BUG] Failing to extract DVB subtitles from live stream (Failed to perform OCR) #1010

Open
7 of 10 tasks
jakubvojacek opened this issue Oct 23, 2018 · 3 comments
Labels
difficulty: medium DVB Issues related to extraction of DVB subtitles OCR

Comments

@jakubvojacek

CCExtractor version (using the --version parameter preferably) : 0.87

  • I have read and understood the contributors guide.
  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [x] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [ ] Windows - [x] Linux - [x] Mac
  • What were the used arguments? It fails even with just -udp 239.1.2.3:1234 (unrelated, but I was originally testing with -ocrlang por -quant 0 -datapid 0x451 -out=webvtt -noru -trim -lf -nots -nobom -s -nofc -nogt)

**Video links (replace text below with your links)**
tnt.ts - https://goo.gl/r4WXto

Additional information
Interestingly, when I run ccextractor on the file (ccextractor tnt.ts), it does produce a tnt.srt file with correct subtitles in it. However, it also prints a large number of errors.

But when tnt.ts is played out in a loop (for example, tsplay tnt.ts 239.1.2.3:1234 -loop), ccextractor eventually fails, usually after anywhere from a few seconds to a minute:

root@jones:~/tnt# ccextractor   -udp 239.1.2.3:1234
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Network, 239.1.2.3:1234
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

----------------------------------------------------------------------
Reading from UDP socket 239.1.2.3:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 11/6
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=3249, segment_length=3490
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

Can you please look into what is wrong?

Thank you
Jakub
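The Leptonica messages in the log form a chain: boxClipToRectangle reports that the crop box lies outside the bitmap, pixClipRectangle warns that the box doesn't overlap the image and so yields no image, and every call after it (pixConvertRGBToGray, pixGetDimensions and the rest) receives an undefined pix until Tesseract returns -1. A minimal sketch of a bounds check that would let a caller detect such a region before cropping is shown below. The rect_t type and clip_to_image helper are hypothetical illustrations, not CCExtractor or Leptonica API, and where exactly CCExtractor builds the crop box has not been checked here.

```c
#include <stdbool.h>

/* Hypothetical rectangle type (x, y = top-left corner; w, h = size).
   This sketch does not depend on Leptonica. */
typedef struct {
    int x, y, w, h;
} rect_t;

/* Intersect a subtitle region box with an image of size img_w x img_h.
   Returns false when the box is empty or lies entirely outside the image
   (the case Leptonica reports as "box outside rectangle"); otherwise
   writes the clipped box to *out and returns true. */
static bool clip_to_image(rect_t box, int img_w, int img_h, rect_t *out)
{
    /* Clamp each edge of the box to the image bounds. */
    int x0 = box.x > 0 ? box.x : 0;
    int y0 = box.y > 0 ? box.y : 0;
    int x1 = box.x + box.w < img_w ? box.x + box.w : img_w;
    int y1 = box.y + box.h < img_h ? box.y + box.h : img_h;

    /* Reject degenerate boxes and boxes with no overlap at all. */
    if (box.w <= 0 || box.h <= 0 || x1 <= x0 || y1 <= y0)
        return false;

    out->x = x0;
    out->y = y0;
    out->w = x1 - x0;
    out->h = y1 - y0;
    return true;
}
```

Returning false instead of a cropped image would let the caller log and skip the region, much like the existing "TessBaseAPIRecognize returned -1, skipping this bitmap." path, rather than reaching the fatal "Failed to get text" branch. This is a suggestion, not a description of the current code.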

@cfsmp3
Contributor

cfsmp3 commented Jan 25, 2020

@jakubvojacek Is this still a problem in current master?

@jakubvojacek
Author

Hello @cfsmp3

I just tested with the current master (5f61fae) and it's still happening; it's now reproducible on a static file too. If you download https://goo.gl/r4WXto, play it in VLC and enable the Portuguese DVB subtitles, the subtitles are visible. Running ccextractor on it (plain ccextractor tnt.ts) throws the same errors as described above. I have attached the console output below.

root@ts:/opt/ccextractor# git rev-parse HEAD
5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6

root@ts:/opt/ccextractor# build/ccextractor /data/tnt.ts
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: /data/tnt.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: /data/tnt.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 10/14
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=2917, segment_length=3462
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
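The second log shows a continuity-counter error (prev/curr 10/14) immediately followed by dvbsub_decode rejecting a segment whose segment_length (3462) exceeds the bytes remaining (2917). One plausible reading, not confirmed here, is that TS packets on the subtitle PID were lost, so the reassembled PES packet is truncated. The sketch below shows both checks under that assumption: the 4-bit continuity_counter rule from MPEG-2 TS (ISO/IEC 13818-1) and the 6-byte segment header from DVB subtitling (ETSI EN 300 743). The helper names and the dvb_segment_t type are hypothetical, not CCExtractor code, and whether CCExtractor's "remaining bytes" includes the segment header is an unverified assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 13-bit PID from a 188-byte TS packet (byte 0 is the 0x47 sync byte). */
static uint16_t ts_pid(const uint8_t *pkt)
{
    return (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
}

/* Checks the 4-bit continuity_counter against the previous value seen on
   the same PID (prev_cc < 0 means no previous packet). The counter advances
   modulo 16 only on packets carrying payload; one duplicate packet (same
   counter) is tolerated. Simplified: discontinuity_indicator is ignored. */
static bool ts_cc_ok(const uint8_t *pkt, int prev_cc)
{
    int cc = pkt[3] & 0x0F;
    bool has_payload = (pkt[3] & 0x10) != 0; /* adaptation_field_control bit 0 */

    if (prev_cc < 0)
        return true;
    if (!has_payload)
        return cc == prev_cc;
    return cc == ((prev_cc + 1) & 0x0F) || cc == prev_cc;
}

/* DVB subtitling segment header: sync_byte 0x0F, segment_type (8 bits),
   page_id (16 bits), segment_length (16 bits), then segment_length bytes. */
typedef struct {
    uint8_t type;
    uint16_t page_id;
    uint16_t length;
} dvb_segment_t;

/* Parses one segment header from buf. Returns false if the header is
   missing or malformed, or if fewer than segment_length data bytes follow
   it, i.e. the "incomplete, broken or empty packet" situation. */
static bool dvb_next_segment(const uint8_t *buf, size_t remaining, dvb_segment_t *seg)
{
    if (remaining < 6 || buf[0] != 0x0F)
        return false;
    seg->type = buf[1];
    seg->page_id = (uint16_t)((buf[2] << 8) | buf[3]);
    seg->length = (uint16_t)((buf[4] << 8) | buf[5]);
    return (size_t)seg->length <= remaining - 6;
}
```

One possible design is to track prev_cc per PID (for example 0x451, the -datapid used in the original report) and, after a discontinuity, discard the partially assembled PES packet instead of handing it to the subtitle decoder. That is a suggestion, not a description of CCExtractor's current behaviour.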

@PunitLodha added the DVB (Issues related to extraction of DVB subtitles) and OCR labels on Jan 28, 2022
@jstrot

jstrot commented Jul 7, 2024

I'm having a similar issue:

$ ccextractor --output-field 1 --cc2 --out=srt --utf8 movie.vob -o subtitle.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: movie.vob
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: movie.vob
File seems to be a program stream, enabling PS mode
Analyzing data in general mode


New video information found
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]

Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

I don't know exactly what the VOB file I'm working with contains. It may have no CC data encoded at all, or it may be corrupted, but mediainfo reports a CC3 track (hence my use of --output-field 1 --cc2):

Text
ID                                       : 224 (0xE0)-CC3
Format                                   : EIA-608
Muxing mode, more info                   : Muxed in Video #1
Duration                                 : 2 min 58 s
Start time (commands)                    : 1 s 248 ms
Start time                               : 2 s 183 ms
Bit rate mode                            : Constant
Stream size                              : 0.00 Byte (0%)
Count of frames before first event       : 58
Type of the first event                  : PopOn
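mediainfo labels this track CC3. In CEA-608, services CC1 and CC2 are carried in field 1 and CC3 and CC4 in field 2, with the odd-numbered service on data channel 1 of its field. A small lookup sketch makes the mapping explicit; the cea608_slot helper and its types are hypothetical. By this mapping CC3 is field 2, channel 1, so if --output-field selects the field and --cc2 selects data channel 2 (as their names suggest), the flags above would target CC2 rather than CC3. Whether this is related to the OCR error in the log is not established.

```c
#include <stdbool.h>

/* Where a CEA-608 caption service lives: field 1 or 2, and data
   channel 1 or 2 within that field. */
typedef struct {
    int field;
    int channel;
} cea608_slot_t;

/* Maps a service number (1..4 for CC1..CC4) to its field and channel.
   Returns false for anything outside CC1..CC4. */
static bool cea608_slot(int cc_service, cea608_slot_t *out)
{
    if (cc_service < 1 || cc_service > 4)
        return false;
    out->field = (cc_service <= 2) ? 1 : 2;         /* CC1/CC2 -> field 1 */
    out->channel = (cc_service % 2 == 1) ? 1 : 2;   /* odd service -> channel 1 */
    return true;
}
```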
