
Teletext

Previous Page Identifying VBI data

Next Page Decoding RF Captures

BBC Testcard F 1994 PAL VHS sample with teletext in the VBI space.

What Is & Was Teletext?

Starting in the 1970s and winding down between roughly 2009 and 2015, teletext was an information system widely used in the United Kingdom and mainland Europe, with very limited use in North America.

(Teletext is still in use in some European and Eastern nations.)

Teletext data packets were embedded into the VBI lines at the top of the picture on live broadcasts. The service was used daily for news, weather, and BBS-style boards, and was highly regarded in the analogue era for putting same-day stock and sports information at the user's fingertips; there were even interactive games.

Teletext Decoding

Thanks to Ali1234 for the R&D into this.

Teletext decoding supports NVIDIA CUDA/OpenGL acceleration, with a CPU fallback if no GPU is available.

Decode "Deconvolve" Teletext Data With

teletext deconvolve --keep-empty -c ddd-tbc INPUT.tbc

View Teletext Data With

teletext vbiview -c ddd-tbc INPUT.tbc

So what you want is the --keep-empty mode when deconvolving. It will insert empty packets in the t42 when there's no line identified, thus the output t42 has the same number of "lines" as the input tbc, and you can go from a t42 packet back to the original. - Ali1234 (2024)
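
With --keep-empty, packet N of the .t42 therefore corresponds to VBI line N of the input .tbc. Below is a minimal sketch of using that correspondence, assuming 42-byte records and that the filler packets are all zeros; the helper is illustrative, not part of the vhs-teletext tool.

```python
# Sketch: walk a .t42 produced with --keep-empty and report which input
# VBI lines carried teletext. Filler packets are assumed to be all zeros.
PACKET_SIZE = 42

def read_packets(t42_path):
    """Yield (line_index, packet) pairs, one 42-byte packet per VBI line."""
    with open(t42_path, "rb") as f:
        index = 0
        while True:
            packet = f.read(PACKET_SIZE)
            if len(packet) < PACKET_SIZE:
                break
            yield index, packet
            index += 1

for line, packet in read_packets("INPUT.t42"):
    if any(packet):
        print(f"VBI line {line}: teletext packet")
```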

This will eventually have a proper GUI viewer for extracted data.

Generating Teletext

Raspberry Pi Teletext

A Raspberry Pi based closed captions & teletext generator tool that uses the Pi's composite output. It can be used with a 3.5mm TRRS breakout jack on current Pi boards, or with the dedicated yellow composite RCA jack on older Pi models.

HackTV / Windows & GUI

The HackTV application can generate CVBS or modulated TV signals to a file, or pipe them to devices such as the FL2K and HackRF.

Teletext Viewer Tools

Zapping - Linux

Documents

Teletext Archive Proposal

Source Archived On 08.09.2024

Recovery Workflow

Capture

Capture full tapes to a single file, then import into the database.

Splits

Use the splits tool (to be written) to mark sections of the capture containing a contiguous signal. Enter metadata and commit to the database.

Deconvolution

Once the splits are created, deconvolution can be performed. This is very resource intensive and should not be performed on the server. The server should instead bundle up the work and serve it to clients for distributed processing.

Raw Subpages

Once the full split has been deconvolved it can be parsed into raw subpages. These are just lists of packets, unmodified. This is a fast operation and can be done on the server.
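
As an illustration of what "parsed into raw subpages" means, here is a rough sketch that cuts a packet stream at each page header (packet row 0). The Hamming 8/4 decode follows the ETS 300 706 bit layout but ignores error correction, and the sketch assumes serial (one-magazine-at-a-time) transmission; it is not the archive's actual parser.

```python
# Sketch: split a deconvolved packet stream into raw subpages by starting a
# new group at every row-0 header packet.
def hamming84(b):
    """Return the 4 data bits of a Hamming 8/4 coded byte (no error correction)."""
    return ((b >> 1) & 1) | ((b >> 3) & 1) << 1 | ((b >> 5) & 1) << 2 | ((b >> 7) & 1) << 3

def mrag(packet):
    """Magazine and row number from the first two bytes of a 42-byte packet."""
    a, b = hamming84(packet[0]), hamming84(packet[1])
    return a & 0x7, ((a >> 3) & 1) | (b << 1)

def raw_subpages(packets):
    """Group packets into lists, starting a new list at each row-0 header."""
    current = []
    for pkt in packets:
        _, row = mrag(pkt)
        if row == 0 and current:
            yield current
            current = []
        current.append(pkt)
    if current:
        yield current
```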

Squashing

Squashing means finding all instances of the same subpage in a split, and comparing them to find the most common character in each location. This reduces error. This operation is not very CPU intensive, but it would be easy to distribute it. The resulting subpage is a child of all subpages used to make it.
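
A minimal sketch of the squashing step, assuming the instances have already been aligned so that the same row of each instance can be compared position by position:

```python
# Sketch: "squash" several instances of the same subpage by taking, at each
# byte position of each packet row, the most common value across instances.
from collections import Counter

def squash(instances):
    """instances: list of subpages, each a list of 42-byte packets (same rows)."""
    result = []
    for row_packets in zip(*instances):          # same row from every instance
        squashed = bytes(
            Counter(pkt[i] for pkt in row_packets).most_common(1)[0][0]
            for i in range(42)
        )
        result.append(squashed)
    return result
```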

Manual Restoration

Finally the squashed subpages can be manually fixed using a web editor or other tool that can communicate with the API. Whenever a page is modified this way, the old revision should be kept.

Back Propagation

Once all the pages have been restored, the fixed lines can be back-propagated to the original splits they came from to produce a clean packet stream.
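
A rough sketch of back-propagation, assuming each restored subpage keeps the line indices it was built from (the "Source Lines" field proposed in the schema below); the names are illustrative:

```python
# Sketch: write restored packets back over the lines of the original split
# they came from, producing a cleaned packet stream with the original layout.
def back_propagate(split_packets, restored_pages):
    """split_packets: list of 42-byte packets, one per VBI line of the split.
    restored_pages: iterable of (source_lines, packets) pairs."""
    cleaned = list(split_packets)
    for source_lines, packets in restored_pages:
        for line, pkt in zip(source_lines, packets):
            cleaned[line] = pkt
    return cleaned
```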

Using the Archive

Browsing

Users should be presented with a timeline and channel selector. This should indicate availability of pages. Upon selection they are presented with the main index page for that service. Navigation can then be performed via hyperlinks in the HTML rendered pages, or by entering page numbers.

Searching

Users should be able to search by channel, date, keyword, and quality. Results should show some (three?) lines from the page, rendered as teletext. Only pages above a certain level of quality should be indexed, and results should favour the best quality pages.

Rendering

Pages should be available as HTML or images. Whenever the user is presented with a page, they should be able to see the full provenance of that page - who captured it, what recovery steps have been done, and the original VBI lines (or at least how to get them).

API

The API should cover all functions needed to implement the recovery workflow, as well as search and retrieval functions. Pages should be available as HTML, image, or packet data. A streaming API should also be available, allowing software inserters to request a packet stream for a given date and channel. This stream should be generated using back-propagation from the recovered subpages to the original split lines. If no pages are available, this should return an error page in teletext format.
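
The streaming part of that API might look roughly like the following generator. Every name here is hypothetical, since neither the API nor the database layer exists yet.

```python
# Sketch: shape of the packet-streaming API described above. The lookup and
# back-propagation functions are placeholders for the real database layer.
def stream_packets(channel, date, lookup_split, back_propagate_split):
    """Yield 42-byte teletext packets for a given channel and date."""
    split = lookup_split(channel, date)
    if split is None:
        # Fall back to an error page in teletext format. A real implementation
        # would emit a properly coded header and display rows.
        yield from error_page_packets()
        return
    yield from back_propagate_split(split)

def error_page_packets():
    yield bytes(42)  # placeholder only
```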

Database Schema

Channel

Records metadata about TV channels.

  • Full Name
  • Call sign (eg BBC1)
  • Company/group
  • Country
  • Language (as locale code eg en_UK, no_NN etc)
  • TS 101 231 metadata (may be multiple fields)

Region

A local version of a channel.

  • Channel
  • Region
  • Others?

Source

A source is a raw VBI capture. It is a sequence of samples organized into lines of a fixed length. It may contain separate recordings from different dates and channels. This corresponds to any raw capture file on disk with unknown contents.

  • GUID
  • Owner (Person)
  • Source Format (VHS, Beta, LD, Live etc)
  • Recording Device (If known)
  • Playback Device (If known)
  • Capture Device (BT8x8, DdD etc)
  • Sample Format (eg unsigned 8 bit)
  • Line length (samples per line)
  • Field lines (lines per field)
  • Field range (which lines in each field may contain teletext)

Split

A split is a subsection of a source which contains a single contiguous teletext stream, for example an uninterrupted recording of a whole episode of a TV show.

  • Source
  • Start Line
  • End Line
  • Start TimeDate (Approximate)
  • End TimeDate (Approximate. Can be calculated)
  • Channel
  • Region (if known)

Deconvolution

A deconvolution is the result from deconvolving a split. It is a teletext packet stream. Ideally it should be possible to link each line in the deconvolution back to the raw VBI samples it came from.

  • Split
  • Person (Who deconvolved it)
  • Software/version used
  • Subjective Quality (as decided by the uploader and/or others)
  • Objective Quality (as decided by some algorithm yet to be written)

Subpage

A reconstructed subpage. It should record the method used for restoration and all lines used to compile it. If it is a revision of a previous subpage, it should reference that subpage.

  • Page Number
  • Subpage Number
  • TimeDate (Approximate. Calculated from earliest header?)
  • Packets
  • Raw (has it been modified from the deconvolution at all?)
  • Spellchecked (has it been passed through an automated spelling checker?)
  • Manual restoration (has it been manually fixed by a human?)
  • Subjective Quality (as decided by the uploader and/or other viewers)
  • Objective Quality (as decided by some algorithm yet to be written)
  • Parents (if this is squashed or manually restored), or:
  • Source Deconvolution
  • Source Lines (the lines from the Split used to generate it)
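
As a very rough sketch, a few of the entities above could be expressed like this; field names and types are illustrative only, as the actual schema is still to be decided.

```python
# Sketch: some of the proposed entities as Python dataclasses (illustrative).
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID

@dataclass
class Source:
    guid: UUID
    owner: str
    source_format: str        # VHS, Beta, LD, Live etc
    capture_device: str       # BT8x8, DdD etc
    sample_format: str        # eg unsigned 8 bit
    line_length: int          # samples per line
    field_lines: int          # lines per field
    field_range: range        # which lines in each field may contain teletext

@dataclass
class Split:
    source: Source
    start_line: int
    end_line: int
    channel: str
    start_time: datetime | None = None   # approximate
    region: str | None = None

@dataclass
class Subpage:
    page: int
    subpage: int
    packets: list[bytes] = field(default_factory=list)
    parents: list["Subpage"] = field(default_factory=list)
    source_lines: list[int] = field(default_factory=list)
```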

Storage Requirements

Raw VBI Data

Raw VBI data is a fixed size depending on the length of the capture. Currently the BT8x8 captures are the largest, at 2048 8-bit samples per line. Raw Domesday Duplicator output is larger but contains the whole picture, which we do not need.

  • 1 hour: 5.75 GB
  • A four hour tape: 23 GB
  • 9000 four hour tapes: 207 TB
  • 1 hour per day for 40 years and 5 channels: 415 TB
  • All teletext ever broadcast on UK terrestrial analog: 10 PB

Deconvolved Packets

Each packet is 42 bytes, excluding the fixed clock run-in and framing code, with one packet per line. The total size is therefore closely tied to the length of the original capture, but lines which do not contain teletext can be discarded. The numbers below assume all lines are used; a back-of-the-envelope check follows the list.

  • 1 hour: 120 MB
  • A four hour tape: 480 MB
  • 9000 four hour tapes: 4.32 TB
  • 1 hour per day for 40 years and 5 channels: 9 TB
  • All teletext ever broadcast on UK terrestrial analog: 210 TB
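
The per-hour figures in both lists can be sanity-checked with a small calculation, assuming 2048 8-bit samples per line, 32 VBI lines per frame and 25 frames per second (the same geometry used by the ffmpeg commands further down). This lands close to, though not exactly on, the figures quoted above.

```python
# Back-of-the-envelope check of raw VBI and deconvolved packet sizes per hour.
SAMPLES_PER_LINE = 2048   # BT8x8 capture
PACKET_BYTES = 42         # one t42 packet per line
LINES_PER_FRAME = 32      # assumed VBI lines per frame
FRAMES_PER_SECOND = 25    # PAL

lines_per_hour = LINES_PER_FRAME * FRAMES_PER_SECOND * 3600
raw_per_hour = SAMPLES_PER_LINE * lines_per_hour   # ~5.9 GB
t42_per_hour = PACKET_BYTES * lines_per_hour       # ~121 MB

print(f"raw VBI per hour: {raw_per_hour / 1e9:.2f} GB")
print(f"deconvolved packets per hour: {t42_per_hour / 1e6:.0f} MB")
```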

Services

A single subpage is approximately 27 packets, or about 1 kB (27 × 42 bytes ≈ 1.1 kB). A service is a set of subpages broadcast in a looping sequence. It is difficult to quantify an exact size as the number of subpages varies greatly between channels and years, but a typical service seems to be about 2 MB.

  • 9000 four hour tapes, assuming half-hour recordings: 72000 services, 144 GB
  • 1 service per day for 40 years and 5 channels: 73000 services, 146 GB

Compression

Raw VBI samples can be compressed using a variety of algorithms. Compression used should be lossless. There are broadly three types of algorithms to consider: data, video, and audio. Each has pros and cons.

  • General data compression algorithms are always lossless, but they provide the worst compression ratios and are often extremely slow. They can’t store metadata about the samples.
  • Video codecs have the advantage of being able to encode line and frame sizes, and are designed with very large files in mind. They provide compression somewhat better than data algorithms, but are not optimal because VBI samples have no vertical or temporal correlation. Compression is often very slow.
  • Audio codecs give by far the best compression ratio and the fastest encodes but are usually not designed to store such large amounts of data. As a result of this, seeking can be slow.

Note that deconvolved packets are much smaller and compress well with data algorithms like gzip due to their primarily textual content.

| Algorithm | Type  | Size  | Encode Time | Command |
|-----------|-------|-------|-------------|---------|
| raw       | raw   | 25 GB | 4 hours     | Uncompressed; encode time is the length of the capture. |
| gzip      | data  | 16 GB | 2 hours     | `gzip -9 sample.vbi` |
| xz        | data  | 15 GB | 8 hours     | `xz -9 sample.vbi` |
| flac      | audio | 10 GB | 15 minutes  | `flac --best --sample-rate=48000 --sign=unsigned --channels=1 --endian=little --bps=8 --blocksize=65535 --lax -f sample.vbi` |
| x264      | video | 13 GB | 2 hours     | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v libx264 -crf 0 -preset slow sample-h264.mkv` |
| x265      | video | 18 GB | 3 hours     | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v libx265 -x265-params "profile=monochrome8:lossless=1:preset=slow" sample-h265.mkv` |
| ffv1      | video | 13 GB | 1 hour      | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v ffv1 -level 3 sample-ffv1.mkv` |
| ffv1      | video | 12 GB | 16 minutes  | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i 0001.vbi -c:v ffv1 -level 3 -coder 1 -threads 8 -context 1 -slices 4 sample-ffv1.l3.c1.large.mkv` |
| aec       | pcm   | 14 GB | 18 minutes  | `aec -n 8 sample.vbi sample.sz` |

Challenges

Location of Raw VBI Samples

These are very large. Will the user upload them to the archive, or just upload deconvolved data?

Maintaining Provenance of Results

Each deconvolved line needs to be mapped back to the original samples. There are a few ways to do this:

  • Make the deconvolved output have the same number of lines as the input. This means adding filler for VBI lines that don’t contain teletext.
  • Store line numbers in the deconvolved data. This makes it harder to parse.
  • Store the mapping in a separate file.
  • Store the mapping directly in the database.

Page End

Previous Page Identifying VBI data

Next Page Decoding RF Captures
