
Teletext

Previous Page Identifying VBI data

Next Page Decoding RF Captures

BBC Testcard F 1994 PAL VHS sample with teletext in the VBI space.

What Is & Was Teletext?

Starting in the 1970s and winding down between roughly 2009 and 2015, teletext was an information system widely used in the United Kingdom and mainland Europe, with very limited use in North America.

(Teletext is still in use in some European and Eastern nations.)

Teletext data packets were embedded into the VBI lines at the top of the picture on live broadcasts. The service was used daily for news, weather, and BBS-style boards, and was highly regarded in the analogue era for putting same-day stock and sports information at the user's fingertips; there were even interactive games.

Teletext Decoding

Thanks to Ali1234 for the R&D into this.

Teletext decoding supports NVIDIA CUDA/OpenGL acceleration, with a CPU fallback if no GPU is available.

Decode "Deconvolve" Teletext Data With

teletext deconvolve --keep-empty -c ddd-tbc INPUT.tbc

View Teletext Data With

teletext vbiview -c ddd-tbc INPUT.tbc

So what you want is the --keep-empty mode when deconvolving. It will insert empty packets in the t42 when there's no line identified, thus the output t42 has the same number of "lines" as the input tbc, and you can go from a t42 packet back to the original. - Ali1234 (2024)
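
With --keep-empty, packet N of the .t42 therefore corresponds to VBI line N of the input .tbc. Below is a minimal sketch of using that correspondence, assuming 42-byte records and that the filler packets are all zeros; the helper is illustrative, not part of the vhs-teletext tool.

```python
# Sketch: walk a .t42 produced with --keep-empty and report which input
# VBI lines carried teletext. Filler packets are assumed to be all zeros.
PACKET_SIZE = 42

def read_packets(t42_path):
    """Yield (line_index, packet) pairs, one 42-byte packet per VBI line."""
    with open(t42_path, "rb") as f:
        index = 0
        while True:
            packet = f.read(PACKET_SIZE)
            if len(packet) < PACKET_SIZE:
                break
            yield index, packet
            index += 1

for line, packet in read_packets("INPUT.t42"):
    if any(packet):
        print(f"VBI line {line}: teletext packet")
```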

This will eventually have a proper GUI viewer for extracted data.

Generating Teletext

Raspberry Pi Teletext

A Raspberry Pi based closed captions & teletext generator tool that uses the Pi's composite output. It can be used with a 3.5mm TRRS breakout jack on current Pi boards, or with the dedicated yellow composite RCA jack on older Pi models.

HackTV / Windows & GUI

The HackTV application can generate CVBS or modulated TV signals to a file, or pipe them to devices such as the FL2K and HackRF.

Teletext Viewer Tools

Zapping - Linux

Documents

Teletext Archive Proposal

Source Archived On 08.09.2024

Recovery Workflow

Capture

Capture full tapes to a single file, then import into the database.

Splits

Use the splits tool (to be written) to mark sections of the capture containing a contiguous signal. Enter metadata and commit to the database.

Deconvolution

Once the splits are created, deconvolution can be performed. This is very resource intensive and should not be performed on the server. The server should instead bundle up the work and serve it to clients for distributed processing.

Raw Subpages

Once the full split has been deconvolved it can be parsed into raw subpages. These are just lists of packets, unmodified. This is a fast operation and can be done on the server.
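
As an illustration of what "parsed into raw subpages" means, here is a rough sketch that cuts a packet stream at each page header (packet row 0). The Hamming 8/4 decode follows the ETS 300 706 bit layout but ignores error correction, and the sketch assumes serial (one-magazine-at-a-time) transmission; it is not the archive's actual parser.

```python
# Sketch: split a deconvolved packet stream into raw subpages by starting a
# new group at every row-0 header packet.
def hamming84(b):
    """Return the 4 data bits of a Hamming 8/4 coded byte (no error correction)."""
    return ((b >> 1) & 1) | ((b >> 3) & 1) << 1 | ((b >> 5) & 1) << 2 | ((b >> 7) & 1) << 3

def mrag(packet):
    """Magazine and row number from the first two bytes of a 42-byte packet."""
    a, b = hamming84(packet[0]), hamming84(packet[1])
    return a & 0x7, ((a >> 3) & 1) | (b << 1)

def raw_subpages(packets):
    """Group packets into lists, starting a new list at each row-0 header."""
    current = []
    for pkt in packets:
        _, row = mrag(pkt)
        if row == 0 and current:
            yield current
            current = []
        current.append(pkt)
    if current:
        yield current
```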

Squashing

Squashing means finding all instances of the same subpage in a split, and comparing them to find the most common character in each location. This reduces error. This operation is not very CPU intensive, but it would be easy to distribute it. The resulting subpage is a child of all subpages used to make it.
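
A minimal sketch of the squashing step, assuming the instances have already been aligned so that the same row of each instance can be compared position by position:

```python
# Sketch: "squash" several instances of the same subpage by taking, at each
# byte position of each packet row, the most common value across instances.
from collections import Counter

def squash(instances):
    """instances: list of subpages, each a list of 42-byte packets (same rows)."""
    result = []
    for row_packets in zip(*instances):          # same row from every instance
        squashed = bytes(
            Counter(pkt[i] for pkt in row_packets).most_common(1)[0][0]
            for i in range(42)
        )
        result.append(squashed)
    return result
```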

Manual Restoration

Finally the squashed subpages can be manually fixed using a web editor or other tool that can communicate with the API. Whenever a page is modified this way, the old revision should be kept.

Back Propagation

Once all the pages have been restored, the fixed lines can be back-propagated to the original splits they came from to produce a clean packet stream.
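
A rough sketch of back-propagation, assuming each restored subpage keeps the line indices it was built from (the "Source Lines" field proposed in the schema below); the names are illustrative:

```python
# Sketch: write restored packets back over the lines of the original split
# they came from, producing a cleaned packet stream with the original layout.
def back_propagate(split_packets, restored_pages):
    """split_packets: list of 42-byte packets, one per VBI line of the split.
    restored_pages: iterable of (source_lines, packets) pairs."""
    cleaned = list(split_packets)
    for source_lines, packets in restored_pages:
        for line, pkt in zip(source_lines, packets):
            cleaned[line] = pkt
    return cleaned
```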

Using the Archive

Browsing

Users should be presented with a timeline and channel selector. This should indicate availability of pages. Upon selection they are presented with the main index page for that service. Navigation can then be performed via hyperlinks in the HTML rendered pages, or by entering page numbers.

Searching

Users should be able to search by channel, date, keyword, and quality. Results should show some (three?) lines from the page, rendered as teletext. Only pages above a certain level of quality should be indexed, and results should favour the best quality pages.

Rendering

Pages should be available as HTML or images. Whenever the user is presented with a page, they should be able to see the full provenance of that page - who captured it, what recovery steps have been done, and the original VBI lines (or at least how to get them).

API

The API should cover all functions needed to implement the recovery workflow, as well as search and retrieval functions. Pages should be available as HTML, image, or packet data. A streaming API should also be available, allowing software inserters to request a packet stream for a given date and channel. This stream should be generated using back-propagation from the recovered subpages to the original split lines. If no pages are available, this should return an error page in teletext format.
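
The streaming part of that API might look roughly like the following generator. Every name here is hypothetical, since neither the API nor the database layer exists yet.

```python
# Sketch: shape of the packet-streaming API described above. The lookup and
# back-propagation functions are placeholders for the real database layer.
def stream_packets(channel, date, lookup_split, back_propagate_split):
    """Yield 42-byte teletext packets for a given channel and date."""
    split = lookup_split(channel, date)
    if split is None:
        # Fall back to an error page in teletext format. A real implementation
        # would emit a properly coded header and display rows.
        yield from error_page_packets()
        return
    yield from back_propagate_split(split)

def error_page_packets():
    yield bytes(42)  # placeholder only
```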

Database Schema

Channel

Records metadata about TV channels.

  • Full Name
  • Call sign (eg BBC1)
  • Company/group
  • Country
  • Language (as locale code eg en_UK, no_NN etc)
  • TS 101 231 metadata (may be multiple fields)

Region

A local version of a channel.

  • Channel
  • Region
  • Others?

Source

A source is a raw VBI capture. It is a sequence of samples organized into lines of a fixed length. It may contain separate recordings from different dates and channels. This corresponds to any raw capture file on disk with unknown contents.

  • GUID
  • Owner (Person)
  • Source Format (VHS, Beta, LD, Live etc)
  • Recording Device (If known)
  • Playback Device (If known)
  • Capture Device (BT8x8, DdD etc)
  • Sample Format (eg unsigned 8 bit)
  • Line length (samples per line)
  • Field lines (lines per field)
  • Field range (which lines in each field may contain teletext)

Split

A split is a subsection of a source which contains a single contiguous teletext stream, for example an uninterrupted recording of a whole episode of a TV show.

  • Source
  • Start Line
  • End Line
  • Start TimeDate (Approximate)
  • End TimeDate (Approximate. Can be calculated)
  • Channel
  • Region (if known)

Deconvolution

A deconvolution is the result from deconvolving a split. It is a teletext packet stream. Ideally it should be possible to link each line in the deconvolution back to the raw VBI samples it came from.

  • Split
  • Person (Who deconvolved it)
  • Software/version used
  • Subjective Quality (as decided by the uploader and/or others)
  • Objective Quality (as decided by some algorithm yet to be written)

Subpage

A reconstructed subpage. It should record the method used for restoration and all lines used to compile it. If it is a revision of a previous subpage, it should reference that subpage.

  • Page Number
  • Subpage Number
  • TimeDate (Approximate. Calculated from earliest header?)
  • Packets
  • Raw (has it been modified from the deconvolution at all?)
  • Spellchecked (has it been passed through an automated spelling checker?)
  • Manual restoration (has it been manually fixed by a human?)
  • Subjective Quality (as decided by the uploader and/or other viewers)
  • Objective Quality (as decided by some algorithm yet to be written)
  • Parents (if this is squashed or manually restored), or:
  • Source Deconvolution
  • Source Lines (the lines from the Split used to generate it)
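
As a very rough sketch, a few of the entities above could be expressed like this; field names and types are illustrative only, as the actual schema is still to be decided.

```python
# Sketch: some of the proposed entities as Python dataclasses (illustrative).
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID

@dataclass
class Source:
    guid: UUID
    owner: str
    source_format: str        # VHS, Beta, LD, Live etc
    capture_device: str       # BT8x8, DdD etc
    sample_format: str        # eg unsigned 8 bit
    line_length: int          # samples per line
    field_lines: int          # lines per field
    field_range: range        # which lines in each field may contain teletext

@dataclass
class Split:
    source: Source
    start_line: int
    end_line: int
    channel: str
    start_time: datetime | None = None   # approximate
    region: str | None = None

@dataclass
class Subpage:
    page: int
    subpage: int
    packets: list[bytes] = field(default_factory=list)
    parents: list["Subpage"] = field(default_factory=list)
    source_lines: list[int] = field(default_factory=list)
```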

Storage Requirements

Raw VBI Data

Raw VBI data is a fixed size depending on the length of the capture. Currently the BT8x8 captures are the largest, at 2048 8-bit samples per line. Raw Domesday Duplicator output is larger but contains the whole picture, which we do not need.

  • 1 hour: 5.75 GB
  • A four hour tape: 23 GB
  • 9000 four hour tapes: 207 TB
  • 1 hour per day for 40 years and 5 channels: 415 TB
  • All teletext ever broadcast on UK terrestrial analog: 10 PB

Deconvolved Packets

Each packet is 42 bytes, excluding the fixed clock run-in and framing code, with one packet per line. The total size is therefore closely tied to the length of the original capture, but lines which do not contain teletext can be discarded. The numbers below assume all lines are used; a back-of-the-envelope check follows the list.

  • 1 hour: 120 MB
  • A four hour tape: 480 MB
  • 9000 four hour tapes: 4.32 TB
  • 1 hour per day for 40 years and 5 channels: 9 TB
  • All teletext ever broadcast on UK terrestrial analog: 210 TB
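
The per-hour figures in both lists can be sanity-checked with a small calculation, assuming 2048 8-bit samples per line, 32 VBI lines per frame and 25 frames per second (the same geometry used by the ffmpeg commands further down). This lands close to, though not exactly on, the figures quoted above.

```python
# Back-of-the-envelope check of raw VBI and deconvolved packet sizes per hour.
SAMPLES_PER_LINE = 2048   # BT8x8 capture
PACKET_BYTES = 42         # one t42 packet per line
LINES_PER_FRAME = 32      # assumed VBI lines per frame
FRAMES_PER_SECOND = 25    # PAL

lines_per_hour = LINES_PER_FRAME * FRAMES_PER_SECOND * 3600
raw_per_hour = SAMPLES_PER_LINE * lines_per_hour   # ~5.9 GB
t42_per_hour = PACKET_BYTES * lines_per_hour       # ~121 MB

print(f"raw VBI per hour: {raw_per_hour / 1e9:.2f} GB")
print(f"deconvolved packets per hour: {t42_per_hour / 1e6:.0f} MB")
```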

Services

A single subpage is approximately 27 packets, or about 1 kB (27 × 42 bytes ≈ 1.1 kB). A service is a set of subpages broadcast in a looping sequence. It is difficult to quantify an exact size as the number of subpages varies greatly between channels and years, but a typical service seems to be about 2 MB.

  • 9000 four hour tapes, assuming half-hour recordings: 72000 services, 144 GB
  • 1 service per day for 40 years and 5 channels: 73000 services, 146 GB

Compression

Raw VBI samples can be compressed using a variety of algorithms. Compression used should be lossless. There are broadly three types of algorithms to consider: data, video, and audio. Each has pros and cons.

  • General data compression algorithms are always lossless, but they provide the worst compression ratios and are often extremely slow. They can’t store metadata about the samples.
  • Video codecs have the advantage of being able to encode line and frame sizes, and are designed with very large files in mind. They provide compression somewhat better than data algorithms, but are not optimal because VBI samples have no vertical or temporal correlation. Compression is often very slow.
  • Audio codecs give by far the best compression ratio and the fastest encodes but are usually not designed to store such large amounts of data. As a result of this, seeking can be slow.

Note that deconvolved packets are much smaller and compress well with data algorithms like gzip due to their primarily textual content.

| Algorithm | Type  | Size  | Encode Time | Command |
|-----------|-------|-------|-------------|---------|
| raw       | raw   | 25 GB | 4 hours     | Uncompressed; encode time is the length of the capture. |
| gzip      | data  | 16 GB | 2 hours     | `gzip -9 sample.vbi` |
| xz        | data  | 15 GB | 8 hours     | `xz -9 sample.vbi` |
| flac      | audio | 10 GB | 15 minutes  | `flac --best --sample-rate=48000 --sign=unsigned --channels=1 --endian=little --bps=8 --blocksize=65535 --lax -f sample.vbi` |
| x264      | video | 13 GB | 2 hours     | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v libx264 -crf 0 -preset slow sample-h264.mkv` |
| x265      | video | 18 GB | 3 hours     | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v libx265 -x265-params "profile=monochrome8:lossless=1:preset=slow" sample-h265.mkv` |
| ffv1      | video | 13 GB | 1 hour      | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i sample.vbi -c:v ffv1 -level 3 sample-ffv1.mkv` |
| ffv1      | video | 12 GB | 16 minutes  | `ffmpeg -f rawvideo -pix_fmt y8 -s:v 2048x32 -r 25 -i 0001.vbi -c:v ffv1 -level 3 -coder 1 -threads 8 -context 1 -slices 4 sample-ffv1.l3.c1.large.mkv` |
| aec       | pcm   | 14 GB | 18 minutes  | `aec -n 8 sample.vbi sample.sz` |

Challenges

Location of Raw VBI Samples

These are very large. Will the user upload them to the archive, or just upload deconvolved data?

Maintaining Provenance of Results

Each deconvolved line needs to be mapped back to the original samples. There are a few ways to do this:

  • Make the deconvolved output have the same number of lines as the input. This means adding filler for VBI lines that don’t contain teletext.
  • Store line numbers in the deconvolved data. This makes it harder to parse.
  • Store the mapping in a separate file.
  • Store the mapping directly in the database.

Page End

Previous Page Identifying VBI data

Next Page Decoding RF Captures
