Rework high-level format variants #38

wkalt · 2022-01-21T19:32:03Z

The specification currently makes a division between "chunked" and "unchunked" files, with each having a mandatory set of fields. Discussions have leaned in the direction of this being too restrictive on at least a couple of fronts:

- Users may want the compression benefits of chunking without the cost of retaining channel info records in RAM for the statistics or chunk index records.
- Users of the unchunked format may not want the cost of retaining channel info records in RAM for the statistics record. Avoiding that cost is part of why they chose the unchunked variant to begin with.

In light of these concerns, we are considering the following changes:

- "Chunked" and "unchunked" are eliminated as terms. There will be just one "mcap file".
- Chunks and messages may both appear at the top level of the file.
- Chunk indexes, attachment indexes, statistics, and channel infos in the index data section are optional, but subject to some mutual constraints:

- if chunk indexes are included, any channels referenced by those chunk indexes must have channel infos in the index data section
- if the channel_stats field of the statistics record is included, any channels it references must be reflected in the index data section as channel infos
- if there are no records in the index data section, the index_offset of the footer record will be set to zero. Otherwise it will point to the first record in the section, regardless of what kind of record that is.
- the channel_stats field of the statistics record may be zero-length/empty. This is to allow tracking of cheap global file stats without the expense of retaining the channel infos.
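The mutual constraints above could be checked by a validator along these lines. This is only a sketch: `IndexSection`, its fields, and `check_constraints` are hypothetical names invented for illustration, not part of any mcap library, and the in-memory shape is an assumption about what a reader would have collected.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexSection:
    """Hypothetical in-memory view of the index data section (not a real mcap API)."""
    channel_info_ids: set[int] = field(default_factory=set)
    # Channel ids referenced by any chunk index record.
    chunk_index_channel_ids: set[int] = field(default_factory=set)
    # channel_stats from the statistics record: channel id -> message count.
    channel_stats: dict[int, int] = field(default_factory=dict)
    # File offset of the first record in the section; None if the section is empty.
    first_record_offset: Optional[int] = None


def check_constraints(section: IndexSection, footer_index_offset: int) -> list[str]:
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []

    # Channels referenced by chunk indexes need channel infos in the section.
    missing = section.chunk_index_channel_ids - section.channel_info_ids
    if missing:
        problems.append(f"chunk indexes reference channels without channel infos: {sorted(missing)}")

    # Channels referenced by channel_stats need channel infos too.
    # An empty channel_stats is allowed and trivially passes.
    missing = set(section.channel_stats) - section.channel_info_ids
    if missing:
        problems.append(f"channel_stats references channels without channel infos: {sorted(missing)}")

    # index_offset is zero for an empty section, else points at the first record.
    if section.first_record_offset is None:
        if footer_index_offset != 0:
            problems.append("index section is empty but index_offset is non-zero")
    elif footer_index_offset != section.first_record_offset:
        problems.append("index_offset does not point to the first index record")

    return problems
```

A file with no index records and `index_offset == 0` passes, as does one whose statistics record carries an empty `channel_stats`.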

Messages written outside chunks will be readable by a sequential reader, but invisible to a random access reader using the chunk index.
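To illustrate why top-level messages would be invisible to an indexed reader, here is a toy model. The `Message` and `Chunk` classes, the list-of-records file, and the `chunk_index` of list positions are all assumptions made for the example; they do not reflect the mcap wire format.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class Message:
    channel_id: int
    data: bytes


@dataclass
class Chunk:
    messages: list[Message]


Record = Union[Message, Chunk]


def read_sequential(records: list[Record]) -> Iterator[Message]:
    """Walk every top-level record, descending into chunks."""
    for record in records:
        if isinstance(record, Chunk):
            yield from record.messages
        else:
            yield record


def read_indexed(records: list[Record], chunk_index: list[int]) -> Iterator[Message]:
    """Visit only the chunks named by the (toy) chunk index."""
    for position in chunk_index:
        chunk = records[position]
        assert isinstance(chunk, Chunk)
        yield from chunk.messages


records = [
    Message(1, b"top-level"),
    Chunk([Message(1, b"chunked")]),
    Message(2, b"also top-level"),
]
chunk_index = [1]  # position of the only chunk

sequential = list(read_sequential(records))
indexed = list(read_indexed(records, chunk_index))
```

In this model the sequential reader yields all three messages, while the indexed reader yields only the one stored inside the chunk.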

Writers that do not include data in the index section will progressively lose utility from the "fast summarization support". The algorithm for "summary" is roughly:

1. seek to the index_offset
2. read to the end of the file
3. report aggregated statistics

If the index data section is empty, no statistics will be aggregated. Falling back to a full file read is inadvisable, because it would undermine good support for remote files. The explanatory notes section should be updated to discuss this briefly.
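The summary steps, including the no-fallback behavior for an empty index section, might look like the sketch below. The byte layout is a toy assumption chosen to keep the example runnable: the footer is just a little-endian uint64 `index_offset` at the end of the file, and every index record is a fixed-size (channel id, message count) pair. Neither layout is the real mcap format, and `summarize` is a hypothetical name.

```python
import io
import struct
from typing import BinaryIO, Optional

# Toy layouts, assumed for illustration only.
FOOTER = struct.Struct("<Q")  # index_offset
STAT = struct.Struct("<HQ")   # channel id, message count


def summarize(f: BinaryIO) -> Optional[dict[int, int]]:
    """Aggregate per-channel message counts from the index section.

    Returns None when index_offset is zero; it deliberately does not
    fall back to scanning the whole file.
    """
    f.seek(-FOOTER.size, io.SEEK_END)
    footer_pos = f.tell()
    (index_offset,) = FOOTER.unpack(f.read(FOOTER.size))
    if index_offset == 0:
        return None

    # Seek to index_offset and read up to the footer.
    f.seek(index_offset)
    section = f.read(footer_pos - index_offset)

    # Report aggregated statistics.
    counts: dict[int, int] = {}
    for channel_id, message_count in STAT.iter_unpack(section):
        counts[channel_id] = counts.get(channel_id, 0) + message_count
    return counts


# Usage with in-memory files: 16 bytes of opaque data, then index records, then footer.
data = b"x" * 16
index = STAT.pack(1, 5) + STAT.pack(2, 7) + STAT.pack(1, 3)
with_index = io.BytesIO(data + index + FOOTER.pack(len(data)))
without_index = io.BytesIO(data + FOOTER.pack(0))
```

Against a remote file, this touches only the footer and the index section, which is the property a full-file fallback would give up.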

defunctzombie · 2022-01-21T22:20:04Z

There are two types of files - those with a 0 in the index_offset value for the footer and those with a non-zero value.

defunctzombie · 2022-01-21T22:27:38Z

> Messages written outside chunks will be readable by a sequential reader, but invisible to a random access reader using the chunk index.

If you are using chunk indices then messages must not appear outside of chunks.

wkalt added the "feature" label (New feature or request) Jan 21, 2022

defunctzombie assigned wkalt Jan 21, 2022

wkalt closed this as completed Jan 24, 2022
