Compaction foundations #109

Merged
merged 10 commits into from
Nov 10, 2021

@thedodd (Collaborator) commented Oct 24, 2021

Compaction has been implemented for Streams & Pipelines.

Data is now stored using a more efficient pattern which also involves fewer trees/keyspaces.

closes #99

This is a building block for implementing time-based compaction of
streams. When batches are published, a timestamp is recorded in a
secondary index pointing to the last offset of the batch. Time-based
compaction will then be able to truncate data based on the offsets
recorded in the secondary index.
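
The relationship between publish timestamps and batch offsets can be sketched as below. This is only an illustration under stated assumptions, not the code from this PR: `TimestampIndex`, `record_batch`, and `truncate_through` are hypothetical names, and an in-memory `BTreeMap` stands in for the on-disk secondary index.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the timestamp secondary index.
/// Keys are batch publish timestamps (seconds since the epoch); values are
/// the last offset of the batch published at that time.
#[derive(Default)]
pub struct TimestampIndex {
    entries: BTreeMap<i64, u64>,
}

impl TimestampIndex {
    /// Record that a batch ending at `last_offset` was published at `timestamp`.
    pub fn record_batch(&mut self, timestamp: i64, last_offset: u64) {
        // If two batches share a timestamp, keep the higher offset.
        let entry = self.entries.entry(timestamp).or_insert(last_offset);
        if last_offset > *entry {
            *entry = last_offset;
        }
    }

    /// Remove every index entry with a timestamp at or before `cutoff` and
    /// return the highest offset those entries pointed to. Stream data up to
    /// and including that offset could then be truncated.
    pub fn truncate_through(&mut self, cutoff: i64) -> Option<u64> {
        let expired: Vec<i64> = self.entries.range(..=cutoff).map(|(ts, _)| *ts).collect();
        let mut max_offset: Option<u64> = None;
        for ts in expired {
            if let Some(offset) = self.entries.remove(&ts) {
                max_offset = Some(max_offset.map_or(offset, |m| m.max(offset)));
            }
        }
        max_offset
    }
}
```

With batches recorded at timestamps 100, 200, and 300, calling `truncate_through(200)` returns the last offset of the second batch, and a second call with the same cutoff returns `None` because nothing older remains.
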

This commit updates the scheduler to ensure that a Stream's generated
StatefulSet targets its corresponding headless service correctly.

The pipelines module has been refactored to use a more efficient storage
pattern, which removes the need for a parallel metadata tree for the
pipeline.

The utils module is now thoroughly tested, which guards against
regressions in our data storage and indexing strategy. The tests also
demonstrate the expected behavior of lexicographically ordered
range scans and prefix scans.
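
For readers unfamiliar with why lexicographic ordering matters here, the following is a minimal sketch, assuming offsets are encoded as big-endian bytes. The function names are hypothetical, and a `BTreeMap` stands in for an ordered on-disk tree. Big-endian encoding makes byte-wise key order match numeric order, which is what lets a range scan return offsets in sequence.

```rust
use std::collections::BTreeMap;

/// Encode an offset as 8 big-endian bytes so that byte-wise (lexicographic)
/// order matches numeric order.
pub fn encode_offset(offset: u64) -> [u8; 8] {
    offset.to_be_bytes()
}

/// Decode an 8-byte big-endian key back into an offset.
pub fn decode_offset(key: &[u8; 8]) -> u64 {
    u64::from_be_bytes(*key)
}

/// Return the offsets stored in the inclusive range `[start, end]`, in
/// ascending order, by scanning the ordered tree between the encoded bounds.
pub fn scan_offsets(tree: &BTreeMap<[u8; 8], Vec<u8>>, start: u64, end: u64) -> Vec<u64> {
    // An inverted range has no results (and `BTreeMap::range` would panic).
    if start > end {
        return Vec::new();
    }
    tree.range(encode_offset(start)..=encode_offset(end))
        .map(|(key, _)| decode_offset(key))
        .collect()
}
```

A little-endian encoding would not work for this purpose: the bytes for 255 would sort after the bytes for 256, so a range scan would return records out of numeric order.
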
@thedodd thedodd force-pushed the 99-compaction branch 4 times, most recently from dfbb047 to f9403c8 Compare October 25, 2021 03:50
Pipeline data indexing strategy has been updated to use a single tree and
a more efficient indexing strategy based upon lexicographical ordering
of encoded keys.
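
One way a single tree can hold several kinds of pipeline records is to give each kind its own key prefix. The layout below is purely an assumption for illustration: the prefix bytes, `instance_key`, `output_key`, and `stages_with_output` are hypothetical and are not taken from this PR.

```rust
use std::collections::BTreeMap;

// Hypothetical one-byte prefixes separating record kinds within one tree.
const PREFIX_INSTANCE: u8 = b'a';
const PREFIX_OUTPUT: u8 = b'o';

/// Key for a pipeline instance: prefix byte + big-endian source offset.
pub fn instance_key(offset: u64) -> Vec<u8> {
    let mut key = vec![PREFIX_INSTANCE];
    key.extend_from_slice(&offset.to_be_bytes());
    key
}

/// Key for a stage output: prefix byte + big-endian source offset + stage name.
pub fn output_key(offset: u64, stage: &str) -> Vec<u8> {
    let mut key = vec![PREFIX_OUTPUT];
    key.extend_from_slice(&offset.to_be_bytes());
    key.extend_from_slice(stage.as_bytes());
    key
}

/// Prefix scan: list the stage names that have an output stored for the
/// pipeline instance at `offset`. Scanning starts at the prefix and stops at
/// the first key that no longer begins with it.
pub fn stages_with_output(tree: &BTreeMap<Vec<u8>, Vec<u8>>, offset: u64) -> Vec<String> {
    let mut prefix = vec![PREFIX_OUTPUT];
    prefix.extend_from_slice(&offset.to_be_bytes());
    tree.range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(key, _)| String::from_utf8_lossy(&key[prefix.len()..]).into_owned())
        .collect()
}
```

Because all keys for one instance share a prefix and sort next to each other, a single scan finds them without consulting a second tree.
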
Updated to the latest rustc 1.56.0.

Refactored Pipelines to simplify the delivery pass mechanism.

Fixed a bug where active pipeline instances restored from disk were not
being properly pruned if they were already complete. As part of the
compaction story, finished pipelines will be removed from disk.

Finished pipelines are now being removed.
@thedodd thedodd force-pushed the 99-compaction branch 3 times, most recently from b2e0cf9 to 78e42c1 Compare October 29, 2021 03:06
Stream earliest timestamp records are now being tracked. This
provides the foundation for our stream compaction/truncation system.
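
As a rough sketch of why tracking the earliest timestamp record is useful, the hypothetical type below keeps only that one record and uses it to decide when compaction is next due. The names, the seconds-based timestamps, and the retention parameter are all assumptions for illustration, not the PR's implementation.

```rust
/// Hypothetical tracker for the earliest timestamp record still on disk.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct EarliestTimestamp {
    /// (timestamp in seconds, last offset of that batch), or `None` when the
    /// stream holds no data.
    pub record: Option<(i64, u64)>,
}

impl EarliestTimestamp {
    /// Called on publish: only the first batch written to an empty stream
    /// becomes the earliest record.
    pub fn on_publish(&mut self, timestamp: i64, last_offset: u64) {
        if self.record.is_none() {
            self.record = Some((timestamp, last_offset));
        }
    }

    /// Called after compaction with the oldest surviving index record, if any.
    pub fn on_compacted(&mut self, next: Option<(i64, u64)>) {
        self.record = next;
    }

    /// Seconds until the earliest record exceeds the retention window.
    /// `Some(0)` means compaction is due now; `None` means nothing to compact.
    pub fn seconds_until_expiry(&self, now: i64, retention_secs: i64) -> Option<i64> {
        self.record
            .map(|(ts, _)| ts.saturating_add(retention_secs).saturating_sub(now).max(0))
    }
}
```

Keeping just the earliest record in memory means a compaction loop can check whether any work is due without scanning the index.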

Updated tests to assert the proper functionality of the earliest
timestamp tracking pattern.

Fixed a few small docs items in the pipeline-txp example.

Updated the pipeline spawning routine to ensure that only pipelines which are
part of the parent stream are spawned.
@thedodd thedodd force-pushed the 99-compaction branch 5 times, most recently from 0c5f7e2 to e0639d6 Compare November 5, 2021 13:38
Compaction routine is now well-tested. Woot woot!

Operator has been updated to pass along retention policy config to
stream.

Updated deps across all components.

closes #99
This ensures that spurious liveness probes do not prevent compaction
from ever being run.

Replaced chrono w/ time crate.
@thedodd thedodd force-pushed the 99-compaction branch 2 times, most recently from 23b8f20 to 25e2a87 Compare November 9, 2021 02:44
Added tests for stream::subscriber::spawn_group_fetch.

Added tests for stream::subscriber::try_record_delivery_response.

Added tests for stream::subscriber::ensure_subscriber_record.
@thedodd thedodd force-pushed the 99-compaction branch 4 times, most recently from c81d145 to dcd73a4 Compare November 10, 2021 03:59
@thedodd thedodd merged commit 264f026 into main Nov 10, 2021
@thedodd thedodd deleted the 99-compaction branch November 10, 2021 13:18
Successfully merging this pull request may close these issues.

Impl compaction & TTL system for Streams & Pipelines