Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffler can get stuck when having holes in the outbox #1639

Closed
tillrohrmann opened this issue Jun 18, 2024 · 1 comment · Fixed by #1640
Closed

Shuffler can get stuck when having holes in the outbox #1639

tillrohrmann opened this issue Jun 18, 2024 · 1 comment · Fixed by #1640
Assignees
Labels
bug Something isn't working partition-processor

Comments

@tillrohrmann
Copy link
Contributor

When there are holes in the outbox, the shuffler can get stuck because it assumes that the outbox entries are stored with consecutive outbox sequence numbers. The reason for this assumption was to replace a scan operation with a point-lookup. The reason why holes can appear in the outbox is because TruncateOutbox messages from the shuffler to the PP are considered hints and might get dropped. Since the PP only deletes those outbox entries for which it receives a TruncateOutbox message, a missed hint leads to remaining outbox entries and thereby potential holes (if the next message gets truncated).

Since we might already have users that have a holey outbox, the fix to this problem needs to be able to handle holes in an outbox.

@tillrohrmann tillrohrmann self-assigned this Jun 18, 2024
@tillrohrmann
Copy link
Contributor Author

The very short-term fix could be to replace all get_message with get_next_message which scans for the next outbox message. A more efficient solution could be to use a tailing iterator on the outbox table similar to how the LocalLogletReadStream works.

@AhmedSoliman AhmedSoliman added bug Something isn't working partition-processor labels Jun 19, 2024
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Jun 19, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes restatedev#1639.
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Jun 19, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes restatedev#1639.
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes restatedev#1639.
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes restatedev#1639.
tillrohrmann added a commit that referenced this issue Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes #1639.
tillrohrmann added a commit to tillrohrmann/restate that referenced this issue Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read
the next outbox message. Before we were using get_message which was only
looking at a specific outbox entry. If this outbox entry was empty, then
the shuffle assumed that the outbox is empty. This did not work if the
outbox contained holes. Now with get_next_message, we always scan until
the next outbox message.

The change itself is trivial. In order to ensure that the change works,
this commit added a few unit tests to ensure the desired behaviour.

Note: With replacing get_message with get_next_message, the shuffle will
perform for every read a RocksDB scan operation. This is highly inefficient
and we should replace this logic with a tailing iterator.

This fixes restatedev#1639.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working partition-processor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants