-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shuffler can get stuck when having holes in the outbox #1639
Labels
Comments
The very short-term fix could be to replace all |
tillrohrmann
added a commit
to tillrohrmann/restate
that referenced
this issue
Jun 19, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes restatedev#1639.
tillrohrmann
added a commit
to tillrohrmann/restate
that referenced
this issue
Jun 19, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes restatedev#1639.
This was referenced Jun 19, 2024
tillrohrmann
added a commit
to tillrohrmann/restate
that referenced
this issue
Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes restatedev#1639.
tillrohrmann
added a commit
to tillrohrmann/restate
that referenced
this issue
Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes restatedev#1639.
tillrohrmann
added a commit
that referenced
this issue
Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes #1639.
tillrohrmann
added a commit
to tillrohrmann/restate
that referenced
this issue
Jun 20, 2024
This commit changes the shuffle to always use get_next_message to read the next outbox message. Before we were using get_message which was only looking at a specific outbox entry. If this outbox entry was empty, then the shuffle assumed that the outbox is empty. This did not work if the outbox contained holes. Now with get_next_message, we always scan until the next outbox message. The change itself is trivial. In order to ensure that the change works, this commit added a few unit tests to ensure the desired behaviour. Note: With replacing get_message with get_next_message, the shuffle will perform for every read a RocksDB scan operation. This is highly inefficient and we should replace this logic with a tailing iterator. This fixes restatedev#1639.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When there are holes in the outbox, the shuffler can get stuck because it assumes that the outbox entries are stored with consecutive outbox sequence numbers. The reason for this assumption was to replace a scan operation with a point-lookup. The reason why holes can appear in the outbox is because
TruncateOutbox
messages from the shuffler to the PP are considered hints and might get dropped. Since the PP only deletes those outbox entries for which it receives aTruncateOutbox
message, a missed hint leads to remaining outbox entries and thereby potential holes (if the next message gets truncated).Since we might already have users that have a holey outbox, the fix to this problem needs to be able to handle holes in an outbox.
The text was updated successfully, but these errors were encountered: