Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signals first ordering #261

Merged
merged 12 commits into from
Sep 11, 2023
Merged

Signals first ordering #261

merged 12 commits into from
Sep 11, 2023

Conversation

jeffschoner
Copy link
Contributor

Summary

Introduces signals first ordering within a workflow task window. Other Temporal SDKs handle markers and signals before other event types when replaying workflow history. This ensures that all signals can be processed before a workflow completes, fails, or continues as new.

Fixes #258

Important Note

If you have multiple workers in a fleet and use rolling deployments, adopting this change safely requires additional steps to avoid workflow history corruption or "stuck" workflows. See CHANGELOG.md for details on how to use the legacy_signals configuration flag to prevent these problems during rollout of this version.

Details

With this change, temporal-ruby now handles markers and signals first, as opposed to just markers first. There are a number of parts to this change:

  • Refactored the code that orders events within a history window. Previously, it separated out markers, then evaluated those before other events. It now sorts events based a custom sort function that considers markers, signals, and other event types.
  • Adds SDK metadata support for workflow tasks, and a new language flag for this ordering behavior. This ensures that workflow code run on the new or old ordering, continues to execute deterministically. This can be used for future determinism breaking changes.
  • Adds lazy loading of capabilities via GetSystemInfo. The SDK metadata feature this new ordering relies on is only available on Temporal server 1.20.0 or newer. This ensures the feature will not be activated when connected to an older server version where this change cannot be safely used.
  • Adds a legacy_signals configuration option to preserve old ordering behavior. This serves a few important purposes:
    • Delaying adoption until a new server version has been completely rolled out
    • Non-determinism issues when deploying this code for the first time
  • Some files have styling changes from the use of rubyfmt

Testing

  • Many new and updated specs for worker, task processor, GRPC, and state manger
  • A new integration spec for signals that is fixed by this change

history_window.events.each do |event|
apply_event(event)
def self.event_order(event, signals_first)
if event.type == 'MARKER_RECORDED'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this first category is only for patches. Other type of markers (eg. local activities and side effects) need to be sorted in the "everything else" category.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior (with and without my change) is that all markers are sorted first. It sounds like only the release markers (similar to but not the same as patches in the Rust core SDKs) should come first. I believe the only other supported marker types are for side effects and local activities. What is the harm of those coming first? I believe they should both be ok since their callbacks aren't invoked on replay (only when playing for the first time) and because they shouldn't (at least in theory) otherwise modify state in workflow functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the harm of those coming first?

Well... That really depends how strict you are about matching history commands on replay. In Core-based SDKs, we expect all commands to replay in the very same order (not sure about Go and Java).

Now, I just looked at how side effect markers are handled, and they turn out getting stored in a distinct queue, from which you pull every time workflow calls the side_effect method. That mostly achieve the same as if they had been left in the "everything else" bucket, except that ordering of side effects vs other type of commands is lost. So that's probably ok, given that side_effects handlers are called synchronously, so ordering of other stuff can't be messed up.

elsif signals_first && signal_event?(event)
# signals come next if we are in signals first mode
2
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to sort queries in a distinct category, after everything else.

Copy link
Contributor Author

@jeffschoner jeffschoner Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there is any code that handles query history events specifically. Actually, I don't see any history events that have to do with queries at all. All query handling code lives at the task processor level, and doesn't involve anything in the state manager as far as I know.

@dwillett What do you think?

@DeRauk DeRauk merged commit 4a379c1 into coinbase:master Sep 11, 2023
2 checks passed
@jeffschoner jeffschoner deleted the signals-first branch September 17, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Signals cannot be processed in final workflow task
3 participants