Clean namespace handover #3692

Merged
merged 8 commits into from
Dec 21, 2022

Conversation

yycptt
Member

@yycptt yycptt commented Dec 6, 2022

What changed?

  • Block creation/update operations if the namespace is in handover state
  • Use the immediate task max read level as the max replication task ID when checking whether handover is done and whether the namespace is ready for handover. The previous value from the replication ack manager may miss some task IDs, because it is updated asynchronously through task notification.
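
For readers skimming the description, the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NamespaceState`, `ErrNamespaceHandover`, `CheckWritable`, and `ReadyForHandover` are hypothetical names, and the inclusive `>=` comparison is an assumption about how the ack level relates to the max read level.

```go
package main

import (
	"errors"
	"fmt"
)

// NamespaceState is a hypothetical stand-in for the namespace replication state.
type NamespaceState int

const (
	StateNormal NamespaceState = iota
	StateHandover
)

// ErrNamespaceHandover is a hypothetical error that callers are expected to retry on.
var ErrNamespaceHandover = errors.New("namespace is in handover state, retry later")

// CheckWritable rejects workflow creation/update requests while the
// namespace is in handover state and allows them otherwise.
func CheckWritable(state NamespaceState) error {
	if state == StateHandover {
		return ErrNamespaceHandover
	}
	return nil
}

// ReadyForHandover reports whether the remote cluster has acknowledged every
// replication task up to the immediate task max read level.
// Assumption: an ack level equal to the max read level counts as caught up.
func ReadyForHandover(remoteAckedTaskID, immediateTaskMaxReadLevel int64) bool {
	return remoteAckedTaskID >= immediateTaskMaxReadLevel
}

func main() {
	fmt.Println(CheckWritable(StateNormal))
	fmt.Println(CheckWritable(StateHandover))
	fmt.Println(ReadyForHandover(98, 100))
	fmt.Println(ReadyForHandover(100, 100))
}
```

The point of reading the max read level directly is that it is not subject to the notification delay described above, so the comparison cannot pass while unreplicated tasks are still pending.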

Why?

  • Ensure a clean handover during namespace migration, so that the two clusters won't have conflicting history.

How did you test it?

  • Still needs to be tested on a test cluster

Potential risks

  • Handover may take longer to complete, and there may be more retries within the system during namespace failover, as the history service will now fail workflow creation/update requests.

Is hotfix candidate?

  • no

@yycptt yycptt requested review from yux0 and yiminc December 6, 2022 19:11
@yycptt yycptt requested a review from a team as a code owner December 6, 2022 19:11
common/resource/fx.go (outdated, resolved)
service/worker/migration/activities.go (resolved)
common/metrics/metric_defs.go (resolved)
common/resource/fx.go (outdated, resolved)
service/history/consts/const.go (outdated, resolved)
service/history/shard/context_impl.go (outdated, resolved)
@meiliang86
Contributor

Potential risks: larger unavailable window?

@meiliang86
Contributor

meiliang86 commented Dec 15, 2022

Is there a way to check/verify that conflict resolution never happened after graceful failover?

common/util.go Show resolved Hide resolved
@yycptt
Member Author

yycptt commented Dec 21, 2022

> Potential risks: larger unavailable window?

Yes, if the shard is unstable during handover; otherwise the increase should be very small.

> Is there a way to check/verify that conflict resolution never happened after graceful failover?

We can check whether any persistence_request metric is emitted with the ConflictResolveWorkflowExecution operation tag.
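
As a rough sketch of that check, assuming the metric samples have already been exported into some in-memory form: the `Sample` type, the `ConflictResolveCount` helper, and the `operation` tag key are hypothetical and only illustrate the filter, not how the metrics backend is actually queried.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified view of one exported metric sample.
type Sample struct {
	Name  string
	Tags  map[string]string
	Value float64
}

// ConflictResolveCount sums persistence_request samples whose operation tag
// is ConflictResolveWorkflowExecution. The "operation" tag key is an assumption.
func ConflictResolveCount(samples []Sample) float64 {
	var total float64
	for _, s := range samples {
		if s.Name == "persistence_request" &&
			s.Tags["operation"] == "ConflictResolveWorkflowExecution" {
			total += s.Value
		}
	}
	return total
}

func main() {
	// Illustrative values only, not real measurements.
	samples := []Sample{
		{Name: "persistence_request", Tags: map[string]string{"operation": "UpdateWorkflowExecution"}, Value: 42},
		{Name: "persistence_request", Tags: map[string]string{"operation": "ConflictResolveWorkflowExecution"}, Value: 0},
	}
	if ConflictResolveCount(samples) == 0 {
		fmt.Println("no conflict resolution observed after failover")
	}
}
```

A non-zero total over the window after a graceful failover would suggest that conflict resolution did run.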

@yycptt yycptt merged commit 3cb9232 into temporalio:master Dec 21, 2022
@yycptt yycptt deleted the clean-ns-handover branch December 21, 2022 21:40