Is upgrade auto-finalization a good default? #57887
Comments
Thanks for raising the issue. It makes sense and is valid.

One other note I'll make is that it's best to roll out the new version incrementally, by first testing one node and making sure everything looks good. Finalization cannot happen until all of the nodes are running the new version.

Also, even if we don't change the default, I think we can do more to sanity check the upgrade. Namely, we should probably do more to validate that the cluster is stable, perhaps by waiting to make sure nodes have been up for a little while and by checking that the schema seems valid.

For what it's worth, in the next release (21.1) we'll be introducing a long-running migration framework to perform durable migrations which today take multiple versions. These migrations will not be able to run until after the cluster version has been finalized. They may be yet another reason to revisit the user story here.

In short, making the default be to auto-finalize, but only after a significantly longer amount of time (days?) and with the caveat that all nodes stay up during that period, seems much more reasonable to me.
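To make the proposed gate concrete, here is a rough sketch of the check described above: finalize only when every node is live, runs the new version, and has stayed up for a minimum period. Everything in it is an assumption for illustration; `NodeStatus` and `ready_to_finalize` are hypothetical names, not CockroachDB types or APIs, and the schema validity check is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NodeStatus:
    # Hypothetical per-node snapshot; not a real CockroachDB type.
    node_id: int
    binary_version: str
    started_at: datetime
    is_live: bool


def ready_to_finalize(nodes, target_version, min_uptime, now):
    """Return (ok, reasons); ok is True only if every node passes every check."""
    reasons = []
    if not nodes:
        reasons.append("no nodes reported")
    for n in nodes:
        if not n.is_live:
            reasons.append(f"n{n.node_id} is not live")
        if n.binary_version != target_version:
            reasons.append(
                f"n{n.node_id} runs {n.binary_version}, not {target_version}"
            )
        if now - n.started_at < min_uptime:
            reasons.append(f"n{n.node_id} has been up for less than {min_uptime}")
    return (not reasons, reasons)


# Example: node 2 restarted two hours ago, so a 3-day gate blocks finalization.
now = datetime(2021, 1, 10)
nodes = [
    NodeStatus(1, "20.2", now - timedelta(days=5), True),
    NodeStatus(2, "20.2", now - timedelta(hours=2), True),
]
ok, reasons = ready_to_finalize(nodes, "20.2", timedelta(days=3), now)
```

Returning the list of reasons rather than a bare boolean would let an operator see why finalization is still being held back.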
cc @vy-ton who's been thinking about the upgrade user story a bit lately.
Is your feature request related to a problem? Please describe.
Twice now we've been in a position where moving to a new version has caused issues and a downgrade has been necessary:
In the first instance we unfortunately did not set preserve_downgrade_option (though I'm not 100% sure it would have helped in this instance). With the second issue we had set it, so we managed to avoid any big catastrophe, though from the issue you can see someone else wasn't quite so lucky: #57032 (comment)
The v20.2 upgrade documentation specifically states:
This suggests that most people should be trying to avoid auto-finalization.
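For anyone landing here, a minimal sketch of what opting out of auto-finalization looks like in practice. The SQL statements use the full setting name `cluster.preserve_downgrade_option` as I understand it from the upgrade docs; the Python helper names are hypothetical and exist only to show the two steps side by side.

```python
def preserve_downgrade_sql(current_version: str) -> str:
    """Statement to run BEFORE upgrading; pins the version you may roll back to."""
    # Accept only dotted numeric versions such as "20.1" to avoid odd input.
    if not all(part.isdigit() for part in current_version.split(".")):
        raise ValueError(f"unexpected version string: {current_version!r}")
    return (
        "SET CLUSTER SETTING cluster.preserve_downgrade_option = "
        f"'{current_version}';"
    )


def finalize_sql() -> str:
    """Statement to run once you are satisfied with the new version."""
    return "RESET CLUSTER SETTING cluster.preserve_downgrade_option;"


# Upgrading from 20.1 to 20.2: pin 20.1 first, reset later to finalize.
before_upgrade = preserve_downgrade_sql("20.1")
after_validation = finalize_sql()
```

Until the reset statement is run, the cluster should not finalize on its own, which is what keeps the downgrade path open.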
Describe the solution you'd like
My question is: is it sensible to default to auto-finalization, given some of the issues that pop up and the recommendations in your own documentation? I'm not actively watching issues here, so I don't have a reasonable perspective on how often people get into a tangle as a result of this. I did, however, feel it was worth raising the question.
I fully understand that requiring operators to take manual steps during upgrades is undesirable. I think it's worth weighing that against fairly rapidly locking into a new version.
Describe alternatives you've considered
Jira issue: CRDB-3471