[sled-agent] Persist omicron-zones.json before bringing up zones #5086

Open
andrewjstone opened this issue Feb 16, 2024 · 0 comments
Labels: enhancement, Sled Agent, Update System
Milestone: MVP


andrewjstone commented Feb 16, 2024

Currently, when sled-agent receives an omicron_zones_put, it calls omicron_zones_ensure, which creates datasets and brings up zones before omicron-zones.json is persisted to the ledger. If a sled reboots before the file is persisted, sled-agent will come back up with the old omicron-zones.json and start launching the old zones. We believe this is technically safe, because inventory collection will read the old omicron-zones.json and the blueprint executor will redeploy the new zones on its next activation.

While safe, this pattern is "backwards" from what is normally done in distributed systems: typically you persist the intended state and then go about realizing it. That's what we'd like to do here. We could then also have inventory collection report which zones are actually up, in addition to reading omicron-zones.json.
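
To make the ordering concrete, here is a minimal sketch of "persist, then realize". It is not sled-agent's actual code: `ZonesConfig`, `Ledger`, `ZoneLauncher`, `MemoryLedger`, `FailingLauncher`, and `handle_zones_put` are all hypothetical names, and the in-memory ledger only stands in for a durable write of omicron-zones.json.

```rust
/// Hypothetical stand-in for the contents of omicron-zones.json.
#[derive(Debug, Clone, PartialEq)]
pub struct ZonesConfig {
    pub generation: u64,
    pub zones: Vec<String>,
}

/// Hypothetical durable store for the intended configuration.
pub trait Ledger {
    fn persist(&mut self, config: &ZonesConfig) -> Result<(), String>;
    fn load(&self) -> Option<ZonesConfig>;
}

/// Hypothetical component that creates datasets and brings up zones.
pub trait ZoneLauncher {
    fn ensure(&mut self, config: &ZonesConfig) -> Result<(), String>;
}

/// Record the intended state first, and only then try to realize it.
pub fn handle_zones_put<L: Ledger, Z: ZoneLauncher>(
    ledger: &mut L,
    launcher: &mut Z,
    config: ZonesConfig,
) -> Result<(), String> {
    // Step 1: the new config becomes the intended state.
    ledger.persist(&config)?;
    // Step 2: if this fails or is interrupted, the ledger already
    // holds the new config, so a restart resumes from it.
    launcher.ensure(&config)
}

/// In-memory ledger used only for illustration (assumption: a real
/// ledger would write durably to disk).
#[derive(Default)]
pub struct MemoryLedger {
    stored: Option<ZonesConfig>,
}

impl Ledger for MemoryLedger {
    fn persist(&mut self, config: &ZonesConfig) -> Result<(), String> {
        self.stored = Some(config.clone());
        Ok(())
    }

    fn load(&self) -> Option<ZonesConfig> {
        self.stored.clone()
    }
}

/// Launcher that always fails, standing in for a reboot partway
/// through zone bring-up.
pub struct FailingLauncher;

impl ZoneLauncher for FailingLauncher {
    fn ensure(&mut self, _config: &ZonesConfig) -> Result<(), String> {
        Err("interrupted during zone bring-up".to_string())
    }
}
```

With this ordering, calling `handle_zones_put` with a `FailingLauncher` returns an error, but `ledger.load()` already yields the new config, so a restarted sled-agent would launch the new zones rather than the old ones.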

@davepacheco proposed this path forward:

  • validate that we should be able to honor this request
  • immediately store it. It is now the current intended state. Return immediately.
  • in the background, constantly try to make sure the real state matches the intended state
  • have a way to report:
    • the last generation successfully realized
    • the generation we're trying to realize
    • any transient errors blocking that
    • any persistent errors (where it has come to rest without realizing the intended state). Hopefully these don't exist; if they do, we should identify them during validation instead.
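
The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not a design for sled-agent: `Request`, `Status`, `Reconciler`, `put`, and `reconcile_once` are hypothetical names, the "reject older generations" check is an assumed validation rule, and the intended state is held in memory where a real implementation would write it to the ledger.

```rust
/// Hypothetical request carrying a generation number and zone names.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub generation: u64,
    pub zones: Vec<String>,
}

/// What the reconciler can report about its progress.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Status {
    /// The last generation successfully realized.
    pub last_realized: Option<u64>,
    /// The generation we are trying to realize.
    pub target: Option<u64>,
    /// Any transient error currently blocking progress.
    pub transient_error: Option<String>,
}

#[derive(Default)]
pub struct Reconciler {
    intended: Option<Request>,
    status: Status,
}

impl Reconciler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Validate the request, store it as the intended state, and return.
    pub fn put(&mut self, req: Request) -> Result<(), String> {
        // Assumed validation rule: never move to an older generation.
        if let Some(current) = &self.intended {
            if req.generation < current.generation {
                return Err(format!(
                    "generation {} is older than {}",
                    req.generation, current.generation
                ));
            }
        }
        // A real implementation would persist to the ledger here.
        self.status.target = Some(req.generation);
        self.intended = Some(req);
        Ok(())
    }

    /// One reconciliation pass. A background task would call this in a
    /// loop; `apply` is a hypothetical hook that brings up the zones.
    pub fn reconcile_once<F>(&mut self, mut apply: F)
    where
        F: FnMut(&Request) -> Result<(), String>,
    {
        let req = match &self.intended {
            Some(req) => req,
            None => return,
        };
        if self.status.last_realized == Some(req.generation) {
            return; // Real state already matches the intended state.
        }
        match apply(req) {
            Ok(()) => {
                self.status.last_realized = Some(req.generation);
                self.status.transient_error = None;
            }
            // Keep the error for reporting and retry on the next pass.
            Err(e) => self.status.transient_error = Some(e),
        }
    }

    pub fn status(&self) -> &Status {
        &self.status
    }
}
```

In this sketch `put` never waits on zone bring-up, and a failed pass leaves `target` ahead of `last_realized` with the error recorded, which is the information the proposal asks to be reportable. Persistent errors are not modeled, on the assumption that validation catches them.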

andrewjstone added the enhancement, Sled Agent, and Update System labels Feb 16, 2024
andrewjstone added this to the MVP milestone Feb 16, 2024