rfc: Expand on the initial migration hook proposal
a-robinson committed Oct 19, 2016
1 parent b005638 commit ceb571d
Showing 1 changed file with 279 additions and 72 deletions.
351 changes: 279 additions & 72 deletions docs/RFCS/cluster_upgrade_tool.md
@@ -1,13 +1,13 @@
- Feature Name: Cluster Upgrade Tool
- Status: draft
- Start Date: 2016-09-15
- Authors: Daniel Harrison
- RFC PR: (PR # after acceptance of initial draft)
- Cockroach Issue: (one or more # from the issue tracker)
- Authors: Daniel Harrison, Alex Robinson
- RFC PR: [#9404](https://github.com/cockroachdb/cockroach/pull/9404)
- Cockroach Issue: [#4267](https://github.com/cockroachdb/cockroach/issues/4267)

# Summary

A series of hooks to perform any necessary bookkeeping before a CockroachDB
A series of hooks to perform any necessary bookkeeping for a CockroachDB
version upgrade.

# Motivation
@@ -17,91 +17,298 @@ complexity to a codebase. All else being equal, it's better to keep backward
compatibility logic siloed from the significant complexity that is inherent in
software as complex as CockroachDB.

## Examples

* Add a `system.jobs` table (#7073)
* Add root user and authentication to the system.users table (#9877)
* (maybe) Remove the reference to `experimental_unique_bytes` from the
`system.eventlog` table (#5887)
* (maybe) Migrate `TableDescriptor`s to new `FormatVersion`s and ensure
compatibility (#7136)
* (maybe) Change the encoding of the `RaftAppliedIndexKey` and
`LeaseAppliedIndexKey` keys (#9306)
* (maybe) Switch from nanoseconds to microseconds for storing timestamps and
intervals (#9758, #9759)
* Switch over to proposer-evaluated kv (#6290, #6166) - this is likely to be a
special case, where we force a stop-the-world event to make the switch
sometime before 1.0

# Detailed design

Some versions of CockroachDB will require that a "system migration hook" be run
before the cluster is upgraded to that version. This is expected to be
infrequent; not every release will bump the version, only ones that need a
migration hook.
Some versions of CockroachDB will require that a "system migration hook" be run.
This is expected to be infrequent; not every release will require migrations.

## Jargon

A "system migration hook" is a self-contained function that can be run from one
of the CockroachDB nodes in a cluster to modify the state of the cluster in
some way.

Simple migrations can be discussed in terms of which versions of the Cockroach
binary are compatible with it: "pre-migration" versions are incompatible with
the migration, "pending-migration" versions are compatible with and without it,
and "post-migration" versions require it. More complex migrations can be
modeled by repeated simple migrations.

Example: Adding a `system.jobs` table. No versions are pre-migration, because
any unknown system tables are ignored. Versions that use the table if it is
there but can function without it are pending-migration. The first commit that
assumes that the table is present begins the post-migration range.

For simplicity, we assume that at most two versions of CockroachDB are ever
running in a cluster at once. This restriction could potentially be relaxed,
but it's out of scope for the first version of this RFC.

Some migrations most naturally match a model where the CockroachDB version with
the migration hook is a post-migration version. One example is a hook to add a
new system table and code that uses the system table. Significant complexity is
avoided if it can be assumed that the table exists. These migrations should be
run before any node starts using the post-migration version.

Other migrations work better when the hook version is pending-migration. When
changing the schema of an existing system table, it's easiest to include the
code that handles both schemas in the same version as the hook that performs the
actual migration. These migrations should be run after all nodes are rolled onto
the hook version.

Our most pressing initial motivations fall into the first model, so it will be
the focus of this RFC.

TODO(dan/alex): Explore how to accommodate post-upgrade migration hooks.

## Short-term design

Because handling migrations in the general case is a very broad problem
encompassing many potential types of migrations, we choose to first tackle the
simplest case, migrations that are backward-compatible, while leaving the door
open to more involved schemes that can support backwards-incompatible
migrations.

Migration hooks consist of a name, a work function, and optionally a minimum
version. In the steady state, there is a map in code between migration names and
the cockroach version they were released in. There is also a list of migrations
added since the last cockroach release.
In the short term, migration hooks will consist of a name and a work function.
In the steady state, there is an ordered list of migration names in the code,
ordered by when they were added.

When a node starts up, it checks that all known migrations have been run (the
amount of work involved in this can be kept bounded using the `/SystemVersion`
described below). If not, it runs them via a migration coordinator. (We should
also do this check when a node rejoins a cluster it's been partitioned from, but
I'm not sure how to do that.)
keys described below). If not, it runs them via a migration coordinator.

The ordered list of migrations can be used to order migrations when a cluster
needs to catch up with more than one of them.

The migration coordinator starts by heartbeating a kv entry to ensure that there
is only ever one running at a time. Then for each missing migration, it runs the
work function and writes a record of the completion to kv
`/SystemVersion/<MigrationName>`. When finished, it writes its own version to
`/SystemVersion`.

The work function must be idempotent (in case a migration coordinator crashes)
and runnable on some previous version of the cluster. If there is some minimum
cluster version it needs, that is checked against the one written to
`/SystemVersion`.

The migration name to version map can be used to order migrations when a cluster
needs to catch up many versions.

We will introduce a thin cli wrapper `./cockroach cluster-upgrade` around
starting the migration coordinator. When upgrading a Cockroach version, this
command will be run and pointed at the cluster. After it returns, the cluster
can be rolled onto the new version.

The cli tool will be the recommended way of upgrading Cockroach versions for
production scenarios. For ease of small clusters and local development, rolling
one node will also do this upgrade before its normal startup. Any nodes rolled
after the first, but before the cluster is upgraded, will panic to make it very
clear the roll is unhealthy.

When a migration is in progress, this should be exposed in the UI. It would be
nice if we could also display the progress in the same place. Once we've used
a migration to add a `system.jobs` table, it will be used for this.
is only ever one running at a time. Other nodes that start up and require
migrations while a different node is doing migrations will have to block until
the kv entry is released (or expired in the case of node failure). Then, for
each missing migration, the migration coordinator runs the work function and
writes a record of the completion to the kv entry
`/SystemVersion/<MigrationName>`.

## Examples
Each work function must be idempotent (in case a migration coordinator crashes)
and compatible with previous versions of CockroachDB that could be running on
other nodes in the cluster. The latter restriction will be relaxed in the
[long-term design](#long-term-design).

### Examples

Simple migrations, like `CREATE TABLE IF NOT EXISTS system.jobs (...)`, can be
accomplished with one hook and immediately used.

Altering system tables will require more than one migration. For example, we'd
like to remove the reference to `experimental_unique_bytes` from the
`system.eventlog` table so we can delete the function. A migration is run to add
a new column with a new name and the code in this Cockroach version sets both on
mutations and fetches both (preferring the new one) on reads. Once the entire
cluster is on this version, another code change stops reading and writing the
old value. Finally, a second mutation is run to remove the old column. This
could be done in fewer steps if SQL had something like the "unknown fields"
support in Protocol Buffers.

We currently have a series of upgrades (see
[MaybeUpgradeFormatVersion](https://github.com/cockroachdb/cockroach/blob/f6a8692485cb34dc148b9313cb9fca6c53eec42c/sql/sqlbase/structured.go#L304)
and [maybeUpgradeToFamilyFormatVersion](https://github.com/cockroachdb/cockroach/blob/f6a8692485cb34dc148b9313cb9fca6c53eec42c/sql/sqlbase/structured.go#L313))
that run whenever a `TableDescriptor` is fetched or otherwise acquired. These
are necessary so the SQL code knows how to map it to kv entries. A system
migration hook could be used to ensure every SQL table in the system is upgraded
to the latest FormatVersion, allowing the cleanup of this migration code.
accomplished with one hook and immediately used. To add such a migration, you'd
create a new migration called something like "CREATE system.jobs" with a
function that created the new system table and add it to the end of the list of
migrations. It would then automatically be run whenever a version of CockroachDB
that includes it joins an existing cluster for the first time.

The example of adding the root user to `system.users` can also be accomplished
with a single post-migration version hook, meaning that it could similarly just
be added to the list of migration hooks and run on startup without concern for
what CockroachDB versions are being used by other active nodes in the cluster.
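
To make the idempotency requirement concrete for these two examples, here is a small Go sketch. The `clusterState` type is a hypothetical in-memory stand-in for the system tables, and the hook names are placeholders; real hooks would issue SQL or kv operations instead.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterState is a hypothetical in-memory stand-in for the system tables.
type clusterState struct {
	tables map[string]bool
	users  map[string]bool
}

// hook pairs a migration name with its work function.
type hook struct {
	name string
	work func(*clusterState) error
}

// createJobsTable mirrors CREATE TABLE IF NOT EXISTS: creating the table a
// second time leaves the state unchanged.
func createJobsTable(s *clusterState) error {
	s.tables["system.jobs"] = true
	return nil
}

// addRootUser adds the root user; repeating it does not add a second entry.
func addRootUser(s *clusterState) error {
	if !s.tables["system.users"] {
		return errors.New("system.users table is missing")
	}
	s.users["root"] = true
	return nil
}

// New migrations are appended to the end of this list.
var hooks = []hook{
	{name: "CREATE system.jobs", work: createJobsTable},
	{name: "add root user", work: addRootUser},
}

func main() {
	state := &clusterState{
		tables: map[string]bool{"system.users": true},
		users:  map[string]bool{},
	}
	// Run every hook twice to show that a repeated run changes nothing.
	for pass := 1; pass <= 2; pass++ {
		for _, h := range hooks {
			if err := h.work(state); err != nil {
				fmt.Println("migration failed:", h.name, err)
				return
			}
		}
		fmt.Printf("pass %d: %d tables, %d users\n", pass, len(state.tables), len(state.users))
	}
}
```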

## Long-term design

While our most immediate needs don't actually require backwards incompatible
changes, there are a number of examples of changes that we'd like to make that
do. For such hooks that have pre-migration versions, we'll have to put in place
additional infrastructure to ensure safe migrations.

There are two primary approaches for this, which we don't actually have to
choose from now, but should at least understand to ensure that we don't restrict
our options too much with what we do in the shorter term.

### Option 1: Require operator intervention

The first option to support non-backwards compatible migrations is to introduce
a new CLI command `./cockroach cluster-upgrade` that gives the DB administrator
control over when migrations happen. This command will support:

* Listing all migrations that have been run on the cluster
* Listing the available migrations, which migrations they depend on, and the
CockroachDB versions at which it's safe to run each of them
* Running a single migration specified by the admin
* Running all available migrations whose migration dependencies are satisfied
(note that this may be dangerous if the minimum version for a migration isn't
satisfied by all nodes in the cluster; we may want to validate this ourselves
or at least require a scary command line flag to avoid mistakes)

The idea is that before upgrading to a new version of CockroachDB, the admin
will look in the release notes for the version that they want to run and
ensure that they've run all the required migrations. If they haven't, they'll
need to do so before upgrading. If not all the upgrades required by the
desired version are available at the cluster's current version, they may need
to first upgrade to an intermediate version that supports the migrations so
that they can run them.

If a CockroachDB node starts up and the cluster that it joins has not run all
the migrations required by the node's version, that node will exit with an
appropriate error message.

This approach gives administrators total control of potentially destructive
migrations at the cost of adding extra manual work. On the plus side, it
enables rollbacks better than a design where non-backwards compatible changes
are made automatically by the system when starting up nodes with the new
version.

The CLI tool will be the recommended way of upgrading Cockroach versions for
production scenarios. For ease of small clusters and local development,
one-node clusters will be able to do migrations during their normal startup
process if so desired.

#### Overview of changes needed from short-term design

This approach will require a few more details to be stored in the hard-coded
migration descriptors than for the short-term solution. For each migration,
we'll have to include a list of all migrations that it depends on and the
minimum version at which it can safely be run. The data stored in the
`/SystemVersion/<MigrationName>` keys will not have to change.

This approach will also require command-line tooling to be built for the
`./cockroach cluster-upgrade` command.
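
One possible shape for the extended migration descriptors, along with the node startup check from this option, is sketched below in Go. Everything here is an assumption for illustration: the `descriptor` fields, the simplified two-part `version` type, the function names, and the example migration names and version numbers.

```go
package main

import "fmt"

// version is a simplified two-part version (an assumption for this sketch).
type version struct{ major, minor int }

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// descriptor is a hypothetical hard-coded migration descriptor carrying the
// extra details this option needs.
type descriptor struct {
	name       string
	dependsOn  []string
	minVersion version
}

// runnable lists migrations that have not run yet, whose dependencies have all
// completed, and whose minimum version is satisfied by clusterMin (the lowest
// version running in the cluster).
func runnable(all []descriptor, completed map[string]bool, clusterMin version) []string {
	var out []string
	for _, d := range all {
		if completed[d.name] || clusterMin.less(d.minVersion) {
			continue
		}
		ready := true
		for _, dep := range d.dependsOn {
			if !completed[dep] {
				ready = false
				break
			}
		}
		if ready {
			out = append(out, d.name)
		}
	}
	return out
}

// checkStartup returns an error if the cluster has not run every migration
// that the starting node's version requires.
func checkStartup(required []string, completed map[string]bool) error {
	for _, name := range required {
		if !completed[name] {
			return fmt.Errorf("required migration %q has not been run on this cluster", name)
		}
	}
	return nil
}

func main() {
	all := []descriptor{
		{name: "add new column", minVersion: version{1, 0}},
		{name: "drop old column", dependsOn: []string{"add new column"}, minVersion: version{1, 1}},
	}
	completed := map[string]bool{}

	fmt.Println(runnable(all, completed, version{1, 1}))
	completed["add new column"] = true
	fmt.Println(runnable(all, completed, version{1, 1}))
	fmt.Println(checkStartup([]string{"add new column", "drop old column"}, completed))
}
```

A node whose `checkStartup` call returns an error would print it and exit, matching the behavior described for nodes that join a cluster with missing migrations.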

#### Example

Let's consider the example of switching the storage format of timestamps from
nanoseconds to microseconds (#9759). We can't simply change the code and
add an automatically run migration to the next release of CockroachDB because
it's not safe to change the storage format of all the timestamps in a cluster
while old nodes may not know how to read the new format.

Instead, the migration might look something like this from the perspective of
CockroachDB developers:

1. Release new version of CockroachDB with:
    1. Code that can handle both encodings
    1. A non-required migration that changes the encodings of all timestamps
       in the cluster. This migration probably won't depend on any other
       migrations.
1. Some number of releases later (could be the next version, or could be
   multiple versions down the line), remove the compatibility code and switch
   the migration to be required.

From the perspective of a DB admin, it'd look like:

1. Decide to upgrade CockroachDB cluster to new version
1. Check release notes for the desired version for any required migrations
    1. If none, just carry out the upgrade
    1. If there are some and the output of `./cockroach cluster-upgrade` on
       their current cluster includes them, then run them before upgrading.
    1. If there are some that aren't in the output of
       `./cockroach cluster-upgrade` on their current cluster, then repeat the
       process for an earlier version than the one initially chosen in the
       first step before continuing with this upgrade.

### Option 2: Do all migrations automatically

The main alternative to manual intervention is to attempt to do the operator's
work for them automatically. This would provide the best user experience for
less sophisticated users who care more about their time and energy than about
having total control.

Unlike in the short-term, backwards compatible case, we can't necessarily run
all known migrations when the first node at a new version is started up,
because some migrations may modify the cluster state in ways that are
incompatible with the other nodes in the cluster. We can never automatically run
migrations unless we know that all the nodes in the cluster are at a recent
enough version. That means that we'll need both to start tracking node version
information for the entire cluster and to add some code that understands our
semantic versioning scheme sufficiently well to determine which node versions
support which migrations.

That may be some work, but once it's in place, we can go back to doing
migrations on start-up by adding in some logic that checks whether the node
versions in the cluster support the required migrations. If not, the new node
can exit out with an error message. If they do, then the new node can kick off
the migrations.

#### Overview of changes needed from short-term design

This approach will require the same few additional details to be stored in the
hard-coded migration descriptors as for option 1. For each migration,
we'll have to include a list of all migrations that it depends on and the
minimum version at which it can safely be run. The data stored in the
`/SystemVersion/<MigrationName>` keys will not have to change.

The main difference for this approach is that we'll have to start tracking the
versions of all nodes in the cluster. We could potentially use gossip for this,
but we'd have to be sure that we know the state of all nodes, not just most of
them, to be confident that it's safe to run a migration.
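
The safety check this option depends on could look roughly like the Go sketch below. It treats a node with an unknown version the same as a node that is too old, since partial knowledge isn't enough. The `version` type, `safeToRun`, and the way membership and reported versions are passed in are all hypothetical; how that information would actually be gathered (gossip or otherwise) is left open.

```go
package main

import "fmt"

// version is a simplified two-part version (an assumption for this sketch).
type version struct{ major, minor int }

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// safeToRun returns nil only if every member of the cluster has reported a
// version that is at least minVersion. A member with no reported version
// blocks the migration.
func safeToRun(minVersion version, members []string, reported map[string]version) error {
	for _, node := range members {
		v, ok := reported[node]
		if !ok {
			return fmt.Errorf("version of node %s is unknown", node)
		}
		if v.less(minVersion) {
			return fmt.Errorf("node %s is at %d.%d, below the required %d.%d",
				node, v.major, v.minor, minVersion.major, minVersion.minor)
		}
	}
	return nil
}

func main() {
	members := []string{"n1", "n2", "n3"}
	reported := map[string]version{
		"n1": {1, 1},
		"n2": {1, 1},
	}
	need := version{1, 1}

	fmt.Println(safeToRun(need, members, reported)) // n3 has not reported yet

	reported["n3"] = version{1, 0}
	fmt.Println(safeToRun(need, members, reported)) // n3 is too old

	reported["n3"] = version{1, 1}
	fmt.Println(safeToRun(need, members, reported)) // safe to run
}
```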

#### Example

Again, let's consider the example of switching the storage format of timestamps
from nanoseconds to microseconds (#9759).

The migration will look fairly similar from the perspective of CockroachDB
developers:

1. Release new version of CockroachDB with code that can handle both encodings.
1. Include a non-required migration that changes the encodings of all timestamps
   in the cluster. This migration probably won't depend on any other migrations.
1. Some number of releases later (could be the next version, or could be
   multiple versions down the line):
    1. Add an automatically run migration that switches the encodings
    1. Remove the compatibility code for handling the old encoding

From the perspective of a DB admin, it'd look like:

1. Decide to upgrade CockroachDB cluster to new version
1. Check release notes for the desired version to see whether it's safe to
   upgrade to it from the current version
    1. If it is, just carry out the upgrade
    1. If it isn't, then repeat this process for an earlier version than the one
       initially chosen in the first step before continuing with the upgrade

## User experience

When a migration is in progress, that fact should be exposed in the UI. It would
be nice if we could also display the progress in the same place. Once we've
added a `system.jobs` table (using a simple migration), it can be used for this.

# Drawbacks

This adds administrative complexity to upgrading a CockroachDB cluster.
Requiring that one command be run before any major cluster version upgrade seems
like a small loss compared to the gained flexibility and codebase cleanliness.
The short-term design doesn't appear to have any obvious drawbacks, as it
solves a couple of immediate needs and can be added without limiting future
extensibility.

An additional step is added at the end of the release process: moving the
unversioned migration names to the version map with the version just released.
The long-term design adds administrative complexity to the upgrade process, no
matter which option we go with. It'll also add a little extra work to the
release process for developers due to the need to include minimum versions for
migrations -- sometimes the minimum version will be the version being released.

# Alternatives

Currently, various upgrades are performed on an "as needed" basis: see the
`FormatVersion` example above. This has the advantage of supporting cluster
upgrades with no operational overhead, but it introduces code complexity that
will be hard to ever remove. Adding new system tables is particularly complex,
see [system.jobs](https://github.com/cockroachdb/cockroach/pull/7073) for an
example.
`FormatVersion` example mentioned above. This has the advantage of supporting
cluster upgrades with no operational overhead, but it introduces code
complexity that will be hard to ever remove. Adding new system tables is
particularly complex; see the
[attempted `system.jobs` PR](https://github.com/cockroachdb/cockroach/pull/7073)
for an example.

We've also considered requiring a command-line tool to be run using the new
binary on a currently running cluster (on an older version) before upgrading it.
This has a similar admin experience to long-term design option 1, while adding
the complexity of needing to deal with multiple versions of the binary at once.

There is also always the option of requiring clusters to be brought down for
migrations, but given the importance of uptime to most users, we consider that
something to be avoided whenever reasonably possible. It will likely be needed
to switch to proposer-evaluated kv (#6290, #6166) in the pre-1.0 time frame,
but we hope that it won't be needed after that.
