demo: add default locality settings for multi-node demos #39938

Closed
jordanlewis opened this issue Aug 23, 2019 · 9 comments
Labels
A-demo C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@jordanlewis
Member

The syntax added in #39786 is great for being able to set up a custom multi-node demo cluster with different localities. However, it takes quite a bit of effort to learn and remember the syntax.

The goal of cockroach demo is to be dead simple, yet provide a realistic look at how CockroachDB works without having to set anything up ahead of time.

To this end, we should add default localities that are populated when you start a multi-node demo cluster. Let's make the default localities so that the cluster can be a reasonable shape no matter what you set the nodes argument to. I'm thinking:

region=us-east1,az=b
region=us-east1,az=c
region=us-east1,az=d
region=us-west1,az=a
region=us-west1,az=b
region=us-west1,az=c
region=europe-west1,az=b
region=europe-west1,az=c
region=europe-west1,az=d

This way, a 3-node cluster lives in us-east1, a 6-node cluster lives in us-east and us-west, and a 9-node cluster lives in us-east, us-west, europe-west.

@jseldess does this make sense to you as a reasonable default that might work for many types of demos?

@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-cli labels Aug 23, 2019
@jordanlewis jordanlewis changed the title demo: add default distribution settings for multi-node demos demo: add default locality settings for multi-node demos Aug 23, 2019
@awoods187
Contributor

Let's also add:

  • A cockroach demo specific enterprise license and set the organization to cockroach demo (will ensure people can use features as well as let us see what features are tested with demo in telemetry)
  • The localities lat/long for the webui map

@awoods187
Contributor

I'm picturing this as ultimately looking like:

./cockroach demo --nodes 3 --multi-region
And then, as a result, the following happens:

  • Enterprise license and demo org set
  • Based on nodes, the locality is set as described above
  • Webui is hardcoded to work with the localities chosen above

@jseldess
Contributor

Like your defaults, @jordanlewis!

@jseldess
Contributor

I think @jordanlewis's approach is great. We'll definitely need 3 nodes per region in order to demonstrate any region-based zone config stuff.

@rohany
Contributor

rohany commented Aug 26, 2019

Does the following make sense --

when the --multi-region flag is given, we default to a 3 node cluster with the above settings.

If the --nodes flag is given, then we accept either 3, 6, or 9 nodes, and use the above localities.

RE the license, I'm not sure how to do that, is there a way we can automatically give a demo license? Also if we can auto-load a license, then we for sure can automatically pre-partition movr across the nodes too.

@jordanlewis
Member Author

I think --multi-region is probably just unnecessary as a flag, if we use the above locality settings by default. I think accepting any number of nodes is also fine - we'll round-robin them through the locality settings. There's nothing wrong with having non-multiple-of-3 number of nodes in principle and I don't think there's a reason to hamstring the demo app - we'll just tell people to use 3, 6 or 9 nodes for different kinds of demos.

@jordanlewis
Member Author

In other words, let's start with --nodes round-robining through the localities. We can refine it more afterward if necessary.

@rohany
Contributor

rohany commented Aug 26, 2019

What do you mean by round-robin them? Like if the user supplies 10 nodes, then we assign the 10th node the first default locality?

@jordanlewis
Member Author

Exactly.
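
The round-robin assignment agreed on above can be sketched in Go as follows. This is only an illustration of the idea, not the code that landed in the referenced PR: the names `defaultLocalities` and `localitiesFor` are hypothetical, and the locality strings are copied from the list proposed at the top of this issue.

```go
package main

import "fmt"

// defaultLocalities mirrors the nine localities proposed in this issue.
// The variable name is hypothetical; it is not taken from the CockroachDB source.
var defaultLocalities = []string{
	"region=us-east1,az=b",
	"region=us-east1,az=c",
	"region=us-east1,az=d",
	"region=us-west1,az=a",
	"region=us-west1,az=b",
	"region=us-west1,az=c",
	"region=europe-west1,az=b",
	"region=europe-west1,az=c",
	"region=europe-west1,az=d",
}

// localitiesFor returns one locality per node. Nodes beyond the ninth wrap
// around to the start of the list (round-robin). Hypothetical helper.
func localitiesFor(nodes int) []string {
	if nodes <= 0 {
		return nil
	}
	out := make([]string, 0, nodes)
	for i := 0; i < nodes; i++ {
		out = append(out, defaultLocalities[i%len(defaultLocalities)])
	}
	return out
}

func main() {
	// With 10 nodes, the 10th node reuses the first default locality.
	for i, loc := range localitiesFor(10) {
		fmt.Printf("node %d: %s\n", i+1, loc)
	}
}
```

With this scheme, 3 nodes all land in us-east1, 6 nodes span us-east1 and us-west1, and 9 nodes cover all three regions, matching the shapes described in the original proposal.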

rohany added a commit to rohany/cockroach that referenced this issue Aug 26, 2019
Addresses part of cockroachdb#39938.

Release note (cli change): Default cluster locality topologies for
multi-node cockroach demo clusters.
craig bot pushed a commit that referenced this issue Aug 26, 2019
39936: storage: add (default-off) atomic replication changes r=nvanbenschoten a=tbg

This PR contains a series of commits that first pave the way for and ultimately
allow carrying out atomic replication changes via Raft joint consensus.

Atomic replication changes are required to avoid entering unsafe configurations
during lateral data movement. See #12768 for details; this is a problem we want
to address in 19.2.

Before merging this we'll need to sort out an upstream change in Raft which
has made a bug in our code related to learner snapshots much more likely; the
offending upstream commit is patched out of the vendored etcd bump in this PR
at the time of writing.

An antichronological listing of the individual commits follows. They should be
reviewed individually, though it may be helpful to look at the overall diff for
overall context. A modest amount of churn may exist between the commits, though
a good deal of effort went into avoiding this.

    storage: allow atomic replication changes in ChangeReplicas

    They default to OFF.

    This needs a lot more tests which will be added separately in the course of
    switching the default to ON and will focus on the interactions of joint
    states with everything else in the system.

    We'll also need another audit of consumers of the replica descriptors to
    make sure nothing was missed in the first pass.

    Release note: None

    storage: fix replicaGCQueue addition on removal trigger

    Once we enter joint changes, the replica to be removed will show up in
    `crt.Removed()` when the joint state is entered, but it only becomes
    eligible for actual removal when we leave the joint state later. The new
    code triggers at the right time, namely when the replica is no longer in
    the descriptor.

    Release note: None

    storage: let execChangeReplicasTxn construct the descriptor

    Prior to this commit, the method took both an old and a new desc *plus*
    slices of added and removed replicas. This had grown organically, wasn't an
    easily understood interface, led to repetitive and tricky code at the
    callers, and most importantly isn't adequate any more in a world with
    atomic replication changes, where execChangeReplicasTxn in constructing the
    ChangeReplicasTrigger is essentially deciding whether a joint configuration
    needs to be entered (which in turn determines what the descriptor needs to
    look like in the first place). To start solving this, let
    execChangeReplicasTxn create (and on success return) the new descriptor.
    Callers instead pass in what they want to be done, which is accomplished
    via an []internalReplicationChange slice.

    Release note: None

    roachpb: auto-assign ReplicaID during AddReplica

    This is a cleanup leading up to a larger refactor of the contract around
    `execChangeReplicasTxn`.

    Release note: None

    storage: emit ConfChangeV2 from ChangeReplicasTrigger where appropriate

    This prepares the trigger -> raft translation code to properly handle
    atomic replication changes.

    This carries out a lot of validation to give us confidence that any unusual
    transitions would be caught quickly.

    This change also establishes more clearly which added and removed replicas
    are to be passed into the trigger when transitioning into a joint
    configuration. For example, when adding a voter, one technically replaces a
    Learner with a VoterIncoming and so the question is which type the replica
    in the `added` slice should have.  Picking the Learner would give the
    trigger the most power to validate the input, but it's annoying to have
    divergent descriptors floating around, so by convention we say that it is
    always the updated version of the descriptor (i.e. for fully removed
    replicas, just whatever it was before it disappeared). I spent more time on
    this than I'm willing to admit, in particular looking into removing the
    redundancy here, but it made things more awkward than was worth it.

    Release note: None

    storage: push replication change unrolling into ChangeReplicas

    There are various callers to ChangeReplicas, so it makes more sense to
    unroll at that level. The code was updated to - in principle - do the right
    thing when atomic replication changes are requested, except that they are
    still unimplemented and a fatal error will serve as a reminder of that. Of
    course nothing issues them yet.

    Release note: None

    storage: skip ApplyConfChange on rejected entry

    When in a joint configuration, passing an empty conf change to
    ApplyConfChange doesn't do the right thing any more: it tells Raft that
    we're leaving the joint config. It's not a good idea to try to tell Raft
    anything about a ConfChange that got rejected. Raft internally knows that
    we handled it because it knows the applied index.

    This also adds a case match for ConfChangeV2 which is necessary to route
    atomic replication changes (ConfChangeV2).

    See etcd-io/etcd#11046

    Release note: None

    storage: un-embed decodedConfChange

    I ate a number of NPEs during development because nullable embedded fields
    are tricky; they hide the pointer derefs that often need a nil check. We'll
    embed the fields of decodedConfChange instead which works out better. This
    commit also adds the unmarshaling code necessary for ConfChangeV2 needed
    once we issue atomic replication changes.

    Release note: None

    storage: add learners one by one

    Doing more than one change at once is going to force us into an atomic
    replication change. This isn't crazy, but seems unnecessary at this point,
    so just add the learners one by one.

    Release note: None

    storage: add fatals where atomic conf changes are unsupported

    These will be upgraded with proper handling when atomic replication changes
    are actually introduced, but for now it's convenient to stub out some code
    that will need to handle them and to make sure we won't forget to do so
    later.

    Release note: None

    storage: add atomic replication changes cluster setting

    This defaults to false, and won't have an effect unless the newly
    introduced cluster version is also active.

    Release note: None

    roachpb: support zero-change ChangeReplicasTrigger

    We will use a ChangeReplicasTrigger without additions and removals when
    transitioning out of a joint configuration, so make sure it supports this
    properly.

    Release note: None

    roachpb: return "desired" voters from ReplicaDescriptors.Voters

    Previous commits introduced (yet unused) voter types to encode joint
    consensus configurations which occur during atomic replication changes.

    Access to the slice of replicas is unfortunately common, though at least
    it's compartmentalized via the getters Voters() and Learners().

    The main problem solved in this commit is figuring out what should be
    returned from Voters(): is it all VoterX types, or only voters in one of
    the two majority configs part of a joint quorum?

    The useful answer is returning the set of voters corresponding to what the
    config will be once the joint state is exited; this happens to be what most
    callers care about. Incoming and full voters are really the same thing in
    our code; we just need to distinguish them from outgoing voters to
    correctly maintain the quorum sizes.

    Of course there are some callers that do care about quorum sizes, and a
    number of cleanups were made for them.

    This commit also adds a ReplicaDescriptors.ConfState helper which is then
    used in all of the places that were previously cobbling together a
    ConfState manually.

    Release note: None

    roachpb: add ReplicaType_Voter{Incoming,Outgoing}

    These are required for atomic replication changes to describe joint
    configurations, i.e. configurations consisting of two sets of replicas which
    both need to reach quorum to make replication decisions.

    An audit of existing consumers of this enum will follow.

    Release note: None

    roachpb: rename ReplicaType variants

    The current naming is idiomatic for proto enums, but atypical for its usage
    in Go code. There is no `(gogoproto.customname)` that can fix this, and
    we're about to add more replica types that would require awkward names such
    as `roachpb.ReplicaType_VOTER_OUTGOING`.

    Switch to a Go-friendly naming scheme instead.

    Release note: None

    batcheval: generalize checkNotLearnerReplica

    This now errors out whenever the replica is not a voter, which is more
    robust as new replica types are introduced (which generally should not
    automatically become eligible to receive leases).

    Release note: None

    roachpb: improve RangeDescriptor.Validate

    Make sure there isn't more than one replica per store.

    Release note: None

    roachpb: generalize ReplicaDescriptor.String()

    The new code will generalize to new replica types.

    Release note: None

    [dnm] vendor: bump raft

    This picks up upstream fixes related to atomic membership changes.

    I had to smuggle in a small hack because we're picking up
    etcd-io/etcd#11037 which makes a race between the
    snapshot queue and the proactive learner snapshot much more likely, and
    this in turn makes tests quite flaky because it turns out that if the
    learner snap loses, it can actually error out.

    Release note: None

    storage: avoid fatal error from splitPostApply

    This is the next band-aid on top of #39658 and #39571. The descriptor
    lookup I added sometimes fails because replicas can process a split trigger
    in which they're not a member of the range:

    > F190821 15:14:28.241623 312191 storage/store.go:2172
    > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
    > found in right hand side of split

    I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

    Release note: None
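
For readers unfamiliar with joint consensus: the commit messages above describe joint configurations as two sets of replicas that both need to reach quorum. A minimal Go sketch of that rule follows. It is an illustration under simplifying assumptions, not the CockroachDB or etcd/raft implementation; `hasMajority` and `jointQuorum` are hypothetical names, replicas are plain integer IDs, and an empty set is treated as never having a majority.

```go
package main

import "fmt"

// hasMajority reports whether strictly more than half of the voters in cfg
// appear in acked. In this sketch an empty cfg never has a majority.
func hasMajority(cfg []int, acked map[int]bool) bool {
	n := 0
	for _, id := range cfg {
		if acked[id] {
			n++
		}
	}
	return n > len(cfg)/2
}

// jointQuorum reports whether a decision is committed while in a joint
// configuration: the outgoing (old) and incoming (new) voter sets must each
// reach a majority on their own.
func jointQuorum(outgoing, incoming []int, acked map[int]bool) bool {
	return hasMajority(outgoing, acked) && hasMajority(incoming, acked)
}

func main() {
	outgoing := []int{1, 2, 3} // voters before the change
	incoming := []int{2, 3, 4} // voters after replacing replica 1 with replica 4

	// Replicas 2 and 3 form a majority of both sets.
	fmt.Println(jointQuorum(outgoing, incoming, map[int]bool{2: true, 3: true})) // true

	// Replicas 1 and 2 are a majority of the old set only.
	fmt.Println(jointQuorum(outgoing, incoming, map[int]bool{1: true, 2: true})) // false
}
```

The second case shows why joint configurations avoid unsafe intermediate states: a majority of the old voters alone can no longer make decisions once the change is under way.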

40221: cli: Add default locality settings for multi node demo clusters r=jordanlewis a=rohany

Addresses part of #39938.

Release note (cli change): Default cluster locality topologies for
multi-node cockroach demo clusters.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@rohany rohany closed this as completed Aug 26, 2019