Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

what happens during network partition? #146

Closed
pdeva opened this issue Jun 16, 2017 · 10 comments
Closed

what happens during network partition? #146

pdeva opened this issue Jun 16, 2017 · 10 comments
Labels

Comments

@pdeva
Copy link

pdeva commented Jun 16, 2017

the vldb paper just says bloomberg prevents network partitions from happening in its own network.

but in a cloud environment, partitions will indeed occur.

in that case will the client throw an error?

if so, then there has to be error handling logic for all calls to comdb2. how does it then simplify coding vs something like postgres? from Alex's talk I was under the assumption that the whole point of comdb2 was to not require tons of error handling.

@mponomar
Copy link
Contributor

mponomar commented Jun 16, 2017

  1. There's no magic in the world, of course. Partitions occur. But you can guard against them. There's code in Comdb2 to use multiple networks, for example. If a node gets disconnected from a node/set of nodes over one network, it can switch to another.

  2. You can guard against losing half of your nodes (and having a true split brain) by having > 2 sites. As long as some majority of the nodes can still interconnect, they will form a cluster. API code will retry connecting to nodes until it hits a node that's part of a new majority, and resume operating. This happens without application code involvement - no retries necessary. Failures like this usually occur between sites (citation needed, I know).

  3. At a slightly higher cost, you can enable HA mode. In this mode, the API will record all operations in your transaction as well as the database state (log sequence number is sufficient) and replay them against a different node in case of failure. This protects you against having to retry if the node you're connected to fails in the middle of a query or a transaction. We have a lovely demo of a running query/transaction where we kill the database, and the transaction/query resumes when it comes back.

There's still no magic, unfortunately. If you have 2 sites, and the only link between them breaks, there isn't much of a choice. Applications that can still reach half of the cluster that still has the old master will continue to work, as long as exactly half the nodes are connected (this reduces the problem of needing a majority to operate to having a majority to elect). Applications that can't reach that half, are unavailable. If the number of connected systems falls below half, the entire cluster becomes unavailable. This is necessary to maintain consistency. It's an unfortunate choice application/database developers are forced to make.

So in summary: Comdb2 has ways of masking errors that occur with some of the nodes of a cluster. If you maintain multiple redundant network, and have enough sites that a failure of a site doesn't reduce the capacity of a cluster below half, the cluster becomes unavailable.

We can certainly document this in more detail. If you have questions about specific items/points, we'll be sure to include those in the docs.

@pdeva
Copy link
Author

pdeva commented Jun 16, 2017

you used the term 'site' in your answer. what does it mean?
is a 'site' == 'a database node'?

@akshatsikarwar
Copy link
Contributor

akshatsikarwar commented Jun 16, 2017

More like a datacenter. Consider figure 1 from https://bloomberg.github.io/comdb2/transaction_model.html
3 sites of 3 nodes each.

@mponomar
Copy link
Contributor

If you're really interested in testing network partitions, there's decent test included here

@pdeva
Copy link
Author

pdeva commented Jun 16, 2017

is data synchronously replicated among all 'sites'?
wouldnt that be extremely high latency since each 'site' can be in a different geographical region, eg one in 'us east' and one in 'japan'?

@akshatsikarwar
Copy link
Contributor

Yes and yes.
You can set up async replication (sync to same dc, async to others) but that comes with usual trade-offs.

@mponomar
Copy link
Contributor

Or you can make it fully async (with the usual trade-offs, once again).

@pdeva
Copy link
Author

pdeva commented Jun 16, 2017

API code will retry connecting to nodes

does 'API Code' mean the client jdbc driver here?

@akshatsikarwar
Copy link
Contributor

Yes

@mponomar
Copy link
Contributor

There's a C and Java (jdbc) implementations of the protocol included. Both of them will retry. If there's other implementations, they may not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants