what happens during network partition? #146
There's still no magic, unfortunately. If you have 2 sites and the only link between them breaks, there isn't much of a choice. Applications that can still reach the half of the cluster that has the old master will continue to work, as long as exactly half the nodes are connected (this relaxes the requirement from needing a majority to operate to needing a majority to elect). Applications that can't reach that half are unavailable. If the number of connected systems falls below half, the entire cluster becomes unavailable. This is necessary to maintain consistency. It's an unfortunate choice application/database developers are forced to make.

So in summary: Comdb2 has ways of masking errors that occur on some of the nodes of a cluster. If you maintain multiple redundant networks, and have enough sites that the failure of a site doesn't reduce the capacity of the cluster below half, the cluster remains available. We can certainly document this in more detail. If you have questions about specific items/points, we'll be sure to include those in the docs.
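The availability rule described above can be sketched in a few lines. This is an illustrative model, not Comdb2's actual code: a partition can keep serving if it holds a strict majority of the cluster, or exactly half of it *and* the incumbent master (the "majority to operate" vs. "majority to elect" distinction). The function name and signature are invented for the example.

```python
def partition_available(partition_size: int, cluster_size: int,
                        has_old_master: bool) -> bool:
    """Model of the availability rule described in the comment above."""
    if 2 * partition_size > cluster_size:
        # Strict majority: this side can elect a master and keep operating.
        return True
    if 2 * partition_size == cluster_size:
        # Exactly half: only the side that already holds the master survives.
        return has_old_master
    # Below half: unavailable, to preserve consistency.
    return False

# 4-node cluster split 2/2: the half with the old master keeps working.
print(partition_available(2, 4, has_old_master=True))   # True
print(partition_available(2, 4, has_old_master=False))  # False
# A 1/3 split leaves the minority side unavailable regardless.
print(partition_available(1, 4, has_old_master=True))   # False
```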
You used the term 'site' in your answer. What does it mean?
More like a datacenter. Consider figure 1 from https://bloomberg.github.io/comdb2/transaction_model.html |
If you're really interested in testing network partitions, there's a decent test included here.
Is data synchronously replicated among all 'sites'?
Yes and yes.
Or you can make it fully async (with the usual trade-offs, once again). |
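The sync/async trade-off mentioned above can be shown in miniature. This is a hedged sketch, not Comdb2's implementation: a synchronous commit blocks until every replica has applied the record, while an asynchronous commit returns immediately and ships the record later, so acknowledged-but-unreplicated writes can be lost on a crash. All class and function names here are illustrative.

```python
class Replica:
    """Toy replica that just accumulates applied records."""
    def __init__(self):
        self.log = []

    def apply(self, record):
        self.log.append(record)

def commit_sync(replicas, record):
    # Block until every replica has the record: durable on all sites
    # at the moment "committed" is returned.
    for r in replicas:
        r.apply(record)
    return "committed"

def commit_async(replicas, record, pending):
    # Queue the record for later shipping and return right away:
    # durable locally only, the usual async trade-off.
    pending.append((replicas, record))
    return "committed"

replicas = [Replica(), Replica()]
commit_sync(replicas, "txn-1")
assert all(r.log == ["txn-1"] for r in replicas)

pending = []
commit_async(replicas, "txn-2", pending)
# Replicas haven't seen txn-2 yet when the commit call returns.
assert all("txn-2" not in r.log for r in replicas)
```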
Does 'API Code' mean the client JDBC driver here?
Yes |
There are C and Java (JDBC) implementations of the protocol included. Both of them will retry. If there are other implementations, they may not.
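The retry behavior described above can be sketched roughly as follows. This is an illustration of the idea, not the real C or JDBC driver API: on a retryable error (such as the node we were talking to becoming unreachable), the driver tries another cluster node before surfacing an error to application code. All names below are invented for the example.

```python
class RetryableError(Exception):
    """Stands in for a transient failure the driver may retry."""

def run_query(nodes, query):
    # Walk the cluster's node list, retrying on transient failures;
    # only raise once every node has been tried.
    last_err = None
    for node in nodes:          # each node is any callable endpoint
        try:
            return node(query)
        except RetryableError as e:
            last_err = e        # remember the failure, try the next node
    raise last_err

def dead(query):
    raise RetryableError("node unreachable")

def alive(query):
    return f"ok: {query}"

# First node is partitioned away; the driver transparently fails over.
print(run_query([dead, alive], "select 1"))  # ok: select 1
```

Only errors the protocol marks as retryable should be handled this way; a genuine statement error still has to reach the application.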
The VLDB paper just says Bloomberg prevents network partitions from happening in its own network.
But in a cloud environment, partitions will indeed occur.
In that case, will the client throw an error?
If so, then there has to be error-handling logic for all calls to Comdb2. How does it then simplify coding vs. something like Postgres? From Alex's talk I was under the assumption that the whole point of Comdb2 was to not require tons of error handling.