
Implement server-side role management #480

Merged
merged 8 commits into master on Apr 21, 2023

Conversation

cole-miller
Contributor

This is still rough and most error handling is stubbed out, but I think it represents a workable approach to moving role management onto the server.

Signed-off-by: Cole Miller <cole.miller@canonical.com>

@codecov

codecov bot commented Feb 28, 2023

Codecov Report

Merging #480 (ef1599a) into master (19a7924) will increase coverage by 1.96%.
The diff coverage is 67.14%.

@@            Coverage Diff             @@
##           master     #480      +/-   ##
==========================================
+ Coverage   59.01%   60.97%   +1.96%     
==========================================
  Files          33       34       +1     
  Lines        5929     6340     +411     
  Branches     1764     1884     +120     
==========================================
+ Hits         3499     3866     +367     
+ Misses       1363     1293      -70     
- Partials     1067     1181     +114     
Impacted Files Coverage Δ
src/transport.c 51.29% <50.00%> (+5.19%) ⬆️
src/roles.c 63.58% <63.58%> (ø)
src/server.c 55.55% <84.84%> (+3.84%) ⬆️
src/client/protocol.c 42.66% <100.00%> (+9.16%) ⬆️
src/config.c 92.00% <100.00%> (+0.69%) ⬆️

... and 2 files with indirect coverage changes


@cole-miller cole-miller marked this pull request as ready for review March 21, 2023 03:16
@cole-miller
Contributor Author

I'm not entirely satisfied with this branch yet, but I think it's ready for some other eyes on it. Jepsen seems mostly happy, including the roles checker -- there is an "extra online spare" that I'm looking into, but it doesn't occur very often (about once per CI run) and I've also seen it sporadically with the go-dqlite client-side role management.

I've written some unit tests for the role assignment algorithm, but still want to add more of those, plus a couple of integration tests once I figure out how to set that up.

Contributor

@MathieuBordere MathieuBordere left a comment


Looks good on first pass

src/server.c Outdated
}

if (d->role_management) {
/* TODO how should the server respond to client requests while in this state? */
Contributor


Can we let raft handle it transparently? raft normally rejects raft_apply requests when it's transferring leadership.

Contributor Author


That still leaves the window after the transfer completes where we're transferring voting rights, yeah?

test/unit/test_role_management.c (resolved)
polling->i = j;
work = &work_objs[j];
work->data = polling;
rv = uv_queue_work(&d->loop, work, pollClusterWorkCb, pollClusterAfterWorkCb);
Contributor


I think we might want to increase the libuv thread pool size (see https://docs.libuv.org/en/v1.x/threadpool.html), to something like 128 by default?

Contributor Author


128 seems large to me, but I don't have experience tuning these things. For polling the cluster we run one task on the thread pool for each node in the configuration, so typically 3-5 tasks total.

Contributor


Note that if the capacity of the thread pool gets exhausted, further calls to uv_queue_work will not fail; they will be queued and executed when threads become available. In general the size of the thread pool should ideally match the hardware capacity (e.g. the number of cores of the host), in order to reduce contention and context switches.

I think we can safely leave libuv's default.
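(For reference, libuv reads the UV_THREADPOOL_SIZE environment variable, default 4, the first time the pool is used, so if we ever did want a larger pool the process would have to set it before queuing any work. A minimal sketch, not part of this PR:)

```c
#include <stdlib.h>
#include <uv.h>

int main(void)
{
	/* Must be set before the first uv_queue_work() (or any other
	 * thread-pool user such as the fs APIs): libuv creates the pool
	 * lazily and reads the size only once. */
	setenv("UV_THREADPOOL_SIZE", "16", 1);

	uv_loop_t loop;
	uv_loop_init(&loop);
	/* ... uv_queue_work() calls as in the snippet above ... */
	uv_run(&loop, UV_RUN_DEFAULT);
	uv_loop_close(&loop);
	return 0;
}
```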

@cole-miller
Contributor Author

cole-miller commented Apr 17, 2023

I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered. Edit: nope, it's my fault; I was failing to call uv_timer_init in all cases.

@freeekanayaka
Contributor

> I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered.

It's our code that is calling uv_close(), and that assertion seems to imply that the handle that we're passing to uv_close() is somehow screwed up (either uninitialized, as you point out, or maybe with corrupted memory that was overwritten somewhere), so even if for some reason the problem doesn't surface in v1.44.3 it looks like a bug on our side.
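For what it's worth, a minimal sketch of the invariant being violated (assumed names, not the actual PR code): uv_close() asserts internally when handed a handle that was never initialized, so uv_timer_init() has to run on every path that can later close the handle.

```c
#include <assert.h>
#include <uv.h>

static uv_timer_t timer; /* must stay valid until the close callback runs */

static void init_then_close(uv_loop_t *loop)
{
	/* Initialize first on every path: calling uv_close() on a handle
	 * that skipped uv_timer_init() trips libuv's internal assertions. */
	int rv = uv_timer_init(loop, &timer);
	assert(rv == 0);

	/* Now closing is safe; the close completes on a later loop iteration. */
	uv_close((uv_handle_t *)&timer, NULL);
}
```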

@cole-miller
Contributor Author

Broken up into commits for easier review :)

@cole-miller
Contributor Author

I think there is still a pesky leak of allocations that should be cleaned up by RoleManagementDrain, but when I went to look at the GHA failure for the latest commit, it had turned into a success, hmm.

src/server.h (outdated, resolved)
src/server.c (outdated, resolved)
@freeekanayaka
Contributor

> Broken up into commits for easier review :)

Much easier to review indeed!

Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Contributor

@freeekanayaka freeekanayaka left a comment


Looks good overall. I left some other comments.

It'd be nice to try to enable role management in the dqlite server, disable it in the dqlite app code and then try to run the test/roles.sh suite and see how it behaves.

It'd also be nice to add to the dqlite suite itself a test similar to go-dqlite's test/roles.sh, either as a shell script or a C test. That is going to require some work though and could be done separately.

@@ -114,7 +110,6 @@ int dqlite__init(struct dqlite_node *d,
rv = DQLITE_ERROR;
goto err_after_ready_init;
}
#endif
Contributor


I understand that this PR somehow touches this area, and that it probably feels awkward to add similar macOS-specific calls for the new semaphore being added by this PR.

However it seems that @rabits has already put in some meaningful effort to try to support macOS in dqlite and raft, and it seems there's some community interest in macOS support.

What about leaving this code in place for now? We don't need to add anything macOS-specific for the new semaphore right now, but perhaps we'll find folks that can start from the initial work of @rabits and move it forward. Actually we might not have given @rabits enough support, considering that his raft PR is still open.

Contributor


Oh, yeah, I forgot about this PR. raft became a big problem for me at the time, and since I don't use dqlite anymore, I can't continue to work on the macOS support.

* automatic role management is enabled, servers in a dqlite cluster will
* autonomously (without client intervention) promote and demote each other
* to maintain a specified number of voters and standbys, taking into account
* the health, failure domain, and weight of each server.
Contributor


It'd be useful to move or copy this part of the comment (from "When automatic role management is enabled" onward) to the public docstring of dqlite_node_enable_role_management().
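A rough sketch of what that public docstring could look like (the signature shown is an assumption, not taken from this PR):

```c
/**
 * Enable automatic role management for this node.
 *
 * When automatic role management is enabled, servers in a dqlite cluster
 * will autonomously (without client intervention) promote and demote each
 * other to maintain a specified number of voters and standbys, taking into
 * account the health, failure domain, and weight of each server.
 */
int dqlite_node_enable_role_management(dqlite_node *n); /* signature assumed */
```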

src/roles.c (resolved)
src/roles.c (resolved)
src/roles.c Outdated

struct change_record
{
uint64_t id;
Contributor


Please use the raft_id type here. Fixed-size fields like uint64_t should only be used when serializing or deserializing in-memory structures. This way the compiler can choose the best implementation depending on the hardware platform.

Contributor Author


Is our code actually prepared to deal with a hypothetical target where unsigned long long and uint64_t are not the same type? It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using uint64_t or a typedef of it everywhere. Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for unsigned long long), or buggy, if the width can vary across targets.

Contributor Author


(I don't find the argument from optimization persuasive because I can't think of a target where unsigned long long is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixing unsigned long long and uint64_t when these are not the same width.)

Contributor


I think we had this conversation before; the principle is that model/language types are independent from protocol/serialization concerns.

At some point we might change the serialization type of a given model/language type without changing the model/language type (thus without breaking the API/ABI). For example we might change the protocol to allow for greater values, without requiring a language/model change.

One recent example of that was the allowed number of statement parameters. Protocol version X only allowed for 256 parameters, but the associated language type was not uint8_t (or whatever the Go equivalent is). By separating the two concerns we were able to bump the protocol to version X+1, with a different serialization format/type for the number of statement parameters, without needing to change the language type. In the implementation there was a missing check for the 256 limit, but that's fine, it was just a bug, and the fix was not to change the language type from int to uint8_t (or whatever the exact Go type is), but rather to add the check.

In this specific case the current situation is that the failure domain must be a value between 0 and 2^64 - 1, but we don't want to enforce that with a uint64_t type; we want to enforce it with a check in the code (and add documentation if missing). This way we can evolve the wire protocol in the future, if required. We're most probably not going to need that for the failure domain values specifically, but this is the general idea.

> Is our code actually prepared to deal with a hypothetical target where unsigned long long and uint64_t are not the same type?

That I didn't check. If it's not, it should be fixed with a check on the value, and documented.

Note that boundary checks are just one type of model value checks. In general any expectation we have on a value (e.g. the value should be "odd", or it should be a multiple of 3 or whatever) should be explicitly coded.

> It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using uint64_t or a typedef of it everywhere.

As said, the wire protocol is opaque, and can change. What should be more stable is the language/model API.

> Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for unsigned long long), or buggy, if the width can vary across targets.

This is indeed the point: the width of the type is not effectively fixed, by design. The current boundary values are [0..2^64 - 1] (and as said they should be documented and checked), but those could change in the future without changing the type.
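To make the model-vs-wire separation concrete, a small sketch (illustrative names, not dqlite's actual encoder):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long long raft_id; /* model/language type, as in raft.h */

struct change_record
{
	raft_id id; /* model field uses the language-level type */
};

/* The fixed-width wire type only appears at (de)serialization time, together
 * with a check that the value fits the current format; the model type can
 * then stay unchanged if the wire format is ever widened or narrowed. */
static void encode_change_record_id(const struct change_record *r,
				    uint8_t buf[8])
{
	uint64_t wire = (uint64_t)r->id;
	assert((raft_id)wire == r->id); /* value fits the current wire width */
	memcpy(buf, &wire, sizeof wire); /* illustrative: no byte-order handling */
}
```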

Contributor

@freeekanayaka freeekanayaka Apr 20, 2023


> (I don't find the argument from optimization persuasive because I can't think of a target where unsigned long long is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixing unsigned long long and uint64_t when these are not the same width.)

The optimization argument is ancillary and definitely not the main one; the point is more to separate concerns conceptually. As said, failure domains don't need to be 64 bits, and should not be modeled as such. Again, this doesn't matter much in the specific case of failure domains, but it might matter in other cases, and for consistency we apply the same approach everywhere.

Contributor


In a 32-bit world, it would be good practice to use int for your APIs, not int32_t. At that point you might have checks in place to ensure that your values were at most 2^32 (because for example you were serializing them using int32_t). Then you start supporting new 64-bit architectures, and you don't need to change the API (int to int64_t), but you might change the serialization and relax your boundary checks to allow for greater values. User code does not need to change.

Contributor Author

@cole-miller cole-miller Apr 20, 2023


I'm really not convinced by this argument, but I will change the PR to match the status quo.

src/roles.c (outdated, resolved)
src/roles.c Outdated

if (status != 0) {
/* XXX */
return;
Contributor


I'm not entirely sure because I still have to familiarize myself with the new code here, but if status is non-zero, wouldn't we end up in a "stuck" state? Because *polling->count never reaches polling->n_cluster and adjustClusterCb is never called.

In practice, I think the only way for status to be non-zero would be to call uv_cancel() on the work handle, which we currently don't do. However it's something we might eventually want to do (e.g. to improve RolesCancelPendingChanges() or to implement some sort of "hard" dqlite_node_stop() functionality).

It doesn't need to be fixed here, but if what I'm saying is accurate, I would suggest turning if (status != 0) into assert(status == 0) (assuming that we'd break otherwise), and adding a comment to the assertion describing what I said above.

Contributor Author


Yes, an assert sounds good.
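A hedged sketch of the agreed-on change in the after-work callback (field names mirror the snippet above; the real struct layout and callback body may differ):

```c
#include <assert.h>
#include <uv.h>

struct polling           /* illustrative fields only */
{
	unsigned *count;     /* shared counter of completed poll tasks */
	unsigned n_cluster;  /* number of nodes being polled */
};

static void pollClusterAfterWorkCb(uv_work_t *work, int status)
{
	struct polling *polling = work->data;

	/* status is non-zero only if uv_cancel() was called on this request,
	 * which we currently never do. If that ever changed, *polling->count
	 * would never reach polling->n_cluster and the adjustment step would
	 * silently never run, so fail loudly instead of returning early. */
	assert(status == 0);

	*polling->count += 1;
	if (*polling->count == polling->n_cluster) {
		/* ... all per-node poll tasks done: kick off adjustClusterCb ... */
	}
}
```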

src/roles.c (resolved)
@cole-miller
Contributor Author

cole-miller commented Apr 21, 2023

Review comments addressed. I've left the commit removing #ifdef __APPLE__ in because:

  • the dqlite team is prioritizing code simplification over cross-platform support for the upcoming cycle, and our backlog of work is such that we don't really have the resources to solicit + review external contributions adding support for macOS or other platforms
  • we are not set up to properly maintain macOS support (no CI) (ignore this one, I somehow forgot that GHA supports macOS, so CI support is not an issue)
  • the removed code is quite straightforward and won't be any trouble to add back if someone takes a stab at adding macOS support in the future

Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>