Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loadbalancer: selectors consider health first and have configurable fail-open behavior #2787

Merged

Conversation

bryce-anderson
Copy link
Contributor

@bryce-anderson bryce-anderson commented Dec 14, 2023

Motivation:

The health status of a connection is a course grained
indicator of whether a host is likely to be able to serve
traffic and should be the first consideration to selectors
when picking hosts.
A second issue is that it's not obvious what the desired
behavior is when a healthy host cannot be found: it's
going to be user specific whether to fail closed or just
give it a try and see what happens.

Modifications:

  • Switch the RR and P2C selectors to consider health
    first when picking hosts.
  • Add fail open behavior to both round robin and P2C:
    if a healthy host cannot be found we will try the first
    active candidate evaluated.

Results:

  • Health is now considered first. This doesn't change
    much right now but will make L7 health status much
    more useful.
  • Fail open is supported, although off by default.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

final class SelectorTestHelpers {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were moved from P2CSelectorTest.java so they can be shared with the RR selector.

if (!host.isUnhealthy()) {
allUnhealthy = false;
break;
if (host.status(false).healthy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel that this is now harder to read, as a descriptive accessor. wdyt?
either use a descriptive variable ie. WITHOUT_FORCED_CONNECTION or consider improving the method name, ie. statusWithForceConnection(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name didn't bug me, although the parameter in general does. Since we're not sharing the status result with the BaseSelector.selectFromHost method anymore, I thin it's probably better to simple go back to two methods: isHealthy() and isActive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back to boolean methods as discussed, although I change the methods to have more descriptive names. Let me know if that doesn't work for you.

// Only if both hosts are healthy do we consider score.
if (t1Status.healthy && t2Status.healthy) {
// both are healthy. Select based on score, using t1 if equal.
if (t1.score() < t2.score()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation, doesn't require us to call scote() on every selection attempt; decay happens any time we call it, relevant to the time passed. I wonder whether this will always hold true, or we should invoke score() on other flows proactively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A score() method shouldn't (technically FP math errors can affect this) have external side effects. The score of least-loaded is pretty obvious and in the case of an EWMA, the decay operation is such that calling .score() at time T yield the same value regardless of how many times it was called before that time (presuming another sample hasn't been added in the interim).

Comment on lines +139 to +143
if (t1.canMakeNewConnections()) {
failOpenHost = t1;
} else if (t2.canMakeNewConnections()) {
failOpenHost = t2;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a good case to be made that if we want to fail open we shouldn't even worry about whether it's active or not: that is arguably not a show stopper if you're willing to yo-lo it. Opinions appreciated.

@bryce-anderson bryce-anderson merged commit 6b2b65e into apple:main Jan 3, 2024
15 checks passed
@bryce-anderson bryce-anderson deleted the bl_anderson/ConsiderHealthFirst branch January 3, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants