loadbalancer: selectors consider health first and have configurable fail-open behavior #2787

bryce-anderson · 2023-12-14T16:50:43Z

Motivation:

The health status of a connection is a coarse-grained
indicator of whether a host is likely to be able to serve
traffic, so it should be the first thing selectors consider
when picking hosts.
A second issue is that it's not obvious what the desired
behavior is when a healthy host cannot be found: whether to
fail closed or just give it a try and see what happens
depends on the user.

Modifications:

Switch the RR and P2C selectors to consider health
first when picking hosts.
Add fail-open behavior to both round robin and P2C:
if a healthy host cannot be found, we try the first
active candidate evaluated.
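
A minimal sketch of the health-first, fail-open selection described above, applied to round robin. This is an illustration under stated assumptions, not the code in this PR: `RoundRobinSketch`, its nested `Host` interface, and `FixedHost` are hypothetical names, and the real selector works with asynchronous connection results rather than returning a host directly.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a health-first round-robin selector.
final class RoundRobinSketch {

    // Assumed host view: "active" is modeled as canMakeNewConnections().
    interface Host {
        boolean isHealthy();
        boolean canMakeNewConnections();
    }

    // Trivial host with fixed state, only here to exercise the selector.
    static final class FixedHost implements Host {
        final String name;
        private final boolean healthy;
        private final boolean active;

        FixedHost(String name, boolean healthy, boolean active) {
            this.name = name;
            this.healthy = healthy;
            this.active = active;
        }

        @Override public boolean isHealthy() { return healthy; }
        @Override public boolean canMakeNewConnections() { return active; }
    }

    private final AtomicInteger index = new AtomicInteger();
    private final boolean failOpen;

    RoundRobinSketch(boolean failOpen) {
        this.failOpen = failOpen;
    }

    /** Returns the selected host, or null when no host qualifies. */
    <H extends Host> H select(List<H> hosts) {
        if (hosts.isEmpty()) {
            return null;
        }
        int start = Math.floorMod(index.getAndIncrement(), hosts.size());
        H failOpenHost = null;
        for (int i = 0; i < hosts.size(); i++) {
            H host = hosts.get((start + i) % hosts.size());
            // Health is considered first: a healthy host wins immediately.
            if (host.isHealthy()) {
                return host;
            }
            // Remember the first active candidate in case we need to fail open.
            if (failOpen && failOpenHost == null && host.canMakeNewConnections()) {
                failOpenHost = host;
            }
        }
        // No healthy host: fail open if configured and possible, else fail closed.
        return failOpenHost;
    }
}
```

With `failOpen` set to `false` (the default described in this PR) the sketch returns `null` when every host is unhealthy; with `true` it returns the first active host it evaluated.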

Results:

Health is now considered first. This doesn't change
much right now but will make L7 health status much
more useful.
Fail open is supported, although off by default.

bryce-anderson · 2023-12-14T16:52:32Z

servicetalk-loadbalancer/src/test/java/io/servicetalk/loadbalancer/SelectorTestHelpers.java

+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+final class SelectorTestHelpers {


These were moved from P2CSelectorTest.java so they can be shared with the RR selector.

tkountis · 2023-12-19T18:18:21Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/BaseHostSelector.java

-            if (!host.isUnhealthy()) {
-                allUnhealthy = false;
-                break;
+            if (host.status(false).healthy) {


I feel that this is now harder to read than a descriptive accessor. WDYT?
Either use a descriptive variable, e.g. WITHOUT_FORCED_CONNECTION, or consider improving the method name, e.g. statusWithForceConnection(...).

The name didn't bug me, although the parameter in general does. Since we're not sharing the status result with the BaseSelector.selectFromHost method anymore, I think it's probably better to simply go back to two methods: isHealthy() and isActive().

I went back to boolean methods as discussed, although I changed the methods to have more descriptive names. Let me know if that doesn't work for you.

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/BaseHostSelector.java

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/DefaultHost.java

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/Host.java

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/HostSelector.java

tkountis · 2023-12-19T18:44:04Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+            // Only if both hosts are healthy do we consider score.
+            if (t1Status.healthy && t2Status.healthy) {
+                // both are healthy. Select based on score, using t1 if equal.
+                if (t1.score() < t2.score()) {


The current implementation doesn't require us to call score() on every selection attempt; decay happens whenever we call it, relative to the time passed. I wonder whether this will always hold true, or whether we should invoke score() proactively in other flows.

A score() method shouldn't have external side effects (technically, FP math errors can affect this). The score of least-loaded is pretty obvious, and in the case of an EWMA, the decay operation is such that calling .score() at time T yields the same value regardless of how many times it was called before that time (presuming another sample hasn't been added in the interim).
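
To make the decay argument concrete, here is a rough sketch of the decay step only. It assumes plain exponential decay with a time constant; `DecayingScoreSketch` and its parameters are hypothetical and do not mirror the scoring classes in this repository. Because exp(-a) * exp(-b) = exp(-(a + b)), applying the decay in several small steps gives the same value as applying it once, up to floating-point rounding.

```java
// Hypothetical model of a score that decays exponentially with elapsed time.
final class DecayingScoreSketch {
    private final double tauNanos; // decay time constant (assumed parameter)
    private double value;          // value as of stampNanos
    private long stampNanos;       // time of the last applied decay

    DecayingScoreSketch(double tauNanos, double initialValue, long nowNanos) {
        this.tauNanos = tauNanos;
        this.value = initialValue;
        this.stampNanos = nowNanos;
    }

    /**
     * Applies the pending decay in place and returns the decayed value.
     * The internal state changes, but the value observed at any given time
     * does not depend on how many earlier calls were made.
     */
    double score(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - stampNanos);
        value *= Math.exp(-elapsed / tauNanos);
        stampNanos += elapsed;
        return value;
    }
}
```

Two instances created with the same initial value agree at time T whether `score` was called once at T or many times before it, as long as no new sample is folded in between.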

Co-authored-by: Thomas Kountis <thomas_kountis@apple.com>

bryce-anderson · 2023-12-19T22:00:31Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+                if (t1.canMakeNewConnections()) {
+                    failOpenHost = t1;
+                } else if (t2.canMakeNewConnections()) {
+                    failOpenHost = t2;
+                }


I think there is a good case to be made that if we want to fail open we shouldn't even worry about whether the host is active or not: that is arguably not a showstopper if you're willing to YOLO it. Opinions appreciated.
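
For reference, a simplified sketch of a single P2C comparison with the fail-open fallback as it stands in the diff above (only hosts that can make new connections are fail-open candidates). `P2CSketch`, its nested `Host` interface, and `FixedHost` are hypothetical names. It assumes a higher score is better, which matches the "using t1 if equal" comment but is not confirmed by the snippet itself, and it omits the random sampling and retry loop of a real P2C selector.

```java
// Hypothetical, simplified model of one P2C comparison between two candidates.
final class P2CSketch {

    // Assumed host view for this sketch.
    interface Host {
        boolean isHealthy();
        boolean canMakeNewConnections();
        int score(); // assumption: higher is better
    }

    // Trivial host with fixed state, only here to exercise pick().
    static final class FixedHost implements Host {
        private final boolean healthy;
        private final boolean active;
        private final int score;

        FixedHost(boolean healthy, boolean active, int score) {
            this.healthy = healthy;
            this.active = active;
            this.score = score;
        }

        @Override public boolean isHealthy() { return healthy; }
        @Override public boolean canMakeNewConnections() { return active; }
        @Override public int score() { return score; }
    }

    /** Returns the preferred host of the pair, or null when neither qualifies. */
    static <H extends Host> H pick(H t1, H t2, boolean failOpen) {
        boolean t1Healthy = t1.isHealthy();
        boolean t2Healthy = t2.isHealthy();
        // Only if both hosts are healthy do we consider score; t1 wins ties.
        if (t1Healthy && t2Healthy) {
            return t1.score() < t2.score() ? t2 : t1;
        }
        if (t1Healthy) {
            return t1;
        }
        if (t2Healthy) {
            return t2;
        }
        // Neither is healthy: fall back to an active host only when failing open.
        if (failOpen) {
            if (t1.canMakeNewConnections()) {
                return t1;
            }
            if (t2.canMakeNewConnections()) {
                return t2;
            }
        }
        return null;
    }
}
```

Dropping the `canMakeNewConnections()` checks in the fail-open branch would give the more permissive behavior floated in the comment above.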

bryce-anderson added 6 commits December 12, 2023 17:05

WIP

014e856

Change host to consider if a new connection is required in isHealthy

aa3a3f9

Merge branch 'main' into bl_anderson/ConsiderHealthFirst

ba31018

Some more cleanup

27fb36b

Consolidate some logic and add more tests

b5b9558

return a status enum instead of a bunch of isActive isHealthy etc

34af8ad

bryce-anderson requested review from daschl, mgodave, tkountis and idelpivnitskiy December 14, 2023 16:50

bryce-anderson added 2 commits December 19, 2023 10:14

Merge branch 'main' into bl_anderson/ConsiderHealthFirst

b7c1f18

Rename CLOSED enum to UNHEALTHY_INACTIVE

1a5f768

tkountis reviewed Dec 19, 2023

bryce-anderson and others added 4 commits December 19, 2023 12:51

Thomas suggestions

a093ba1

Co-authored-by: Thomas Kountis <thomas_kountis@apple.com>

More Thomas feedback

4fb5cac

Back to boolean methods

06431d6

Dont surface if there are active connections...

3c0a617

bryce-anderson requested a review from tkountis December 19, 2023 21:57

Remove dead method

a212744

tkountis approved these changes Jan 3, 2024

bryce-anderson merged commit 6b2b65e into apple:main Jan 3, 2024
15 checks passed

bryce-anderson deleted the bl_anderson/ConsiderHealthFirst branch January 3, 2024 18:30
