-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loadbalancer: selectors consider health first and have configurable fail-open behavior #2787
loadbalancer: selectors consider health first and have configurable fail-open behavior #2787
Conversation
import static org.mockito.Mockito.mock; | ||
import static org.mockito.Mockito.when; | ||
|
||
final class SelectorTestHelpers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were moved from P2CSelectorTest.java so they can be shared with the RR selector.
if (!host.isUnhealthy()) { | ||
allUnhealthy = false; | ||
break; | ||
if (host.status(false).healthy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel that this is now harder to read, as a descriptive accessor. wdyt?
either use a descriptive variable ie. WITHOUT_FORCED_CONNECTION
or consider improving the method name, ie. statusWithForceConnection(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name didn't bug me, although the parameter in general does. Since we're not sharing the status result with the BaseSelector.selectFromHost
method anymore, I thin it's probably better to simple go back to two methods: isHealthy()
and isActive
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back to boolean methods as discussed, although I change the methods to have more descriptive names. Let me know if that doesn't work for you.
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/BaseHostSelector.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/DefaultHost.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/DefaultHost.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/Host.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/Host.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/HostSelector.java
Outdated
Show resolved
Hide resolved
servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/HostSelector.java
Outdated
Show resolved
Hide resolved
// Only if both hosts are healthy do we consider score. | ||
if (t1Status.healthy && t2Status.healthy) { | ||
// both are healthy. Select based on score, using t1 if equal. | ||
if (t1.score() < t2.score()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation, doesn't require us to call scote()
on every selection attempt; decay happens any time we call it, relevant to the time passed. I wonder whether this will always hold true, or we should invoke score()
on other flows proactively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A score()
method shouldn't (technically FP math errors can affect this) have external side effects. The score of least-loaded is pretty obvious and in the case of an EWMA, the decay operation is such that calling .score()
at time T
yield the same value regardless of how many times it was called before that time (presuming another sample hasn't been added in the interim).
Co-authored-by: Thomas Kountis <thomas_kountis@apple.com>
if (t1.canMakeNewConnections()) { | ||
failOpenHost = t1; | ||
} else if (t2.canMakeNewConnections()) { | ||
failOpenHost = t2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a good case to be made that if we want to fail open we shouldn't even worry about whether it's active or not: that is arguably not a show stopper if you're willing to yo-lo it. Opinions appreciated.
Motivation:
The health status of a connection is a course grained
indicator of whether a host is likely to be able to serve
traffic and should be the first consideration to selectors
when picking hosts.
A second issue is that it's not obvious what the desired
behavior is when a healthy host cannot be found: it's
going to be user specific whether to fail closed or just
give it a try and see what happens.
Modifications:
first when picking hosts.
if a healthy host cannot be found we will try the first
active candidate evaluated.
Results:
much right now but will make L7 health status much
more useful.