-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quiche: add multi-worker support for QUIC via BPF #9424
Conversation
Signed-off-by: Dan Zhang <danzh@google.com>
/assign @alyssawilk |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
I'm intrigued to know if this will work on RHEL 7 / Centos. Red Hat have back ported some BPF to their ancient kernels but advanced networking stuff like Cilium doesn't work. Apparently more advanced BPF capabilities have been included in RHEL 8. |
Signed-off-by: Dan Zhang <danzh@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is SO cool. :)
Flushing out a few comments from a first pass.
/wait
source/extensions/quic_listeners/quiche/active_quic_listener.cc
Outdated
Show resolved
Hide resolved
{0x80, 0, 0, 0000000000}, // ld len | ||
{0x35, 0, 9, 0x00000009}, // jlt #0x9, packet_too_short | ||
{0x30, 0, 0, 0000000000}, // ldb [0] | ||
{0x54, 0, 0, 0x00000080}, // and #0x80 | ||
{0x15, 0, 2, 0000000000}, // jne #0, ietf_long_header | ||
{0x20, 0, 0, 0x00000001}, // ld [1] | ||
{0x05, 0, 0, 0x00000005}, // ja return | ||
{0x80, 0, 0, 0000000000}, // ietf_long_header: ld len | ||
{0x35, 0, 2, 0x0000000e}, // jlt #0xe, packet_too_short | ||
{0x20, 0, 0, 0x00000006}, // ld [6] | ||
{0x05, 0, 0, 0x00000001}, // ja return | ||
{0x20, 0, 0, // packet_too_short: ld rxhash | ||
static_cast<uint32_t>(SKF_AD_OFF + SKF_AD_RXHASH)}, | ||
{0x94, 0, 0, concurrency_}, // return: mod #socket_count | ||
{0x16, 0, 0, 0000000000}, // ret a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably you copied this from somewhere? Do we have to have attribution? Either way, can you have a block comment up top that explains what this code does?
2nd question (sorry I don't now much about BPF internals): Is this a generic language that runs on all actual architecture? I assume this is the BPF "machine" language and the kernel will compile this to x86, aarch, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indirectly translated from Google's internal BPF utils. I added comment about QUIC's connection ID on top.
I don't know how the kernel compiles it, but it is architecture-independent unlike assembly.
source/extensions/quic_listeners/quiche/active_quic_listener.cc
Outdated
Show resolved
Hide resolved
// Change to a new port, and connection should still continue. | ||
Network::Address::InstanceConstSharedPtr local_addr = | ||
Network::Test::getCanonicalLoopbackAddress(version_); | ||
std::cerr << "Switch socket and send the rest data\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
del or switch to ENVOY_LOG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it into comment.
@@ -218,5 +225,103 @@ TEST_P(QuicHttpIntegrationTest, TestDelayedConnectionTeardownTimeoutTrigger) { | |||
1); | |||
} | |||
|
|||
TEST_P(QuicHttpIntegrationTest, MultipleQuicListeners) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do a --runs_per_test=1000
loop for both of these tests both with and without BPF support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2nd test is not supposed to pass without BPF. And I tested the first one with/without BPF support with 1000 runs.
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010 the OSX test timed out on quic_http_integration_test which is probably legit. Per our previous discussion, IMO getting this working on OSX is not a good use of time for you, but it's up to you. Feel free to just disable all of this on OSX if you want. (Though if we do that we should test no BPF on linux as well.) /wait |
Signed-off-by: Dan Zhang <danzh@google.com>
Added NO BPF support test. |
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010 looks like OSX is still failing... :( (Again I would just disable it but up to you) /wait |
Signed-off-by: Dan Zhang <danzh@google.com>
I would try to debug a bit more before I disable the test in MacOS. |
Sorry needs a master merge and then will take a look. Also looks like OSX is still failing. /wait |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing stuff. Just a few nits and then LGTM.
/wait
"to socket in Mac OS. QUIC connection is not stable with concurrency > 1"); | ||
#else | ||
ENVOY_LOG(warn, "BPF filter is not supported on this platform. QUIC won't support connection " | ||
"migration and NET rebinding."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/NET/NAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -92,6 +92,14 @@ void EnvoyQuicClientConnection::setUpConnectionSocket() { | |||
} | |||
} | |||
|
|||
void EnvoyQuicClientConnection::switchConnectionSocket( | |||
Network::ConnectionSocketPtr&& connection_socket) { | |||
auto writer = new EnvoyQuicPacketWriter(*connection_socket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer storing in unique_ptr and then releasing into the QUICHE code, in case intermediate code throws an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So great to see this going public for people to play around with! Mostly nits on my end and I owe the tests a pass tomorrow.
@@ -21,7 +21,7 @@ class ActiveUdpListenerConfigFactory { | |||
* Create an ActiveUdpListenerFactory object according to given message. | |||
*/ | |||
virtual Network::ActiveUdpListenerFactoryPtr | |||
createActiveUdpListenerFactory(const Protobuf::Message& message) PURE; | |||
createActiveUdpListenerFactory(const Protobuf::Message& message, uint32_t concurrency) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind documenting arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -1,5 +1,12 @@ | |||
#include "extensions/quic_listeners/quiche/active_quic_listener.h" | |||
|
|||
#if defined(__linux__) | |||
#include <linux/filter.h> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nitty but mind removing the trailing line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (!ok) { | ||
ENVOY_LOG(warn, "Fail to apply socket options to socket {} on listener {} after binding", | ||
listen_socket_.ioHandle().fd(), listener_config.name()); | ||
throw EnvoyException("Fail to apply socket options."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fail -> Failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Network::ListenerConfig& config) { | ||
std::unique_ptr<Network::Socket::Options> options = std::make_unique<Network::Socket::Options>(); | ||
#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__) | ||
// This BPF filter reads the 1st word of QUIC connection id in the UDP payload and mod it by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mod -> mods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__) | ||
// This BPF filter reads the 1st word of QUIC connection id in the UDP payload and mod it by the | ||
// number of workers to get the socket index in the SO_REUSEPORT socket groups. QUIC packets | ||
// should be at least 9 bytes, with the 1st byte indicates one of the below QUIC packet headers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indicates -> indicating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// 1) IETF QUIC long header: most significant bit is 1. The connection id starts from the 7th | ||
// byte. | ||
// 2) IETF QUIC short header: most significant bit is 0. The connection id starts from 2nd | ||
// bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes -> byte here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// 3) Google QUIC header: most significant bit is 0. The connection id starts from 2nd | ||
// bytes. | ||
// Any packet that doesn't belong to any of the three packet header types are dispatched | ||
// based on 5-tuple source/destination addresses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they will not have this filter applied, or because we do payload inspection and best effort guess?
(I really should know this one but if even I don't we should comment =P)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is according to the BPF code: if the payload doesn't seems to be a valid QUIC packet, dispatch it using the network 5 tuple
"to socket in Mac OS. QUIC connection is not stable with concurrency > 1"); | ||
#else | ||
ENVOY_LOG(warn, "BPF filter is not supported on this platform. QUIC won't support connection " | ||
"migration and NET rebinding."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAT port rebinding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
#else | ||
if (concurrency_ > 1) { | ||
#ifdef __APPLE__ | ||
// TODO(#8794) Figure out how SO_REUSEPORT behave in Mac OS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your call but I'd say we just list it somewhere as not supported until/if someone cares, rather than TODOing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -40,6 +40,8 @@ class EnvoyQuicClientConnection : public EnvoyQuicConnection, public Network::Ud | |||
// Register file event and apply socket options. | |||
void setUpConnectionSocket(); | |||
|
|||
void switchConnectionSocket(Network::ConnectionSocketPtr&& connection_socket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
TEST_P(QuicHttpIntegrationTest, MultipleQuicListenersNoBPF) { | ||
#ifdef __APPLE__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idle thought, we do this enough I wonder if it's worth a macro like the #ifndef ENVOY_DISABLE_DEPRECATED_FEATURES to just have
TEST_P(QuicHttpIntegrationTest, DISABLE_FOR_APPLE(MultipleQuicListenersNoBPF))
pros: it saves test setup and teardown on the builds we don't want. cons: the macros don't stack so we might have to one-off any which used both. Optional - add it in if you think it's a good idea :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be a really helpful macro, but since it's not widely used in this PR, I would prefer to defer it to a stand alone change.
} | ||
|
||
TEST_P(QuicHttpIntegrationTest, ConnectionMigration) { | ||
#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well put the TEST_P {} in this block too, so we don't create and run an empty test if it's not testable, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! done
Network::Address::InstanceConstSharedPtr local_addr = | ||
Network::Test::getCanonicalLoopbackAddress(version_); | ||
quic_connection_->switchConnectionSocket( | ||
createConnectionSocket(server_addr_, local_addr, nullptr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any stats we can check to make sure this actually migrated, and that we didn't somehow end up with a testing no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not stats on server side yet. But I added logic to check client port has been changed.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
How to reopen this PR? I just come back from vacation. |
opened! |
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Sync'ed with master and addressed all comments. PTAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. I suggest we ship/iterate. Awesome work! Will defer to @alyssawilk for further review/merge.
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang danzh@google.com
Add BPF filter to dispatch QUIC traffic by connection id cross all workers.
Fix integration test to start after all workers starts to listen.
Risk Level: low, not in use yet
Testing: added new integration test with concurrency_ > 1 and with QUIC connection migration.
Part of #2557, #1193