Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quiche: add multi-worker support for QUIC via BPF #9424

Merged
merged 25 commits into from
Feb 7, 2020

Conversation

danzh2010
Copy link
Contributor

@danzh2010 danzh2010 commented Dec 19, 2019

Signed-off-by: Dan Zhang danzh@google.com

Add BPF filter to dispatch QUIC traffic by connection id cross all workers.
Fix integration test to start after all workers starts to listen.

Risk Level: low, not in use yet
Testing: added new integration test with concurrency_ > 1 and with QUIC connection migration.
Part of #2557, #1193

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Contributor Author

/assign @alyssawilk

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@moderation
Copy link
Contributor

I'm intrigued to know if this will work on RHEL 7 / Centos. Red Hat have back ported some BPF to their ancient kernels but advanced networking stuff like Cilium doesn't work. Apparently more advanced BPF capabilities have been included in RHEL 8.

Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is SO cool. :)

Flushing out a few comments from a first pass.

/wait

Comment on lines 128 to 142
{0x80, 0, 0, 0000000000}, // ld len
{0x35, 0, 9, 0x00000009}, // jlt #0x9, packet_too_short
{0x30, 0, 0, 0000000000}, // ldb [0]
{0x54, 0, 0, 0x00000080}, // and #0x80
{0x15, 0, 2, 0000000000}, // jne #0, ietf_long_header
{0x20, 0, 0, 0x00000001}, // ld [1]
{0x05, 0, 0, 0x00000005}, // ja return
{0x80, 0, 0, 0000000000}, // ietf_long_header: ld len
{0x35, 0, 2, 0x0000000e}, // jlt #0xe, packet_too_short
{0x20, 0, 0, 0x00000006}, // ld [6]
{0x05, 0, 0, 0x00000001}, // ja return
{0x20, 0, 0, // packet_too_short: ld rxhash
static_cast<uint32_t>(SKF_AD_OFF + SKF_AD_RXHASH)},
{0x94, 0, 0, concurrency_}, // return: mod #socket_count
{0x16, 0, 0, 0000000000}, // ret a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably you copied this from somewhere? Do we have to have attribution? Either way, can you have a block comment up top that explains what this code does?

2nd question (sorry I don't now much about BPF internals): Is this a generic language that runs on all actual architecture? I assume this is the BPF "machine" language and the kernel will compile this to x86, aarch, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indirectly translated from Google's internal BPF utils. I added comment about QUIC's connection ID on top.

I don't know how the kernel compiles it, but it is architecture-independent unlike assembly.

// Change to a new port, and connection should still continue.
Network::Address::InstanceConstSharedPtr local_addr =
Network::Test::getCanonicalLoopbackAddress(version_);
std::cerr << "Switch socket and send the rest data\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

del or switch to ENVOY_LOG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it into comment.

tools/spelling_dictionary.txt Show resolved Hide resolved
@@ -218,5 +225,103 @@ TEST_P(QuicHttpIntegrationTest, TestDelayedConnectionTeardownTimeoutTrigger) {
1);
}

TEST_P(QuicHttpIntegrationTest, MultipleQuicListeners) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do a --runs_per_test=1000 loop for both of these tests both with and without BPF support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd test is not supposed to pass without BPF. And I tested the first one with/without BPF support with 1000 runs.

Signed-off-by: Dan Zhang <danzh@google.com>
@mattklein123
Copy link
Member

@danzh2010 the OSX test timed out on quic_http_integration_test which is probably legit.

Per our previous discussion, IMO getting this working on OSX is not a good use of time for you, but it's up to you. Feel free to just disable all of this on OSX if you want. (Though if we do that we should test no BPF on linux as well.)

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Contributor Author

danzh2010 commented Dec 26, 2019

@danzh2010 the OSX test timed out on quic_http_integration_test which is probably legit.

Per our previous discussion, IMO getting this working on OSX is not a good use of time for you, but it's up to you. Feel free to just disable all of this on OSX if you want. (Though if we do that we should test no BPF on linux as well.)

/wait

Added NO BPF support test.

Signed-off-by: Dan Zhang <danzh@google.com>
@mattklein123
Copy link
Member

@danzh2010 looks like OSX is still failing... :( (Again I would just disable it but up to you)

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Contributor Author

@danzh2010 looks like OSX is still failing... :( (Again I would just disable it but up to you)

I would try to debug a bit more before I disable the test in MacOS.

@mattklein123
Copy link
Member

Sorry needs a master merge and then will take a look. Also looks like OSX is still failing.

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing stuff. Just a few nits and then LGTM.

/wait

"to socket in Mac OS. QUIC connection is not stable with concurrency > 1");
#else
ENVOY_LOG(warn, "BPF filter is not supported on this platform. QUIC won't support connection "
"migration and NET rebinding.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/NET/NAT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -92,6 +92,14 @@ void EnvoyQuicClientConnection::setUpConnectionSocket() {
}
}

void EnvoyQuicClientConnection::switchConnectionSocket(
Network::ConnectionSocketPtr&& connection_socket) {
auto writer = new EnvoyQuicPacketWriter(*connection_socket);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer storing in unique_ptr and then releasing into the QUICHE code, in case intermediate code throws an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tools/spelling_dictionary.txt Show resolved Hide resolved
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So great to see this going public for people to play around with! Mostly nits on my end and I owe the tests a pass tomorrow.

@@ -21,7 +21,7 @@ class ActiveUdpListenerConfigFactory {
* Create an ActiveUdpListenerFactory object according to given message.
*/
virtual Network::ActiveUdpListenerFactoryPtr
createActiveUdpListenerFactory(const Protobuf::Message& message) PURE;
createActiveUdpListenerFactory(const Protobuf::Message& message, uint32_t concurrency) PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind documenting arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1,5 +1,12 @@
#include "extensions/quic_listeners/quiche/active_quic_listener.h"

#if defined(__linux__)
#include <linux/filter.h>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nitty but mind removing the trailing line here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (!ok) {
ENVOY_LOG(warn, "Fail to apply socket options to socket {} on listener {} after binding",
listen_socket_.ioHandle().fd(), listener_config.name());
throw EnvoyException("Fail to apply socket options.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail -> Failed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Network::ListenerConfig& config) {
std::unique_ptr<Network::Socket::Options> options = std::make_unique<Network::Socket::Options>();
#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__)
// This BPF filter reads the 1st word of QUIC connection id in the UDP payload and mod it by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod -> mods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__)
// This BPF filter reads the 1st word of QUIC connection id in the UDP payload and mod it by the
// number of workers to get the socket index in the SO_REUSEPORT socket groups. QUIC packets
// should be at least 9 bytes, with the 1st byte indicates one of the below QUIC packet headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicates -> indicating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// 1) IETF QUIC long header: most significant bit is 1. The connection id starts from the 7th
// byte.
// 2) IETF QUIC short header: most significant bit is 0. The connection id starts from 2nd
// bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes -> byte here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// 3) Google QUIC header: most significant bit is 0. The connection id starts from 2nd
// bytes.
// Any packet that doesn't belong to any of the three packet header types are dispatched
// based on 5-tuple source/destination addresses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they will not have this filter applied, or because we do payload inspection and best effort guess?
(I really should know this one but if even I don't we should comment =P)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is according to the BPF code: if the payload doesn't seems to be a valid QUIC packet, dispatch it using the network 5 tuple

"to socket in Mac OS. QUIC connection is not stable with concurrency > 1");
#else
ENVOY_LOG(warn, "BPF filter is not supported on this platform. QUIC won't support connection "
"migration and NET rebinding.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAT port rebinding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#else
if (concurrency_ > 1) {
#ifdef __APPLE__
// TODO(#8794) Figure out how SO_REUSEPORT behave in Mac OS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your call but I'd say we just list it somewhere as not supported until/if someone cares, rather than TODOing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -40,6 +40,8 @@ class EnvoyQuicClientConnection : public EnvoyQuicConnection, public Network::Ud
// Register file event and apply socket options.
void setUpConnectionSocket();

void switchConnectionSocket(Network::ConnectionSocketPtr&& connection_socket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

TEST_P(QuicHttpIntegrationTest, MultipleQuicListenersNoBPF) {
#ifdef __APPLE__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idle thought, we do this enough I wonder if it's worth a macro like the #ifndef ENVOY_DISABLE_DEPRECATED_FEATURES to just have
TEST_P(QuicHttpIntegrationTest, DISABLE_FOR_APPLE(MultipleQuicListenersNoBPF))

pros: it saves test setup and teardown on the builds we don't want. cons: the macros don't stack so we might have to one-off any which used both. Optional - add it in if you think it's a good idea :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be a really helpful macro, but since it's not widely used in this PR, I would prefer to defer it to a stand alone change.

}

TEST_P(QuicHttpIntegrationTest, ConnectionMigration) {
#if defined(SO_ATTACH_REUSEPORT_CBPF) && defined(__linux__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well put the TEST_P {} in this block too, so we don't create and run an empty test if it's not testable, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! done

Network::Address::InstanceConstSharedPtr local_addr =
Network::Test::getCanonicalLoopbackAddress(version_);
quic_connection_->switchConnectionSocket(
createConnectionSocket(server_addr_, local_addr, nullptr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any stats we can check to make sure this actually migrated, and that we didn't somehow end up with a testing no-op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not stats on server side yet. But I added logic to check client port has been changed.

@stale
Copy link

stale bot commented Jan 15, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 15, 2020
@stale
Copy link

stale bot commented Jan 22, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this Jan 22, 2020
@danzh2010
Copy link
Contributor Author

How to reopen this PR? I just come back from vacation.

@alyssawilk alyssawilk reopened this Feb 3, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 3, 2020
@alyssawilk
Copy link
Contributor

opened!

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Contributor Author

Sync'ed with master and addressed all comments. PTAL!

mattklein123
mattklein123 previously approved these changes Feb 5, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. I suggest we ship/iterate. Awesome work! Will defer to @alyssawilk for further review/merge.

Signed-off-by: Dan Zhang <danzh@google.com>
@mattklein123 mattklein123 merged commit e176b30 into envoyproxy:master Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants