Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for terminating QUIC #2557

Closed
alyssawilk opened this issue Feb 8, 2018 · 47 comments
Closed

Add support for terminating QUIC #2557

alyssawilk opened this issue Feb 8, 2018 · 47 comments
Assignees
Labels
area/quic enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue
Milestone

Comments

@alyssawilk
Copy link
Contributor

We’ve had a few off-line discussions on how to maybe go about this with @mattklein123 but we’re now close enough it’s worth filing a tracking issue for it :-)

Here’s our plan A - any Envoy devs interested in QUIC support are encouraged to offer suggestions/improvements

Milestone 1 is to hack together “something which builds” using the existing Google QUIC code*, because as it turns out it takes years to debug all the weird corner cases in congestion control and crypto and it’s likely easier to copy existing code than write it all from scratch. That code base is “supposed to” build fairly cleanly as long as one implements all the things in quic/platform/impl but almost certainly doesn’t. This will be done in in @juvexp’s Envoy branch to allow rapid iteration - any interested parties are welcome to follow along and/or contribute there. Once the QUIC code builds and QUIC unit tests pass, it’s on to Milestone 2.

Milestone 2 is to get QUIC “working” with Envoy, where we have working integration tests. This may not include clean Envoy API use. For example, the first pass might treat QUIC as one logical Connection rather than one-Connection-per-stream Milestone 2 will likely involve landing some code in Envoy (things like UDP listeners if @cmluciano hasn’t beat us to it)

Milestone 3 is having QUIC fully landed in Envoy, with proper API wrappers for all the various codec / connection / crypto functionality QUIC supports. This will involve slowly cleaning up anything still only in @juvexp’s Envoy branch, along with having a story for gracefully handling upstream/downstream updates and contributions.

*currently visibile at https://github.com/chromium/chromium/tree/master/net/quic. Plot twist, while we’re hacking around in a custom branch the Google devs are likely to use “upstream” QUIC code, since then they can make changes to code structure and push them directly to Envoy rather than pushing from google3-upstream to Chrome to Envoy which will really slow development. Long term code syncing is a big glaring TBD as we need a library non-Googlers can contribute to, but also want to cheaply and easily get the latest security fixes and IETF spec updates from the dedicated Google dev team working on QUIC.

@alyssawilk alyssawilk added the enhancement Feature requests. Not bugs or questions. label Feb 8, 2018
@ggreenway
Copy link
Contributor

How close of a match is the quic threading model to envoy's?

Any known issues or expected pain points integrating google-quic into envoy?

@mattklein123
Copy link
Member

@alyssawilk Thanks for the great summary of the plan! Small ask: we are already tracking QUIC here: #1193. Can we consolidate issues?

@alyssawilk
Copy link
Contributor Author

@ggreenway Threading is fine - the was written for GFE (Google's HTTP proxy) and Chrome, which have compatible threading models with Envoy.

I think I covered the main pain points above. Pain point one being that the code isn't architected well for Envoy APIs (which we'll fix). And pain point two being "external deps" - while we tried to abstract away non-QUIC-core concepts like "how do you implement your IP Address" and "how are alarms managed" into the platforms directory, we really only abstracted things that GFE/Chrome didn't have in common. Given they're both predominantly Google code, I suspect we'll run into many things we haven't yet abstracted enough (like logging). Once we get it building I think the APIs will come fairly quickly, and the really tricky parts of the code (crypto and congestion) shouldn't be a problem.

Other than that, well almost all the QUIC code is open sourced, but not all. I think our UDP proxying is GFE-specific enough we may want to reimplement from scratch, though we may have some useful utils and definitely will have input on how things can go wrong :-P. We'll also need to open source a bunch of the perf optimizations the current toy server didn't need, like reading from rx_ring with berkley packet filters and writing to raw sockets so perf won't suck for bandwidth-heavy users.

@alyssawilk
Copy link
Contributor Author

@mattklein123 your call. I liken the two to "TCP proxying" and "HTTP termination" so I think they're different enough to warrant different tracking bugs. I also think they can be done substantially in parallel - I'm hoping @cmluciano will actually pick up UDP proxying while mpw@google/@juvexp will probably do the bulk of this one.

@mattklein123
Copy link
Member

@alyssawilk if you think the issues are different no problem we can keep both.

@juvexp
Copy link

juvexp commented Jun 7, 2018

Sorry for the lack of updates. I worked on this only a little bit since the bug was filed:

  1. Created a personal fork: https://github.com/juvexp/envoy
  2. Wrote a script to export QUIC code from Google to my personal fork.
  3. Added the implementation for a few QUIC platform APIs.

I plan to get back to the QUIC platform implementation in the next week, hopefully I can get it done by the end of this quarter. I'll send another update after that.

@stale
Copy link

stale bot commented Jul 7, 2018

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 7, 2018
@mattklein123 mattklein123 added the help wanted Needs help! label Jul 7, 2018
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jul 7, 2018
@ggreenway
Copy link
Contributor

@juvexp Any updates on your progress?

@rpaulo
Copy link

rpaulo commented Jul 17, 2018

Will this work involve supporting Google QUIC or IETF QUIC? Since QUIC is being standardized by the IETF, I think it's more beneficial to spend time on IETF QUIC.

@bmetzdorf
Copy link
Contributor

I agree, we should be doing IETF QUIC.

@mattklein123
Copy link
Member

The plan is to support both Google QUIC and IETF QUIC. Google QUIC is the fastest path to something that is production ready while continuing to work towards the IETF standard.

@vinothchandar
Copy link

Hi.. is someone working on bringing QUIC support to envoy? I saw a "help wanted" tag on #1193 as well.. Just want to understand the state of things.

@alyssawilk
Copy link
Contributor Author

AFIK the Google-QUIC team that had picked this up got diverted from working on it, but are hoping to get back to it and land QUIC-Envoy integration in Q1. That said I wouldn't say that's anywhere near guaranteed - if you have interest or cycles in helping out help would definitely be appreciated :-)

@vinothchandar
Copy link

Thanks for the prompt response.. Is there any POC from Google's side that can give us more context? For e.g, if this needs changes to Chromium. (we contributed a small reverse proxy to chromium quic and had to move around some classes to make way)

Definitely love to contribute, once we flesh out any dependencies :)

@alyssawilk
Copy link
Contributor Author

Well I'm the one who generally pings the QUIC team when there's questions but @ianswett @RyanatGoogle who are actually deciding who does what.

For contributions, I think we're going to want to reuse the google-QUIC code where we can for congestion control and crypto because (there's so many subtle things to get wrong and it'd be nice to not repeat mistakes!) but AFIK no one ever got around to open sourcing the internal proxy implementation, so if you're interested in doing QUIC work that'd be a great place to start. I'd be happy to chat about how that work might go when/if you have cycles as I have quite a bit of context.

@RyanTheOptimist
Copy link
Contributor

As @alyssawilk alluded to, we're currently working to take Google's QUIC code which is shared between Chromium and our internal proxy and export it to a stand-alone repository where it should be easier to consume for other projects. This repository will be DEPS'd into chromium (meaning we'll pull the contents of this repo directly instead of having duplicate or slightly modified files in Chromium). Consumer of this stand alone repo (of which Chromium will be one) will need to provide their own "platform impl" for the various platform specific QUIC dependencies. https://cs.chromium.org/chromium/src/net/third_party/quic/platform/impl/ This includes things like socket and clock abstractions, various compiler configurations, logging, etc. It will not require consumers to pull in all sorts of chromium dependencies.

Hope that helps...

@conqerAtapple
Copy link
Contributor

conqerAtapple commented Oct 15, 2018

Has anyone looked at https://github.com/ngtcp2/ngtcp2 ? Their event loop model fits envoy so might be a natural fit.

@vinothchandar
Copy link

@alyssawilk Thanks. agree on reusing chromium impl as much as possible for the same reasons. On flip side, something like ngtcp2 is very promising but not sure how production hardened it is, at this point. Given envoy is used by so many, it may be a lil premature to pull it into envoy and certify support.

@RyanatGoogle The standalone repo makes total sense.. I remember you moved a lot of code already for our chromium proxy work. Is there more to be done there? In other words, is it just moving it to a new repo or there are more refactoring work that needs to be carefully orchestrated by google engineers to ensure the internal proxy continues to work? If latter, then it may not be easy for us (outside of google) to get started on this work? Thoughts?

@conqerAtapple
Copy link
Contributor

How do you compare "production ready" between refactored(untested) code vs something like ngtcp2?

@RyanTheOptimist
Copy link
Contributor

@vinothchandar The primary work remaining is to remove a whole pile of minor diffs between the internal and external versions of the code which have crept in over the years. This is a raft of CLs like:

https://chromium-review.googlesource.com/c/chromium/src/+/1273814

Once that's complete it's mostly a simple matter of just moving the files to a new repo. We do not believe there is any careful refactoring to be done as part of this effort like there was when trying to add reverse proxy support to the toy server.

@conqer I'm not sure if your question was about the Google QUIC repo or something else. If it was about the Google QUIC repo, that will contain the core QUIC implementation which is used for Chrome (and various Google apps which use Chrome's networking library) as well as our internal proxy. So this is very much tested production code.

@mattklein123
Copy link
Member

@conqer @vinothchandar I'm out for two weeks but sorting out a firm plan for QUIC is high on my priority list when I get back. If you would to be involved please send me email at mklein@lyft.com so that we can coordinate and we can learn how you might want to contribute.

From my perspective, both https://github.com/ngtcp2/ngtcp2 and https://github.com/h2o/quicly are on the table is viable options (nothing has been firmly decided yet). Everything will be hidden behind an interface no matter what, so Google's code can be swapped in if it can't be extracted into a library in the time that we want to get this done.

As a pre-requisite to this work I'm probably also going to start helping out on basic UDP proxy support since no progress has been made on that. (#492)

@mattklein123 mattklein123 self-assigned this Oct 16, 2018
@conqerAtapple
Copy link
Contributor

@mattklein123 sent you email.

@vinothchandar
Copy link

@RyanatGoogle that sounds promising.. We can try our hand at this then, looks like. If we atleast have a PoC version with Chromium working, then may be as bad to redo it with the new repo/independent lib. Correct me if I am missing something.

@mattklein123 yes will send you an email to coordinate. It'd be awesome if you can also get @alyssawilk looped in. I have reasonable handle on the chromium code base for this, but total newbie to envoy. So appreciate all your help.

alyssawilk pushed a commit that referenced this issue Apr 6, 2021
Commit Message: allow HCM to config Http3Options and use it with other HCM configs, i.e. max_request_headers_kb and headers_with_underscores_action, to setup QuicHttpServerConnectionImpl. And support these configurations in QUIC. Currently the only Http3Options configuration is override_stream_error_on_invalid_http_message.

Additional Description: added Http3 codec stats. Pass it along with Http3Options and other HCM configs.

Risk Level: low
Testing: enable related tests in quic_protocol_integration_test
Docs Changes: updated docs/root/configuration/http/http_conn_man/response_code_details.rst

Part of #12930 #2557

Signed-off-by: Dan Zhang <danzh@google.com>
alyssawilk pushed a commit that referenced this issue Apr 29, 2021
Commit Message: use max header size in HCM config to initialize quic session.
Additional Description:
change QuicSession::Initialize() and filter chain creation order to set max header list size for each session before Initialize(). Move Initialize() of Http3 connection from CodecClient to CodecClientProd.
Added CodecClient::connect() interface.
Change EnvoyQuicConnection not to directly inherit from QuicConnection. And renamed it.
Fix a use-after-free bug in FakeUpstream which causes ASAN failure.

Risk Level: low
Testing: more protocol H3 tests

Part of #2557 #12930 #14829

Signed-off-by: Dan Zhang <danzh@google.com>
alyssawilk pushed a commit that referenced this issue Apr 29, 2021
#16128)

Commit Message: use posted callback to block/unblock quic stream in EnvoyQuicStream::readDisable() with weak_ptr to stream to handle stream life time issue.
Additional Description: currently if readDisabled() is called in decodeHeaders|Trailers(), the stream will be blocked right away which breaks the assumption QUICHE has that a stream shouldn't be blocked during OnInitialHeadersComplete() and OnTrailingHeadersComplete().
This change makes the stream state change completely outside of QUICHE call stack. (The unblocking is already outside of QUICHE call stack in existing implementation.) Also simplify the blockage state change logic.

Risk Level: low
Testing: added more unit tests and enabled Http2UpstreamIntegrationTest::ManyLargeSimultaneousRequestWithBufferLimits which was flaky.

Part of #2557 #14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
alyssawilk pushed a commit that referenced this issue May 3, 2021
Commit Message: Add initial stream and connection level flow control windows to envoy.config.core.v3.QuicProtocolOptions which is be used in QUIC listener config and Http3 upstream cluster config.

Risk Level: low
Testing: re-enable more Http3 downstream protocol test.

Part of #2557 #12930 #14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 5, 2021
Commit Message: Add initial stream and connection level flow control windows to envoy.config.core.v3.QuicProtocolOptions which is be used in QUIC listener config and Http3 upstream cluster config.

Risk Level: low
Testing: re-enable more Http3 downstream protocol test.

Part of envoyproxy#2557 envoyproxy#12930 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
Commit Message: use max header size in HCM config to initialize quic session.
Additional Description:
change QuicSession::Initialize() and filter chain creation order to set max header list size for each session before Initialize(). Move Initialize() of Http3 connection from CodecClient to CodecClientProd.
Added CodecClient::connect() interface.
Change EnvoyQuicConnection not to directly inherit from QuicConnection. And renamed it.
Fix a use-after-free bug in FakeUpstream which causes ASAN failure.

Risk Level: low
Testing: more protocol H3 tests

Part of envoyproxy#2557 envoyproxy#12930 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
envoyproxy#16128)

Commit Message: use posted callback to block/unblock quic stream in EnvoyQuicStream::readDisable() with weak_ptr to stream to handle stream life time issue.
Additional Description: currently if readDisabled() is called in decodeHeaders|Trailers(), the stream will be blocked right away which breaks the assumption QUICHE has that a stream shouldn't be blocked during OnInitialHeadersComplete() and OnTrailingHeadersComplete().
This change makes the stream state change completely outside of QUICHE call stack. (The unblocking is already outside of QUICHE call stack in existing implementation.) Also simplify the blockage state change logic.

Risk Level: low
Testing: added more unit tests and enabled Http2UpstreamIntegrationTest::ManyLargeSimultaneousRequestWithBufferLimits which was flaky.

Part of envoyproxy#2557 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
Commit Message: Add initial stream and connection level flow control windows to envoy.config.core.v3.QuicProtocolOptions which is be used in QUIC listener config and Http3 upstream cluster config.

Risk Level: low
Testing: re-enable more Http3 downstream protocol test.

Part of envoyproxy#2557 envoyproxy#12930 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
Commit Message: use max header size in HCM config to initialize quic session.
Additional Description:
change QuicSession::Initialize() and filter chain creation order to set max header list size for each session before Initialize(). Move Initialize() of Http3 connection from CodecClient to CodecClientProd.
Added CodecClient::connect() interface.
Change EnvoyQuicConnection not to directly inherit from QuicConnection. And renamed it.
Fix a use-after-free bug in FakeUpstream which causes ASAN failure.

Risk Level: low
Testing: more protocol H3 tests

Part of envoyproxy#2557 envoyproxy#12930 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
envoyproxy#16128)

Commit Message: use posted callback to block/unblock quic stream in EnvoyQuicStream::readDisable() with weak_ptr to stream to handle stream life time issue.
Additional Description: currently if readDisabled() is called in decodeHeaders|Trailers(), the stream will be blocked right away which breaks the assumption QUICHE has that a stream shouldn't be blocked during OnInitialHeadersComplete() and OnTrailingHeadersComplete().
This change makes the stream state change completely outside of QUICHE call stack. (The unblocking is already outside of QUICHE call stack in existing implementation.) Also simplify the blockage state change logic.

Risk Level: low
Testing: added more unit tests and enabled Http2UpstreamIntegrationTest::ManyLargeSimultaneousRequestWithBufferLimits which was flaky.

Part of envoyproxy#2557 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this issue May 6, 2021
Commit Message: Add initial stream and connection level flow control windows to envoy.config.core.v3.QuicProtocolOptions which is be used in QUIC listener config and Http3 upstream cluster config.

Risk Level: low
Testing: re-enable more Http3 downstream protocol test.

Part of envoyproxy#2557 envoyproxy#12930 envoyproxy#14829

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
alyssawilk added a commit that referenced this issue May 7, 2021
Unhiding HTTP/3 upstream and downstream configuration, linking to example configs, and updating docs for HTTP/3 alpha.

Risk Level: n/a
Testing: n/a
Docs Changes: yes
Release Notes: inline
#14829 #2557 #15845
Fixes #12923

Co-Authored-By: Michael Payne michael@sooper.org
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

Given QUIC alpha, I'm going to close this off as done - folks can follow along the mvp list
quic-mvp Required for QUIC MVP

alyssawilk pushed a commit that referenced this issue Jun 10, 2021
Commit Message: make quic proof source and crypto streams extensions. Add config for default ones. If not specified in config, the default ones will be used.

Risk Level: low
Testing: existing tests passed
Part of #2557
Co-authored-by: Dan Zhang <danzh@google.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this issue Sep 30, 2021
…y#16658)

Commit Message: make quic proof source and crypto streams extensions. Add config for default ones. If not specified in config, the default ones will be used.

Risk Level: low
Testing: existing tests passed
Part of envoyproxy#2557
Co-authored-by: Dan Zhang <danzh@google.com>
jpsim pushed a commit that referenced this issue Nov 28, 2022
Automated changes by [create-pull-request](https://github.com/peter-evans/create-pull-request) GitHub action

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Automated changes by [create-pull-request](https://github.com/peter-evans/create-pull-request) GitHub action

Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/quic enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests