Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream adding QUIC-to-TCP failover #15894

Merged
merged 7 commits into from
Apr 16, 2021
Merged

Conversation

alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented Apr 8, 2021

Hey Matt, can I get your thoughts on 2 things?

  1. if we're Ok with having a hard-coded something (right now "use_quic") matcher to handle TCP-TLS and QUIC-TLS in the same cluster (or maybe we want to do something to combine the two configs and auto-select based on transport?)

  2. how we want to handle the connection pool being able to select a non-default transportSocket. I have "use_quic" as a placeholder argument but I think if we extend the API the boolean should probably be replaced with an optional metadata. I'd like your thoughts if you have a better solution before I go change the API and all the mocks though.

Because I haven't cleaned up all the mocks so this will not build, but the integration test both provides a proof of concept for failover working, and the annoyance in configuring both TCP-TLS and QUIC-TLS transport sockets.

cc @RyanTheOptimist

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Part of #14829

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #15894 was opened by alyssawilk.

see: more, trace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff. A left some ideas for discussion. Thank you!

/wait-any

test/common/upstream/upstream_impl_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk marked this pull request as ready for review April 13, 2021 15:43
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

Tsan failure was CI failure, not test fail. This should be ready for a pass!

@alyssawilk
Copy link
Contributor Author

@snowp would you have time today to do a first pass? Matt's on call for the release and I was hoping to land this before I take off for a week.

@alyssawilk
Copy link
Contributor Author

ugggh, accidentally pushed to the wrong branch. reverted but CI is of course running again (was green, and barring flakes will be green again)

@alyssawilk
Copy link
Contributor Author

CI is unrelated flake:

2021-04-15T17:51:08.2680556Z Pulling zipkin (openzipkin/zipkin:)...
2021-04-15T17:51:13.3215508Z received unexpected HTTP status: 503 Service Unavailable
2021-04-15T17:51:13.3776159Z ERROR: starting zipkin .
2021-04-15T17:51:13.3797151Z
2021-04-15T17:51:13.3797958Z > [zipkin] Cleanup (.)
2021-04-15T17:51:13.9288406Z Removing network zipkin-tracing_envoymesh
2021-04-15T17:51:14.2006879Z TESTS FAILED:

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@antoniovicente antoniovicente self-assigned this Apr 15, 2021
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. There may be a threading issue in the call to maybeCreateFallbackFactory if the factories are not per-worker.

Network::TransportSocketPtr
createTransportSocket(Network::TransportSocketOptionsSharedPtr options) const override {
// TODO(#15649) make createTransportSocket non-const to avoid the const cast.
const_cast<QuicClientTransportSocketFactory*>(this)->maybeCreateFallbackFactory();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are factories per worker or can createTransportSocket be called concurrently by multiple theads? If the later, there's a thread race in this code.

I would consider creating this fallback factory in QuicClientTransportSocketFactory's constructor in order to guarantee that it always exists if createTransportSocket is ever called.

Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols};
return std::make_unique<Http::ConnectivityGrid>(
dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options,
state, source, std::chrono::milliseconds(300), coptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, are there plans to make next_attempt_duration configurable in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR

ASSERT_TRUE(cluster1->info()->idleTimeout().has_value());
EXPECT_EQ(std::chrono::hours(1), cluster1->info()->idleTimeout().value());

const std::string explicit_http3 = R"EOF(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: changing "explicit" to "alpn" or "auto" would be slightly clearer

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful review!
I think default-creating the fallback isn't enough for secrets update, but I'm going to leave that as a TODO and fix when I get back. Assuming I manage to get away =P

Envoy::Http::ConnectivityGrid::ConnectivityOptions coptions{protocols};
return std::make_unique<Http::ConnectivityGrid>(
dispatcher, api_.randomGenerator(), host, priority, options, transport_socket_options,
state, source, std::chrono::milliseconds(300), coptions);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I figured I'd make @RyanTheOptimist take that on both as a simple API change, and to reduce bikeshedding on this PR

Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change and enjoy your vacation next week!

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@alyssawilk alyssawilk merged commit ad4919d into envoyproxy:main Apr 16, 2021
stevenzzzz pushed a commit to stevenzzzz/envoy that referenced this pull request Apr 16, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829

Signed-off-by: Xin Zhuang <stevenzzz@google.com>
douglas-reid pushed a commit to douglas-reid/envoy that referenced this pull request Apr 19, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
Risk Level: n/a (hidden by default)
Testing: e2e tests
Docs Changes: n/a
Release Notes: n/a
Part of envoyproxy#14829

Signed-off-by: Gokul Nair <gnair@twitter.com>
@alyssawilk alyssawilk deleted the h3_api branch February 28, 2022 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants