Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports gRPC as a protocol (without TLS) #987

Closed
wants to merge 17 commits into from

Conversation

dprotaso
Copy link
Member

@dprotaso dprotaso commented May 30, 2018

Fixes #707, #813

Proposed Changes

  • Supports gRPC as a protocol (without TLS)

TODO

  • e2e test
  • activator support
  • switching to using an h2(c) server (non TLS) in ela-queue & activator

Outstanding Questions

  • what metrics is the autoscaler interested in when the protocol is grpc & the exchange is bidirectional
  • istio + grpc + mtls

@google-prow-robot google-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 30, 2018
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@google-prow-robot google-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 30, 2018
Copy link
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a cursory look, my main thoughts are in docs/spec/spec.md.


1. [Install Elafros](https://github.com/elafros/install/blob/master/README.md)
1. Install [docker](https://www.docker.com/)
1. Install [ko](https://github.com/google/go-containerregistry/tree/master/cmd/ko)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr remove either Docker or ko

My biases in sample/ lately:

  • Document it with Dockerfile in README.md (which is probably most familiar to users)
  • Make it work with Bazel / ko for contributors

Does that make sense?

@@ -256,6 +258,10 @@ spec:
# scaling to/from 0.
servingState: Active | Reserve | Retired

# The protocol used for this Revision. This will define the Istio ingress spec
# and service spec required for this Revision.
protocol: grpc | http
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr It feels to me like this belongs on Route not Revision.

cc @vaikas-google @evankanderson @cooperneil

I have a hard time imagining a use-case where a Route splits traffic over multiple Revisions with different protocols, which is reflected in some of the getProtocolFromTrafficTargets logic and TODOs that need to cope with the potential heterogeneity here. Let's just not allow it.

I believe the main reason this is needed on Revision in its current form is because you make use of it in ela-queue to proxy arbitrary TCP traffic. Two questions:

  1. Can we switch completely to tcpproxy thereby eliminating the need for the config @ Revision level?
  2. How does this codepath expose auto-scaling metrics (the core purpose of ela-queue today)?

I suspect we also need to change the activator in a similar manner. cc @josephburnett

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dprotaso and I struggled with where the protocol belongs ourselves and decided to specify the protocol on the Revision and open it up for discussion.

Both the route controller and revision controller need to be aware of the protocol due to the following:

  • The route controller needs to create the appropriate Istio ingress and route service with the protocol specified
  • The revision controller also needs to create the revision service with the servicePort name specified as the protocol

Istio docs that explains it here

We are currently in the process of removing the need for the protocol in the ela-queue container by utilizing a h2cserver that can proxy both h2c and http traffic based on the request protocol. The reason why we couldn't use a tcp proxy was because we wouldn't be able to collect request statistics for autoscaling.

We're also in the process of changing the activator to utilize a h2c server as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the Envoy in the istio sidecar collects request statistics similar to the ones ela-queue collects. Using them hasn't been a high priority yet, but if the queue-proxy is a blocker that could change. It's unclear whether they can fully replace the queue proxy stats collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the Istio docs on the port naming restrictions for ingress seem to have been removed between 0.7 and 0.8. I'm not sure why earlier versions of Istio had this restriction (digging, it appears to have been copied from the pod injection docs).

@dprotaso
Copy link
Member Author

/test pull-elafros-elafros-integration-tests

@google-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dprotaso
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: evankanderson

Assign the PR to them by writing /assign @evankanderson in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

@bsnchan bsnchan force-pushed the grpc branch 2 times, most recently from 8fb9a4a to 4c039c0 Compare June 5, 2018 14:46
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-integration-tests

1 similar comment
@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-integration-tests

@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-unit-tests

@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-integration-tests

2 similar comments
@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-integration-tests

@bsnchan
Copy link
Contributor

bsnchan commented Jun 5, 2018

/test pull-knative-serving-integration-tests

@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 77.9% 78.1% 0.2%
pkg/controller/route/route_test.go 77.9% 78.1% 0.2%
pkg/webhook/configuration.go 95.2% 92.1% -3.0%

*TestCoverage feature is being tested, do not rely on any info here yet

bsnchan and others added 16 commits June 18, 2018 13:49
* The h2c server can proxy both h2c and http requests based on the
request's protocol

Co-authored-by: Dave Protasowski <dprotaso@gmail.com>
* We needed to create an activator service per protocol
  Since services cannot share a port with different protocols

Co-authored-by: Dave Protasowski <dprotaso@gmail.com>
Co-authored-by: Dave Protasowski <dprotaso@gmail.com>
* When the Ingress is updated for gRPC, the service takes some time to
mark the upstream as healthy
* it was only being used in the sample grpc app so we now pull in the
depency via the sample app's dockerfile

* also removed unused variables from cmd/queue/main.go
ignore the folders

* google.golang.org/grpc is not vendored in since it is only used in the sample
apps so don't attempt to build or test the sample grpc apps
* This is allows go build ./sample/... to run successfully since the grpc
dependencies are not vendored in
* removes golang/protobuf from Gopkg.toml
* alias h2c package as h2cutil
* adds a comment for NewTransport
* removes refactoring changes (will submit them as a separate pr)
* updates sample app's readme to use buildtag when running client code
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 78.7% 79.0% 0.3
pkg/controller/route/route_test.go 78.7% 79.0% 0.3
pkg/webhook/configuration.go 95.2% 92.1% -3.0

*TestCoverage feature is being tested, do not rely on any info here yet

@bsnchan
Copy link
Contributor

bsnchan commented Jun 18, 2018

Hey @evankanderson - is anything blocking this from getting merged? Having to rebase constantly is a bit of a pain. It would be nice to mark this as hold if we're waiting for another PR to go in first.

@evankanderson
Copy link
Member

Good point -- I have two concerns with this CL (one of which you're clearly having to wrestle with):

  1. It's pretty big, and will have a lot of conflicts.
  2. It's introducing a "fork" into the data plane which may be difficult to take out later, and which shouldn't be needed after Replace Ingress and RouteRule with Gateway and VirtualService respectively #1047 is fixed (presumably by Migrate to Istio v1alpha3 Gateway / VirtualService #1228)

Maybe we can break this into 2-4 items?

  • queue and activator changes seem non-controversial for item 2, and will be a long-term benefit in making our request path HTTP/2 clean.
  • API and route and configuration controller changes are where my concerns are -- introducing forking data paths increases the changes that each data path will end up with different capabilities. Then we end up making a chart explaining which paths support which things, and it's always out of date with the code. As much as possible, I'd like to keep a single data path which is signaled on L7 traffic which is compatible with HTTP framing (websocket, http2, grpc all meet this goal).
  • The grpc sample app, which seems like a great way to test that the e2e is working.

@bsnchan
Copy link
Contributor

bsnchan commented Jun 18, 2018

@evankanderson That make sense. I agree with waiting for #1228 to be completed as it would probably simplify a lot of the API/controller changes.

To clarify - I can submit separate PRs for:

  1. queue to support h2c
  2. activator to support h2c
  3. any controller changes required after Migrate to Istio v1alpha3 Gateway / VirtualService #1228 is completed
  4. grpc sample app

Would you accept the first two changes as PR's today? Or would you prefer for the Istio upgrade PR to land first?

@evankanderson
Copy link
Member

I'd be happy to review the first two PRs today.

Ideally, the third PR would not involve any API surface changes.

I'm comfortable committing the 4th item as a simple app even before it's working, as long as we document in the README that it is aspirational on #1047.

@bsnchan
Copy link
Contributor

bsnchan commented Jun 18, 2018

Closing this PR - followed up with #1257, #1260 and #1261.

Update: I can't close this PR since @dprotaso created it :(

@dprotaso
Copy link
Member Author

closing

@dprotaso dprotaso closed this Jun 19, 2018
@tanzeeb tanzeeb mentioned this pull request Dec 5, 2018
skonto pushed a commit to skonto/serving that referenced this pull request Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants