Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPRoute: allow more match clauses #3205

Merged
merged 7 commits into from
Jul 29, 2024

Conversation

howardjohn
Copy link
Contributor

Today, a route is limited to 16 routes with 8 matches each. This is problematic in real world environments.

While its easy to split a route up (you can easily get the same behavior, and similar cognitive complexity by having 1 route per HTTPRoute), its quite hard to split up matches. For instance, https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38 is an example of a route matching the exposed APIs for a frontend. This is only a small tivial example, and it already uses 5. In our user base, we often see far beyond 8.

Splitting these up is both complex for the user, and may actually lead to different behaviors if implementations treat route groups differently (for instance, if we add retry budget -- that could now be split; other core or implementation specific policies may behave similarly, this is just an example).

The limit on 8 here seems quite small given the context of real usage, the other limits in the API (16 routes, 64 listeners), and the cost of manually working around it. Based on this, I propose we raise the limit to match Gateway listeners.

What type of PR is this?

/kind feature

Does this PR introduce a user-facing change?:

Increased the limit on HTTPRoute matches from 8 to 64.

Today, a route is limited to 16 routes with 8 matches each. This is
problematic in real world environments.

While its easy to split a route up (you can easily get the same
behavior, and similar cognitive complexity by having 1 route per
HTTPRoute), its quite hard to split up matches. For instance,
https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38
is an example of a route matching the exposed APIs for a frontend. This
is only a small tivial example, and it already uses 5. In our user
base, we often see far beyond 8.

Splitting these up is both complex for the user, and may actually lead
to different behaviors if implementations treat route groups differently
(for instance, if we add retry budget -- that could now be split; other
core or implementation specific policies may behave similarly, this is
just an example).

The limit on 8 here seems quite small given the context of real usage,
the other limits in the API (16 routes, 64 listeners), and the cost of
manually working around it. Based on this, I propose we raise the limit
to match Gateway listeners.
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 19, 2024
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 19, 2024
@howardjohn
Copy link
Contributor Author

/retest

@robscott
Copy link
Member

@howardjohn Historically we've been against this because of the worst case scenarios this could lead to if you max out the length of every list within a list (16 rules * 8 matchers * 16 filters == 2048 possible filters in a Route). Increasing the matchers from 8 to 64 would turn that into 16384, which would obviously be problematic.

With that said, we've got CEL now, so we could likely make a case for the following:

  1. Allow up to 64 matchers in a Rule (that seems rather extreme, but something >8 does seem sensible)
  2. Write CEL validation that ensures that the maximum number of matchers allowed in a Route does not go up from it's current theoretical max (128)

That would mean we're allowing significantly more flexibility here without actually increasing the max size of an individual route. If this ends up working reasonably well, we could apply the same principle elsewhere in the API.

@howardjohn
Copy link
Contributor Author

For my understanding, what specifically are we trying to constrain with the overall size limit? Is it...

  • Not requiring implementations to support large routes
  • Staying below CEL limits
  • Making sure we fail hard before etcd size limits
  • Something else?

If its the CEL limits, is the concern that its within the complexity limits now (which already take into account the 16*64), or that it could in the future if matches get more complex?

I ask since it may impact the solution here

@robscott
Copy link
Member

For my understanding, what specifically are we trying to constrain with the overall size limit? Is it...

  • Not requiring implementations to support large routes
  • Staying below CEL limits
  • Making sure we fail hard before etcd size limits

All of the above. I'd also mention that it's possible that we'll add more types of matchers inside this list which will only increase the overall size and complexity, I don't want to get so close to a max size that we end up having no additional room to grow. I'm also strongly biased towards more smaller routes than a few very large routes. In general though, when we're talking about API compatibility, this is already a GA API, and significantly loosening the validation would be problematic. I think we can make a reasonably strong case in favor of a change that allows for more flexibility but still fits within the original constraints though.

@arkodg
Copy link
Contributor

arkodg commented Jul 19, 2024

+1 for this change, this will reduce the friction users are facing when migrating from exiting APIs to Gateway API, who are used to authoring rules a specific way, and this is currently slowing down Gateway API adoption.

@robscott's suggestion of adding a CEL validation to measure and limit total filters for a Route, is a great idea to account for the concerns of the size of the final object persisting in the API server

cross linking previous issues raised by users in the past

@howardjohn
Copy link
Contributor Author

I've pushed up a change to limit the aggregate size. LMK what you think

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @howardjohn!

apis/v1/httproute_types.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 19, 2024
@robscott
Copy link
Member

Sadly looks like we're running into kubernetes/kubernetes#120973 - CRD is invalid because rule is too complex

@howardjohn
Copy link
Contributor Author

That's only from the extra restriction fwiw. It works fine when we just allow it to be unbounded, even with the theoretical 16k entries.

@howardjohn
Copy link
Contributor Author

Honestly don't get how that rule can possibly be 100x over the limit..

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @howardjohn! Will defer to someone else for LGTM.

/approve

@@ -119,6 +119,7 @@ type HTTPRouteSpec struct {
// +optional
// +kubebuilder:validation:MaxItems=16
// +kubebuilder:default={{matches: {{path: {type: "PathPrefix", value: "/"}}}}}
// +kubebuilder:validation:XValidation:message="While 16 rules and 64 matches are allowed, the total matches must be less than 128",rule="(self.size() > 0 ? self[0].matches.size() : 0) + (self.size() > 1 ? self[1].matches.size() : 0) + (self.size() > 2 ? self[2].matches.size() : 0) + (self.size() > 3 ? self[3].matches.size() : 0) + (self.size() > 4 ? self[4].matches.size() : 0) + (self.size() > 5 ? self[5].matches.size() : 0) + (self.size() > 6 ? self[6].matches.size() : 0) + (self.size() > 7 ? self[7].matches.size() : 0) + (self.size() > 8 ? self[8].matches.size() : 0) + (self.size() > 9 ? self[9].matches.size() : 0) + (self.size() > 10 ? self[10].matches.size() : 0) + (self.size() > 11 ? self[11].matches.size() : 0) + (self.size() > 12 ? self[12].matches.size() : 0) + (self.size() > 13 ? self[13].matches.size() : 0) + (self.size() > 14 ? self[14].matches.size() : 0) + (self.size() > 15 ? self[15].matches.size() : 0) <= 128"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely gross, but the least bad we can offer until CEL cost estimation allows us to use map here (Kubernetes 1.30+). Here's the CEL playground link for future readers.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2024
@robscott
Copy link
Member

Actually, would really like @youngnick specifically to sign off on this, will add a hold until he's able to take a look.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2024
@gauravkghildiyal
Copy link
Member

LGTM

Deferring review to Nick as per #3205 (comment)

@youngnick
Copy link
Contributor

As long as the overall complexity isn't increased, this makes sense to me. Nice use of CEL to make sure that's the case.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2024
Copy link
Member

@sunjayBhatia sunjayBhatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do the same for GRPCRoute as well while we're making this change?

@robscott
Copy link
Member

should we do the same for GRPCRoute as well while we're making this change?

I think updating GRPCRoute to match would be great. Would approve in this PR or a follow up.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2024
@howardjohn
Copy link
Contributor Author

Added gRPC route as well

@robscott
Copy link
Member

Thanks @howardjohn!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2024
@robscott
Copy link
Member

I think we've got enough other LGTMs on this one to remove the hold.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2024
Copy link
Member

@sunjayBhatia sunjayBhatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: howardjohn, robscott, sunjayBhatia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 397f8c9 into kubernetes-sigs:main Jul 29, 2024
8 checks passed
xtineskim pushed a commit to xtineskim/gateway-api that referenced this pull request Aug 8, 2024
* HTTPRoute: allow more match clauses

Today, a route is limited to 16 routes with 8 matches each. This is
problematic in real world environments.

While its easy to split a route up (you can easily get the same
behavior, and similar cognitive complexity by having 1 route per
HTTPRoute), its quite hard to split up matches. For instance,
https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38
is an example of a route matching the exposed APIs for a frontend. This
is only a small tivial example, and it already uses 5. In our user
base, we often see far beyond 8.

Splitting these up is both complex for the user, and may actually lead
to different behaviors if implementations treat route groups differently
(for instance, if we add retry budget -- that could now be split; other
core or implementation specific policies may behave similarly, this is
just an example).

The limit on 8 here seems quite small given the context of real usage,
the other limits in the API (16 routes, 64 listeners), and the cost of
manually working around it. Based on this, I propose we raise the limit
to match Gateway listeners.

* Limit aggregate size

* Drop to 128 and add tests

* hacky

* Unroll the loop

* gRPCRoute and update comment

* Fix grpcroute
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants