Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResolvedRefs condition for GatewayClass parametersRef #2911

Closed
sjberman opened this issue Mar 29, 2024 · 11 comments
Closed

ResolvedRefs condition for GatewayClass parametersRef #2911

sjberman opened this issue Mar 29, 2024 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@sjberman
Copy link
Contributor

sjberman commented Mar 29, 2024

A discussion was had in Slack about adding a ResolvedRefs condition to a GatewayClass status to reflect the state of a parametersRef object that is attached. This led to a couple different concerns.

  1. How would this coexist with the current InvalidParameters reason that exists for the Accepted condition on a GatewayClass?
    Potential flows and their status outcomes:

    • Create GatewayClass with bad parametersRef target: GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: False, reason: ResolvedRefs? }]
    • Create GatewayClass with valid parametersRef target, but invalid values: GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
    • Create GatewayClass with valid parametersRef target and values -> delete parametersRef resource: GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
      a. Gateway created prior to this change expected to be okay unless dynamic changes are allowed, if so what happens?
      b. Gateway created referencing this GatewayClass after this change should set GatewayStatus[{ type: Accepted, status: False?, reason: ? }]
      c. Does GatewayClass move to Accepted: False?
    • Create GatewayClass with valid parametersRef target and values -> change parametersRef resource content so one field becomes invalid: GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
      a. Gateway created prior to this change expected to be okay unless dynamic changes are allowed, if so what happens?
      b. Gateway created referencing this GatewayClass after this change should set GatewayStatus[{ type: Accepted, status: False?, reason: ? }]
      c. Does GatewayClass move to Accepted: False?
  2. Another concern that was raised is what the expectations are if dynamic changes are allowed to a parametersRef (by either adding/removing the reference, deleting the resource, or updating the fields within it). Specifically, if a GatewayClass goes from being Accepted to not Accepted due to an invalid update to the parametersRef, how does that affect the entire config tree below it? By following a true declarative system, all Gateways under this class should no longer be Accepted, but this would nullify the entire configuration and be very disruptive. The other option is what was mentioned in the examples above, where previous Gateways live on, while new ones are not Accepted. Issue here, however, is that if the control plane restarts, it loses the previous state of the valid parametersRef, so the original Gateways that were left alone will no longer receive those valid params, and the config is void. Note that this is technically an issue today with the existing InvalidParameters reason on the Accepted condition, so nothing new with the proposed ResolvedRefs condition.

  3. Finally, what does conformance look like? While not stated in the spec, the existing ResolvedRefs condition for HTTPRoutes is required in the conformance tests. Would that make this condition a requirement for a GatewayClass?

@sjberman
Copy link
Contributor Author

In relation to point 2, one proposal our implementation has come up with is to set Accepted: true, reason: InvalidParameters, message: GatewayClass is Accepted, but params are invalid for X reason and will be ignored. This means we ignore the params and allow the existing configuration to continue on, while giving a clear message to users as to what is happening.

The risk here is that if paramsRef goes from valid to invalid, any settings that were in there would be reverted to defaults, leading to a change in behavior or leading to downtime. The impact of this is obviously dependent on the implementation and the settings that exist in that paramsRef. Could be a better alternative than tearing down the entire config tree due to a non-Accepted GatewayClass, though.

@shaneutt shaneutt added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 2, 2024
@mikemorris
Copy link
Contributor

mikemorris commented Apr 2, 2024

I believe the current fuzziness in the spec around "snapshot/templating" behavior when creating Gateways referencing a configured GatewayClass combined with the difficulty handling of "dynamic" changes to parameters (by modifying either the reference or the referenced object) helps create the justification for adding a parametersRef field under infrastructure directly on Gateways, and trying to clean up this behavior to be better-defined.

Adding a parametersRef field as part of the infrastructure stanza was originally proposed and discussed extensively in #1868 and #1757 but was ultimately not included in the implementation in #2440 - I can't find the precise comment where it was blocked/dropped though (I think it happened after the GEPs merged, during API review).

/cc @howardjohn @robscott

@sjberman
Copy link
Contributor Author

sjberman commented Apr 2, 2024

Doesn't that just move the blast radius down one level? Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.

@mikemorris
Copy link
Contributor

mikemorris commented Apr 2, 2024

Doesn't that just move the blast radius down one level?

At a minimum, yes, and that alone feels like a good improvement.

I'd expect that it could be more reasonable for a Gateway to stay { type: Accepted, status: True } and just be missing some configured parameters (expressed through its existing { type: ResolvedRefs, status: False } or adding { type: PartiallyInvalid, status: True, reason: UnsupportedValue } similar to #2429 for HTTPRoute) than to figure out how to deal with a dynamic class/template becoming invalid in any more standardized way that doesn't have a huge blast radius.

@robscott
Copy link
Member

robscott commented Apr 3, 2024

Doesn't that just move the blast radius down one level?

That was also one of my original concerns. The proposal by @howardjohn in #2924 does limit the blast radius by making paramsRefs namespace-local, so the blast radius of a change would also be limited to the local namespace.

Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.

Maybe we should take these options and apply them more broadly?

// When this happens, implementations MUST take one of the following
// approaches:
//
// 1) Drop Rule(s): With this approach, implementations will drop the
// invalid Route Rule(s) until they are fully valid again. The message
// for this condition MUST start with the prefix "Dropped Rule" and
// include information about which Rules have been dropped. In this
// state, the "Accepted" condition MUST be set to "True" with the latest
// generation of the resource.
// 2) Fall Back: With this approach, implementations will fall back to the
// last known good state of the entire Route. The message for this
// condition MUST start with the prefix "Fall Back" and include
// information about why the current Rule(s) are invalid. To represent
// this, the "Accepted" condition MUST be set to "True" with the
// generation of the last known good state of the resource.

@sjberman
Copy link
Contributor Author

sjberman commented Apr 3, 2024

@robscott That's similar to the approach that I mentioned in my first comment for the GatewayClass. Basically just keep things in a good state, but fall back on the paramsRef to the default values (our implementation may just have to use this approach since we are intending to add infra configuration at the top level and the GC paramsRef is the way to do that right now). It's almost the lesser of two evils. Your paramsRef config will be reverted to a valid state, which could cause disruption downstream since values are changing, but it's less disruptive than nullifying the entire downstream config.

This obviously is the big issue to figure out, but I also want to make sure we touch on the questions around the existence of a ResolvedRefs condition and the conformance for it.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2024
@mikemorris
Copy link
Contributor

mikemorris commented Jul 11, 2024

With #2924 merging, there's a path for configuring per-Gateway custom parameters now.

I'm not entirely sure if the existing InvalidParameters GatewayConditionReason for the Accepted GatewayConditionType is sufficient or if we should consider adding ResolvedRefs to Gateway conditions in addition to Listener conditions, but I think it's more clear now that we shouldn't be encouraging this pattern on the GatewayClass resource.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 10, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 9, 2024
@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
Development

No branches or pull requests

6 participants