Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for applied objects to become healthy using CEL readiness rule or kstatus #2419

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

barney-s
Copy link
Collaborator

@barney-s barney-s commented Aug 5, 2024

Change description

  • apply + wait for ready
    • Change Applier to get status using CEL rules if rule present
    • fallback Logic to get status of applied objects using kstatus
    • Applier.AreResourcesReady() wraps readiness checks
    • Each stage is applied and then checks if resources are ready (not blocking reconciler)
    • If resources are not ready, reconciler returns with an error causing
      a queued reconcile.
  • CEL based Readiness Rule support
    • Change Composition spec to take a list of CEL rules for custom status
      rules
    • pkg/cel/engine.go - Simple CEL wrapper to run rules against
  • Testcases:
    • TestFirstCompositionTeamPage: applies teampage sample (will tests waiting for
      deployment becoming healthy)
    • TestCustomStatusCELRule: tests using a CEL readiness rule for
      Deployment

Sample outputs

CEL rule based checks

CEL rule in TestCustomStatusCELRule/input.yaml: readyIf: "status.readyReplicas==status.replicas"

plan status showing CEL rule result:

status:
  compositionGeneration: 1
  compositionUID: 9e12a45d-8a38-4e4a-bc31-488934bdd211
  conditions:
  - lastTransitionTime: "2024-08-07T21:14:55Z"
    message: 'Evaluated and Applied stages: server'
    reason: ProcessedAllStages
    status: "True"
    type: Ready
  generation: 2
  inputGeneration: 1
  stages:
    server:
      appliedCount: 3
      lastApplied:
      - group: apps
        health: Healthy
        kind: Deployment
        name: team-landing
        namespace: my-team
        status: Readiness Rule evaluated to true
        version: v1
      - health: Healthy
        kind: Service
        name: team-landing-landing
        namespace: my-team
        status: Service is ready
        version: v1
      - health: Healthy
        kind: ConfigMap
        name: team-landing-page
        namespace: my-team
        status: Resource is always ready
        version: v1
      resourceCount: 3

kstatus based checks

plan status showing waiting for resources to become healthy:

status:
    compositionGeneration: 0
    conditions:
    - lastTransitionTime: "2024-08-06T17:06:53Z"
      message: 'Evaluated stages: server'
      reason: EvaluationPending
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-08-06T17:06:53Z"
      message: 'Expander: server, Message: Not all resources are healthy'
      reason: WaitingForAppliedResources
      status: "True"
      type: Waiting
    inputGeneration: 0
    stages:
      server:
        appliedCount: 3
        lastApplied:
        - group: apps
          health: Unhealthy
          kind: Deployment
          name: team-team-rdhbvc8n
          namespace: team-rdhbvc8n
          status: 'Available: 0/1'
          version: v1
         ....
        resourceCount: 3

plan status with all resources healthy:

status:
    compositionGeneration: 1
    compositionUID: 10785f33-0a4a-41a1-a63a-dfaf9d4aee89
    conditions:
    - lastTransitionTime: "2024-08-06T17:23:20Z"
      message: 'Evaluated and Applied stages: server'
      reason: ProcessedAllStages
      status: "True"
      type: Ready
    generation: 2
    inputGeneration: 1
    stages:
      server:
        appliedCount: 3
        lastApplied:
        - group: apps
          health: Healthy
          kind: Deployment
          name: team-team-h26bql6s
          namespace: team-h26bql6s
          status: 'Deployment is available. Replicas: 1'
          version: v1
        - health: Healthy
          kind: Service
          name: team-team-h26bql6s-landing
          namespace: team-h26bql6s
          status: Service is ready
          version: v1
        - health: Healthy
          kind: ConfigMap
          name: team-team-h26bql6s-page
          namespace: team-h26bql6s
          status: Resource is always ready
          version: v1

@barney-s barney-s force-pushed the cel-readiness branch 7 times, most recently from 390569d to 0d2ecf9 Compare August 6, 2024 16:52
@barney-s barney-s changed the title WIP: using kstatus and cel-rules for readiness checks WIP: Wait for applied objects to become healthy Aug 6, 2024
@barney-s barney-s changed the title WIP: Wait for applied objects to become healthy Wait for applied objects to become healthy Aug 6, 2024
@barney-s
Copy link
Collaborator Author

barney-s commented Aug 6, 2024

/assign @cheftako
/assign @justinsb
/assign @xiaoweim

- apply + wait for ready
  - Change Applier to get status using CEL rules if rule present
  - fallback Logic to get status of applied objects using kstatus
  - Applier.AreResourcesReady() wraps readiness checks
  - Each stage is applied and then checks if resources are ready (not blocking reconciler)
  - If resources are not ready, reconciler returns with an error causing
    a queued reconcile.
- CEL based Readiness Rule support
  - Change Composition spec to take a list of CEL rules for custom status
  rules
  - pkg/cel/engine.go - Simple CEL wrapper to run rules against
- Testcases:
  - TestFirstCompositionTeamPage:  applies teampage sample (will tests waiting for
  deployment becoming healthy)
  - TestCustomStatusCELRule: tests using a CEL readiness rule for
    Deployment
@barney-s barney-s changed the title Wait for applied objects to become healthy Wait for applied objects to become healthy using CEL readiness rule or kstatus Aug 7, 2024
@@ -16,6 +16,7 @@ apiVersion: composition.google.com/v1alpha1
kind: Composition
metadata:
name: team-page
namespace: default
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should start with the CC structure in which case this would be "config-control".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do. But this should work fine in any namespace. Compositions can exist in any namespace that the admin chooses it to be. The Facades can be in any namespace as well.

} else {
// Fall back to kstatus
result, err := status.Compute(u)
if err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe handle err != nil more clearly here? But I guess we're already handling it acceptably


func NewEngine(u *unstructured.Unstructured) (*Engine, error) {
// TODO: what withtagname ? can we pass yaml ?
registry := easycel.NewRegistry("cel-engine", easycel.WithTagName("json"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ask the CEL folk what CEL library they recommend

@justinsb
Copy link
Collaborator

justinsb commented Aug 7, 2024

Generally LGTM. A few comments, the only real blocker I think is whether we need to refetch the object when checking status. I don't think we do; we should have it from the apply.

@barney-s
Copy link
Collaborator Author

barney-s commented Aug 13, 2024

Generally LGTM. A few comments, the only real blocker I think is whether we need to refetch the object when checking status. I don't think we do; we should have it from the apply.

With the changes in KDP to allow callbacks for computing health, we no longer need to fetch the object. Updated the code (second commit) to register a callback and not fetch anymore.

9197897

@google-oss-prow google-oss-prow bot added the lgtm label Aug 13, 2024
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit f5d86cc into GoogleCloudPlatform:master Aug 13, 2024
8 checks passed
@barney-s barney-s deleted the cel-readiness branch August 29, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants