
[Bug] Importing/Creating alerting rules in cluster management not working due to notify_when parameter #524

Closed · dejongm opened this issue Jan 3, 2024 · 12 comments
Labels: bug (Something isn't working)

dejongm commented Jan 3, 2024

Describe the bug
I'm importing alert rules, and the rule, as created in the GUI, does not have notify_when set; it is configured with a filter query. notify_when is a required parameter in the Terraform resource, but when I set notify_when in my Terraform config and redeploy, it breaks the alerting rule in Kibana: the rule no longer honors the filter query and constantly sends me alerts. I also can no longer modify the rule in Kibana, as it returns an internal server error.

The same issue occurs when I create the rule directly from Terraform: the rule is created successfully, but it is broken from the moment of creation, and any subsequent update tries to repopulate the notify_when parameter.

To Reproduce
Steps to reproduce the behavior:

  1. TF configuration used:
resource "elasticstack_kibana_alerting_rule" "main" {
  name        = var.name
  consumer    = var.consumer
  notify_when = var.notify_when
  params = local.params
  rule_type_id = var.rule_type_id
  interval     = var.interval
  enabled      = var.enabled
  throttle     = var.throttle
  space_id     = var.space_id

  dynamic actions {
    for_each = local.actions != null ? local.actions : []
    content {
        id      = actions.value.id
        params  = actions.value.params
        group   = actions.value.group
    }
  }
}

vars:

name: Disk Usage
consumer: monitoring
notify_when: onThrottleInterval
rule_type_id: monitoring_alert_disk_usage
interval: 1m
params: |-
  threshold: 80
  duration: 5m
  filterQueryText: NOT elasticsearch.node.roles:"data_frozen"
  filterQuery: |-
    {"bool":{"must_not":{"bool":{"should":[{"term":{"elasticsearch.node.roles":{"value":"data_frozen"}}}],"minimum_should_match":1}
actions:
  - group: default
    connector_name: "Monitoring: Write to Kibana log"
    params: |-
      message: "{{context.internalShortMessage}}"
      level: info
  - group: default
    connector_name: Elastic-Cloud-SMTP
    params: |-
      message: "{{context.internalFullMessage}}"
      to:
        - xxx@xxx.xxx
      subject: Elastic - High Disk Usage
  2. TF operations to execute to get the error:
terraform import module.monitor_alerting_rule[\"default\;disk-usage\"].elasticstack_kibana_alerting_rule.main default/3a3d1950-84af-11ee-bb3d-a97312b6e99c
terraform plan
  3. See the error in the output:
    The policy before import:
    {
      "id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
      "name": "Disk Usage",
      "tags": [],
      "enabled": true,
      "consumer": "monitoring",
      "throttle": null,
      "revision": 18,
      "running": false,
      "schedule": {
        "interval": "1m"
      },
      "params": {
        "duration": "5m",
        "filterQuery": "",
        "filterQueryText": "",
        "threshold": 80
      },
      "rule_type_id": "monitoring_alert_disk_usage",
      "created_by": "1685651753",
      "updated_by": "1685651753",
      "created_at": "2023-11-16T18:37:43.710Z",
      "updated_at": "2023-12-07T22:36:34.990Z",
      "api_key_owner": "1685651753",
      "notify_when": null,
      "mute_all": false,
      "muted_alert_ids": [],
      "scheduled_task_id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
      "execution_status": {
        "status": "active",
        "last_execution_date": "2023-12-07T22:36:41.358Z",
        "last_duration": 736
      },
      "actions": [
        {
          "group": "default",
          "id": "39ae46d0-84af-11ee-bb3d-a97312b6e99c",
          "params": {
            "level": "info",
            "message": "{{context.internalShortMessage}}"
          },
          "connector_type_id": ".server-log",
          "frequency": {
            "summary": false,
            "notify_when": "onActionGroupChange",
            "throttle": null
          },
          "uuid": "697a281e-dd62-478a-8ad8-8ccb9a3b2dea"
        },
        {
          "group": "default",
          "id": "elastic-cloud-email",
          "params": {
            "message": "{{context.internalFullMessage}}",
            "to": [
              "XXXX"
            ],
            "subject": "Elastic - High Disk Usage"
          },
          "connector_type_id": ".email",
          "frequency": {
            "summary": false,
            "notify_when": "onActionGroupChange",
            "throttle": null
          },
          "uuid": "ee56cb2c-7a30-4750-aaa5-de9a17233086"
        }
      ],
      "last_run": {
        "alerts_count": {
          "active": 2,
          "new": 2,
          "recovered": 0,
          "ignored": 0
        },
        "outcome_msg": null,
        "outcome_order": 0,
        "outcome": "succeeded",
        "warning": null
      },
      "next_run": "2023-12-07T22:37:41.334Z",
      "api_key_created_by_user": false
    }

The Terraform Changeset after import:

Terraform will perform the following actions:

  # module.monitor_alerting_rule["default;disk-usage"].elasticstack_kibana_alerting_rule.main will be updated in-place
  ~ resource "elasticstack_kibana_alerting_rule" "main" {
        id                    = "default/3a3d1950-84af-11ee-bb3d-a97312b6e99c"
        name                  = "Disk Usage"
      + notify_when           = "onThrottleInterval"
        tags                  = []
        # (10 unchanged attributes hidden)

        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

The object after redeploying with Terraform:

  {
      "id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
      "name": "Disk Usage",
      "tags": [],
      "enabled": true,
      "consumer": "monitoring",
      "throttle": null,
      "revision": 19,
      "running": false,
      "schedule": {
        "interval": "1m"
      },
      "params": {
        "duration": "5m",
        "filterQuery": "",
        "filterQueryText": "",
        "threshold": 80
      },
      "rule_type_id": "monitoring_alert_disk_usage",
      "created_by": "1685651753",
      "updated_by": "elastic",
      "created_at": "2023-11-16T18:37:43.710Z",
      "updated_at": "2023-12-07T22:45:34.655Z",
      "api_key_owner": "elastic",
      "notify_when": "onThrottleInterval",
      "mute_all": false,
      "muted_alert_ids": [],
      "scheduled_task_id": "3a3d1950-84af-11ee-bb3d-a97312b6e99c",
      "execution_status": {
        "status": "active",
        "last_execution_date": "2023-12-07T22:44:51.497Z",
        "last_duration": 1189
      },
      "actions": [
        {
          "group": "default",
          "id": "39ae46d0-84af-11ee-bb3d-a97312b6e99c",
          "params": {
            "level": "info",
            "message": "{{context.internalShortMessage}}"
          },
          "connector_type_id": ".server-log",
          "uuid": "b7522351-9cd9-4a80-a921-48f42dc4b169"
        },
        {
          "group": "default",
          "id": "elastic-cloud-email",
          "params": {
            "message": "{{context.internalFullMessage}}",
            "subject": "Elastic - High Disk Usage",
            "to": [
              "XXXX"
            ]
          },
          "connector_type_id": ".email",
          "uuid": "2c592cc7-9e7a-4209-a57b-70959f0776d9"
        }
      ],
      "last_run": {
        "alerts_count": {
          "active": 2,
          "new": 0,
          "recovered": 0,
          "ignored": 0
        },
        "outcome_msg": null,
        "outcome_order": 0,
        "outcome": "succeeded",
        "warning": null
      },
      "next_run": "2023-12-07T22:45:51.426Z",
      "api_key_created_by_user": false
    }

Note that the per-action frequency parameters are now gone.

Expected behavior
Importing or creating the rule should produce a working alerting rule.

Screenshots
The Kibana error shown when attempting to save the rule via the Kibana dashboard after modifying it: [screenshot omitted]

Versions (please complete the following information):

  • OS: Linux
  • Terraform version: 1.3.9
  • Provider version: v0.11.0
  • Elasticsearch version: 8.11.1

Additional context
I tried to address this myself with my limited Go knowledge but was unsuccessful. I got as far as setting notify_when to null to see what would happen; Elasticsearch then complained that the rule actions were missing their frequency parameters.

stefnestor commented Jan 5, 2024

Cross-posting @jpdjere's comment from a previous discussion.

For rule-level alerting updates, since Kibana's ResponseOps team finalized elastic/kibana#143368 via elastic/kibana#144130, the rule-level notify_when field is no longer required, as the data can instead be populated under action.frequency in the request JSON. (Note that if the rule-level field is included, the action-level settings are dropped regardless of whether they are present; but if the action-level settings are included and the rule-level field is not, the Kibana API no longer errors in more recent versions. Terraform's code should be updated to reflect this new reality for Kibana ≥ v8.7.0.)
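
In provider terms, that would mean dropping notify_when from the rule and expressing the notification settings per action instead. A minimal sketch of what such a configuration could look like — hypothetical at the time of this comment, since the provider did not yet expose a per-action frequency block; its shape below is assumed from the Kibana API's action.frequency object:

resource "elasticstack_kibana_alerting_rule" "example" {
  name         = "Disk Usage"
  consumer     = "monitoring"
  rule_type_id = "monitoring_alert_disk_usage"
  interval     = "1m"
  params       = jsonencode({ threshold = 80, duration = "5m" })
  # No rule-level notify_when: per the behavior described above, setting it
  # would cause Kibana to drop the action-level settings.

  actions {
    id     = "39ae46d0-84af-11ee-bb3d-a97312b6e99c"
    group  = "default"
    params = jsonencode({ level = "info", message = "{{context.internalShortMessage}}" })

    frequency { # hypothetical block mirroring the API's action.frequency
      summary     = false
      notify_when = "onActionGroupChange"
    }
  }
}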

pmuellr (Member) commented Jan 8, 2024

As a kind of meta-question on this: it's not clear how this resource is tested, but it feels like we should do some kind of E2E testing at least once per release (first BC?) to make sure API changes have not broken this provider.

If there is already some kind of test like this, we apparently need more, as we should have been able to catch this.

If there isn't, we should build one.

tobio (Member) commented Jan 9, 2024

There are acceptance tests for this resource, which are run against a range of stack versions.

The existing tests do explicitly check the notify_when property. If I understand this issue correctly, they pass because they don't configure an action on the alerting rule.
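
For illustration, a minimal sketch of the kind of acceptance-test configuration that would exercise the broken combination — rule-level notify_when together with an attached action. The rule type, params, and connector ID below are illustrative and not taken from the existing test suite:

resource "elasticstack_kibana_alerting_rule" "notify_when_with_action" {
  name         = "test-notify-when-with-action"
  consumer     = "alerts"
  rule_type_id = ".index-threshold"
  interval     = "1m"
  notify_when  = "onActiveAlert"

  params = jsonencode({
    index               = ["test-index"]
    timeField           = "@timestamp"
    aggType             = "count"
    groupBy             = "all"
    threshold           = [10]
    thresholdComparator = ">"
    timeWindowSize      = 5
    timeWindowUnit      = "m"
  })

  actions {
    # Placeholder ID of a pre-existing connector; a real test would create
    # and reference a connector resource instead.
    id     = "00000000-0000-0000-0000-000000000000"
    group  = "threshold met"
    params = jsonencode({ message = "test alert" })
  }
}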

@Kushmaro (Contributor)

@tobio IIUC this is purely a TF issue now? (initially looked like a Kibana API issue?)

ghost commented Apr 12, 2024

@tobio IIUC this is purely a TF issue now? (initially looked like a Kibana API issue?)

No, it wasn't. After upgrading to the newer version, it's working now. Thanks.

tobio (Member) commented Apr 12, 2024

IIRC the API spec has changed dramatically since these resources were introduced. We likely want to regenerate the client and decide how much effort we put into supporting earlier API versions here too.

@Kushmaro (Contributor)

@tobio so the path here is basically creating a new resource from the newer spec, IIUC?

tobio (Member) commented Apr 29, 2024

I think it's regenerating the client and making the current resource work with the latest version. There are potentially some version restrictions to figure out (e.g. ES 8.8 and earlier requiring the current provider version), but we'd have to look at the spec changes to know.

@bartoszcisek

Thanks for looking into this topic. @tobio Did you manage to regenerate the client?

I'm really looking forward to using this resource with the latest Elastic Cloud version.

@bartoszcisek

@tobio Did you have time to take a look at this issue?

tobio (Member) commented Jun 21, 2024

I've only been able to look at this enough to verify that it's not simply an hour's work; sadly, there are a bunch of breaking changes in the API spec we'd need to adapt to.

cnasikas (Member) commented Sep 20, 2024

Fixed by elastic/kibana#186963. Starting from v0.11.7, the rule resource supports the rule-level alert_delay property as well as the per-action alerts_filter and frequency properties. You can find complete documentation on the Elastic Terraform provider documentation page.
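
Based on that comment and the Kibana API fields shown earlier in this thread, the rule from this issue could then be expressed with per-action notification settings rather than a rule-level notify_when. The frequency block below is a sketch assuming it mirrors the API's action.frequency object (summary, notify_when, throttle); check the provider documentation for the exact schema and for the new alert_delay and alerts_filter properties, which are not shown here:

resource "elasticstack_kibana_alerting_rule" "main" {
  name         = "Disk Usage"
  consumer     = "monitoring"
  rule_type_id = "monitoring_alert_disk_usage"
  interval     = "1m"

  params = jsonencode({
    threshold = 80
    duration  = "5m"
  })

  actions {
    id    = "elastic-cloud-email"
    group = "default"
    params = jsonencode({
      message = "{{context.internalFullMessage}}"
      to      = ["xxx@xxx.xxx"]
      subject = "Elastic - High Disk Usage"
    })

    # Per-action notification settings instead of rule-level notify_when.
    frequency {
      summary     = false
      notify_when = "onActionGroupChange"
    }
  }
}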
