Investigate using `context` variables instead of `state` for Uptime rules #126280

ymao1 · 2022-02-23T18:45:09Z

The alerting framework has recently merged a PR to provide rule executors with the ability to specify context for recovered alerts. Details are available in the PR and in the alerting README.

While investigating the issue, we noticed that Uptime rules almost exclusively use state variables rather than context variables, and we would like to work with the team to explore the possibility of moving these state variables to context.

State variables are serialized and persisted inside the task manager document, so storing a lot of information in state when a rule spawns many alerts can bloat that document. The initial intent for state variables was to persist information from one rule execution for use in a subsequent rule execution. For fairly static information, context variables are generally a better choice.
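To make the size concern concrete, here is a rough sketch that compares the serialized size of per-alert state when every monitor field is stored versus only a key. It is a simplified model, not the task manager's actual document format: `MonitorInfo`, `buildPersistedDocument`, and the field names are hypothetical and exist only for illustration.

```typescript
// Hypothetical, simplified model of persisted alert state.
// This is NOT the real task manager document shape.
type MonitorInfo = { monitorId: string; monitorUrl: string; location: string };

type AlertRecord = { state: Record<string, unknown> };

// Serializes one state entry per alert, as a stand-in for what is
// persisted between rule executions.
function buildPersistedDocument(monitors: MonitorInfo[], storeEverything: boolean): string {
  const alerts: Record<string, AlertRecord> = {};
  for (const m of monitors) {
    alerts[m.monitorId] = {
      // Either keep every field in state, or only the key needed to look the rest up again.
      state: storeEverything ? { ...m } : { monitorId: m.monitorId },
    };
  }
  return JSON.stringify({ alerts });
}

// 1000 made-up monitors, each producing one alert.
const monitors: MonitorInfo[] = Array.from({ length: 1000 }, (_, i) => ({
  monitorId: `monitor-${i}`,
  monitorUrl: `https://example.com/service/${i}`,
  location: 'us-east',
}));

const fullSize = buildPersistedDocument(monitors, true).length;
const keyOnlySize = buildPersistedDocument(monitors, false).length;
console.log({ fullSize, keyOnlySize });
```

The gap between the two numbers grows linearly with the number of alerts, which is why static, per-alert information is the part worth keeping out of state where possible.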

In addition, we've seen many users asking for the ability to specify information in their recovery notifications. While we now offer rules the ability to specify context variables for recovered alerts, we do not offer, and do not plan to offer, the ability to specify state variables for recovered alerts. Moving to context variables where possible and then using the recovered alert services will satisfy a widely requested feature.


elasticmachine · 2022-02-23T18:46:45Z

Pinging @elastic/uptime (Team:uptime)

dominiqueclarke · 2022-03-29T14:33:19Z

We have identified three issues related to this.

paulb-elastic · 2022-03-29T14:46:41Z

Closing this in favour of the new issues @dominiqueclarke has created

dominiqueclarke · 2022-04-19T13:42:54Z

@ymao1 I'd like to revisit Uptime state vs context variables.

We have begun the process of copying alert state into alert context.

However, when I began specifying alert recovery context, I realized that persisting values in state works well for Uptime. Uptime rules can apply to one or many Uptime monitors and trigger one or many alerts. At the time the rule is created, key information about the monitor is not specified, only a query to fetch the relevant monitors. When the rule is executed, we find all the monitors matching the threshold and query definitions, and pull in critical information such as the monitor url and location to pass to the alert for the alert connector message.

Without storing this critical information in state, we'd have to run this query again for all recovered alerts, to fetch the monitor url, location, and more once again. At a minimum, we'd need to store a key, monitor.id, in state in order to perform this query.

Could this be a valid reason to continue holding Uptime variables in state, as well as context? By holding the variables in state, we can call alert.getState() on the recovered alert and quickly gain access to all critical variables the end-user might need in recovery messages, and assign them to the alert via alert.setContext.
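As a minimal sketch of the approach described above, the snippet below reads the persisted state from each recovered alert and copies it into that alert's context. The `RecoveredAlert` interface is a hand-written stand-in that assumes only the `getState()` and `setContext()` calls mentioned in this comment; `UptimeAlertState`, its field names, `setRecoveryContext`, and `FakeRecoveredAlert` are hypothetical. How recovered alerts are obtained from the framework is not shown here (see the alerting README).

```typescript
// Hypothetical state shape; the real Uptime fields may differ.
type UptimeAlertState = { monitorId: string; monitorUrl: string; observerLocation: string };

// Stand-in for the subset of the recovered-alert API used in this sketch.
interface RecoveredAlert<S, C> {
  getId(): string;
  getState(): S;
  setContext(context: C): void;
}

// Copies the values persisted in state onto the recovery context,
// so no additional monitor query is needed at recovery time.
function setRecoveryContext(
  recovered: RecoveredAlert<UptimeAlertState, UptimeAlertState>[]
): void {
  for (const alert of recovered) {
    alert.setContext({ ...alert.getState() });
  }
}

// In-memory fake, used only to exercise the sketch.
class FakeRecoveredAlert implements RecoveredAlert<UptimeAlertState, UptimeAlertState> {
  context?: UptimeAlertState;
  private readonly id: string;
  private readonly state: UptimeAlertState;

  constructor(id: string, state: UptimeAlertState) {
    this.id = id;
    this.state = state;
  }

  getId(): string {
    return this.id;
  }

  getState(): UptimeAlertState {
    return this.state;
  }

  setContext(context: UptimeAlertState): void {
    this.context = context;
  }
}

const fake = new FakeRecoveredAlert('monitor-1-us-east', {
  monitorId: 'monitor-1',
  monitorUrl: 'https://example.com',
  observerLocation: 'us-east',
});

setRecoveryContext([fake]);
console.log(fake.context);
```

The alternative raised above, storing only `monitor.id` in state and re-running the monitor query for every recovered alert, would shrink the persisted state at the cost of one more query per execution.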

ymao1 · 2022-04-20T12:01:03Z

@dominiqueclarke There's definitely a trade-off between calculating this information once and persisting it, versus calculating it as needed. The framework is primarily concerned with the potential size of the task manager document, which includes serialized state from possibly a lot of Uptime alerts. However, it doesn't look like this is an issue we've seen crop up at this point, and we are also considering methods of capping the number of alerts at a framework level.

Given this, I think it's fine to backlog removing state variables altogether. I do think we should move forward with copying the state variables into context for Uptime rules, as we would like to make context the user-facing action variables and stop showing state variables in the UI altogether.

LMK if that makes sense!

dominiqueclarke · 2022-04-20T18:30:48Z

@ymao1

That makes a lot of sense. Thank you. I will put you on my PR once I implement alert recovery to ensure we're on the same page.

botelastic bot added the needs-team label Feb 23, 2022

ymao1 added the Team:Uptime - DEPRECATED label Feb 23, 2022

botelastic bot removed the needs-team label Feb 23, 2022

ymao1 mentioned this issue Mar 1, 2022

[Meta][Response Ops] Context on recovered alerts for each rule type #126617

Open

This was referenced Mar 29, 2022

[Uptime] Alerting - Copy Uptime alert state to context #128760

Closed

[Uptime] Specify alert recovery context #128761

Closed

[Uptime] Alerting - Remove Uptime alert state variables from action connectors #128766

Open

dominiqueclarke mentioned this issue Mar 29, 2022

[Synthetics] copy alert state to alert context and implement alert recovery #128693

Merged

paulb-elastic closed this as completed Mar 29, 2022
