Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate using context variables instead of state for Uptime rules #126280

Closed
ymao1 opened this issue Feb 23, 2022 · 6 comments
Closed

Investigate using context variables instead of state for Uptime rules #126280

ymao1 opened this issue Feb 23, 2022 · 6 comments
Labels
Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability

Comments

@ymao1
Copy link
Contributor

ymao1 commented Feb 23, 2022

The alerting framework has recently merged a PR to provide rule executors with the ability to specify context for recovered alerts. Details are available in the PR and in the alerting README.

While investigating the issue, we noticed that Uptime rules almost exclusively use state variables vs context variables and we would like to work with the team to explore the possibility of moving these state variables to context.

State variables are serialized and persisted inside the task manager document so storing a lot of information inside state when a rule spawns many alerts can bloat the size of the task manager document. The initial intent for state variables was to persist information from one rule execution that is then used inside a subsequent rule execution. For fairly static information, context variables are generally a better choice.

In addition, we've seen many users asking for the ability to specify information in their recovery notifications. While we now offer rules the ability to specify context variables for recovered alerts, we do not and do not plan to offer the ability to specify state variables for recovered alerts. Moving to context variables where possible and then utilizing the recovered alert services will satisfy a widely requested feature.

@botelastic botelastic bot added the needs-team Issues missing a team label label Feb 23, 2022
@ymao1 ymao1 added the Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability label Feb 23, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/uptime (Team:uptime)

@dominiqueclarke
Copy link
Contributor

@paulb-elastic
Copy link
Contributor

Closing this in favour of the new issues @dominiqueclarke has created

@dominiqueclarke
Copy link
Contributor

@ymao1 I'd like to revisit Uptime state vs context variables.

We have begun the process of copying alert state into alert context.

However when beginning the process of specifying alert recovery context, I realized that persisting values in state works well for Uptime. Uptime rules can apply to 1 or many Uptime monitors, trigger 1 or many alerts. At the time the rule is created, key information about the monitor is not specified, only a query to fetch the relevant monitors. When the rule is executed, we find all the monitors matching the threshold and query definitions, and pull in critical information such as the monitor url and location to pass to the alert for the alert connector message.

Without storing this critical information in state, we'd have to run this query again for all recovered alerts, to fetch the monitor url, location, and more once again. At a minimum, we'd need to store a key, monitor.id, in state in order to perform this query.

Could this be a valid reason to continue holding Uptime variables in state, as well as context? By holding the variables in state, we can call alert.getState() on the recovered alert and quickly gain access to all critical variables the end-user might need in recovery messages, and assign them to the alert via alert.setContext.

@ymao1
Copy link
Contributor Author

ymao1 commented Apr 20, 2022

@dominiqueclarke There's definitely a trade-off between calculating this information one time and persisting and calculating it as needed. The framework is primarily concerned with the potential size of the task manager document that includes serialized state from possibly a lot of Uptime alerts. However, it doesn't look like this is an issue we've seen crop up at this point and we are also considering methods of capping number of alerts at a framework level.

Given this, I think it's fine to backlog removing state variables altogether but I think we should move forward with copying the state variables into context for Uptime rules as we would like to make context the user facing action variables and remove state variables from being shown in the UI altogether.

LMK if that makes sense!

@dominiqueclarke
Copy link
Contributor

@ymao1

That makes a lot of sense. Thank you. I will put you on my PR once I implement alert recovery to ensure we're on the same page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability
Projects
None yet
Development

No branches or pull requests

4 participants