-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate using context
variables instead of state
for Uptime rules
#126280
Comments
Pinging @elastic/uptime (Team:uptime) |
We have identified three issues related to this |
Closing this in favour of the new issues @dominiqueclarke has created |
@ymao1 I'd like to revisit Uptime state vs context variables. We have begun the process of copying alert state into alert context. However when beginning the process of specifying alert recovery context, I realized that persisting values in state works well for Uptime. Uptime rules can apply to 1 or many Uptime monitors, trigger 1 or many alerts. At the time the rule is created, key information about the monitor is not specified, only a query to fetch the relevant monitors. When the rule is executed, we find all the monitors matching the threshold and query definitions, and pull in critical information such as the monitor url and location to pass to the alert for the alert connector message. Without storing this critical information in state, we'd have to run this query again for all recovered alerts, to fetch the monitor url, location, and more once again. At a minimum, we'd need to store a key, Could this be a valid reason to continue holding Uptime variables in state, as well as context? By holding the variables in state, we can call |
@dominiqueclarke There's definitely a trade-off between calculating this information one time and persisting and calculating it as needed. The framework is primarily concerned with the potential size of the task manager document that includes serialized state from possibly a lot of Uptime alerts. However, it doesn't look like this is an issue we've seen crop up at this point and we are also considering methods of capping number of alerts at a framework level. Given this, I think it's fine to backlog removing state variables altogether but I think we should move forward with copying the state variables into context for Uptime rules as we would like to make LMK if that makes sense! |
That makes a lot of sense. Thank you. I will put you on my PR once I implement alert recovery to ensure we're on the same page. |
The alerting framework has recently merged a PR to provide rule executors with the ability to specify context for recovered alerts. Details are available in the PR and in the alerting README.
While investigating the issue, we noticed that Uptime rules almost exclusively use
state
variables vscontext
variables and we would like to work with the team to explore the possibility of moving thesestate
variables tocontext
.State
variables are serialized and persisted inside the task manager document so storing a lot of information insidestate
when a rule spawns many alerts can bloat the size of the task manager document. The initial intent forstate
variables was to persist information from one rule execution that is then used inside a subsequent rule execution. For fairly static information,context
variables are generally a better choice.In addition, we've seen many users asking for the ability to specify information in their recovery notifications. While we now offer rules the ability to specify
context
variables for recovered alerts, we do not and do not plan to offer the ability to specifystate
variables for recovered alerts. Moving tocontext
variables where possible and then utilizing the recovered alert services will satisfy a widely requested feature.The text was updated successfully, but these errors were encountered: