Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alertmanager inhibit rules #1925

Merged
merged 41 commits into from
May 17, 2021
Merged

Add alertmanager inhibit rules #1925

merged 41 commits into from
May 17, 2021

Conversation

ijungmann
Copy link
Contributor

@ijungmann ijungmann commented May 4, 2021

Detailed description:
Add inhibit rules to the helm charts to ensure that alerts which could be triggered downstream from another alert are not fired. This will help to keep the alert channel less noisy.

  • Add a new alert to each service to check when pods go down.
  • Add inhibit rules to each values.yaml to logically group alerts together
  • Add new alertmanagerconfig template to handle the new inhibit rules section.
  • Update the mirror-node-common chart to read the new config

Which issue(s) this PR fixes:
Fixes #1813

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated

@codecov
Copy link

codecov bot commented May 4, 2021

Codecov Report

Merging #1925 (4c132ab) into master (126751e) will increase coverage by 0.68%.
The diff coverage is 79.92%.

❗ Current head 4c132ab differs from pull request most recent head c7e6a36. Consider uploading reports for the commit c7e6a36 to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1925      +/-   ##
============================================
+ Coverage     87.05%   87.73%   +0.68%     
- Complexity     1744     1822      +78     
============================================
  Files           315      319       +4     
  Lines          7731     8063     +332     
  Branches        740      814      +74     
============================================
+ Hits           6730     7074     +344     
+ Misses          772      744      -28     
- Partials        229      245      +16     
Impacted Files Coverage Δ Complexity Δ
...ra/mirror/monitor/config/MonitorConfiguration.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
.../hedera/mirror/monitor/publish/PublishMetrics.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...a/mirror/monitor/publish/TransactionPublisher.java 1.06% <0.00%> (-0.08%) 1.00 <0.00> (ø)
...a/mirror/monitor/subscribe/AbstractSubscriber.java 77.14% <ø> (-20.00%) 3.00 <0.00> (-1.00)
.../mirror/monitor/subscribe/rest/RestSubscriber.java 100.00% <ø> (ø) 22.00 <0.00> (?)
...nitor/subscribe/rest/RestSubscriberProperties.java 83.33% <ø> (ø) 4.00 <0.00> (?)
hedera-mirror-rosetta/cmd/db.go 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...ra/mirror/monitor/subscribe/SubscribeResponse.java 66.66% <66.66%> (ø) 4.00 <4.00> (?)
hedera-mirror-rosetta/cmd/config.go 66.66% <67.74%> (+66.66%) 0.00 <0.00> (ø)
...irror/monitor/subscribe/grpc/GrpcSubscription.java 70.00% <70.00%> (ø) 15.00 <15.00> (?)
... and 66 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 126751e...c7e6a36. Read the comment docs.

@steven-sheehy steven-sheehy added this to the Mirror 0.34.0 milestone May 5, 2021
@ijungmann ijungmann added enhancement Type: New feature ops Tasks relating to network operations P2 labels May 5, 2021
@ijungmann ijungmann self-assigned this May 5, 2021
@ijungmann ijungmann force-pushed the add_alert_inhibit_rules branch 2 times, most recently from 9c55a4b to c6ed080 Compare May 6, 2021 06:53
@ijungmann ijungmann marked this pull request as ready for review May 6, 2021 06:57
@ijungmann ijungmann requested a review from a team May 6, 2021 06:57
charts/hedera-mirror-grpc/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-grpc/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-grpc/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-grpc/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
Copy link
Collaborator

@xin-hedera xin-hedera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just some minor suggested changes

Comment on lines 6 to 7
labels:
{{ include "hedera-mirror-grpc.labels" . | nindent 4 }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labels:
{{ include "hedera-mirror-grpc.labels" . | nindent 4 }}
labels: {{ include "hedera-mirror-grpc.labels" . | nindent 4 }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in all templates.

inhibitRules:
{{- range $name, $rule := omit .Values.alertmanager.inhibitRules "enabled" }}
{{- if $rule.enabled }}
{{- $rule.matches | toYaml | nindent 4}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- $rule.matches | toYaml | nindent 4}}
{{- $rule.matches | toYaml | nindent 4 }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in all templates.

equal:
- namespace
- pod
-
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
@@ -141,11 +173,11 @@ readinessProbe:

resources:
limits:
cpu: 500m
memory: 128Mi
cpu: 1.8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, had these swapped with the importer resources. Fixed now.

@@ -3,20 +3,26 @@ grpc:
ingress:
middleware:
enabled: true
inhibitRules:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

charts/hedera-mirror-grpc/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-monitor/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-grpc/values.yaml Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
charts/hedera-mirror-importer/values.yaml Outdated Show resolved Hide resolved
- sourceMatch:
- name: alertname
regex: true
value: (ImporterParseErrors)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for a regex since it's just one value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@ijungmann ijungmann requested a review from xin-hedera May 14, 2021 06:34
@@ -110,6 +202,10 @@ prometheusRules:
for: 1m
labels:
severity: critical
application: hedera-mirror-importer
type: BALANCE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added type to a few of the rules where it can technically be inferred, but needs to be present to group in the inhibit rules without defining a separate rule.

Ian Jungmann added 19 commits May 17, 2021 11:22
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
@ijungmann
Copy link
Contributor Author

Fixed DCO issue

@sonarcloud
Copy link

sonarcloud bot commented May 17, 2021

@ijungmann ijungmann merged commit 229d112 into master May 17, 2021
@ijungmann ijungmann deleted the add_alert_inhibit_rules branch May 17, 2021 17:32
ijungmann pushed a commit that referenced this pull request May 18, 2021
*  Add a new alert to each service to check when pods are not ready
*  Add inhibit rules to each values.yaml to logically group alerts together
*  Add new alertmanagerconfig template to handle the new inhibit rules section
*  Update the mirror-node-common chart to read the new config

Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
ijungmann pushed a commit that referenced this pull request May 24, 2021
*  Add a new alert to each service to check when pods are not ready
*  Add inhibit rules to each values.yaml to logically group alerts together
*  Add new alertmanagerconfig template to handle the new inhibit rules section
*  Update the mirror-node-common chart to read the new config

Signed-off-by: Ian Jungmann <ian.jungmann@hedera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Type: New feature ops Tasks relating to network operations P2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Alert dependencies
4 participants