110 telemetry #114

Merged
merged 3 commits into from
Nov 18, 2021

@thedodd thedodd (Collaborator) commented Nov 17, 2021

closes #110

todo

  • fix bug in process metrics gathering
  • finish implementing 3 outstanding proc metrics items

Update the config pattern of Stream pods to expect a METRICS_PORT env var.
This value is then used to create the Prometheus metrics server, which
uses the global registry.
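
As a rough illustration of the config step above, the following std-only sketch reads and validates `METRICS_PORT`. The helper names (`parse_metrics_port`, `metrics_port_from_env`) are hypothetical, and treating a missing value or port 0 as an error is an assumption rather than something this PR states.

```rust
use std::env;

/// Name of the env var that Stream pods are expected to receive.
const METRICS_PORT_VAR: &str = "METRICS_PORT";

/// Parse a raw `METRICS_PORT` value into a TCP port.
/// Hypothetical helper: failing on a missing value and rejecting
/// port 0 are assumptions made for this sketch.
fn parse_metrics_port(raw: Option<&str>) -> Result<u16, String> {
    let raw = raw.ok_or_else(|| format!("{} is not set", METRICS_PORT_VAR))?;
    let port: u16 = raw
        .trim()
        .parse()
        .map_err(|err| format!("invalid {} value {:?}: {}", METRICS_PORT_VAR, raw, err))?;
    if port == 0 {
        return Err(format!("{} must be non-zero", METRICS_PORT_VAR));
    }
    Ok(port)
}

/// Read the port from the process environment.
fn metrics_port_from_env() -> Result<u16, String> {
    parse_metrics_port(env::var(METRICS_PORT_VAR).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the validation easy to exercise without mutating process-wide state.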

Update Operator to pass along necessary config for metrics ports &
update Stream StatefulSet pod spec to expose metrics port on the
container.

Add metrics instrumentation for:
- K8s resource watchers and Stream Subscribers.
- Stream current offset (counter).
- Stream subscriber count (gauge).
- Stream subscriber group last offset processed (counter).
- Per Pipeline last offset processed (counter).
- Per Pipeline active instances (gauge).
- Per Pipeline number of stage subscribers (gauge).
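
The counter/gauge split in the list above can be sketched with plain atomics. This is not the PR's implementation (which registers metrics with a Prometheus registry); it is a std-only sketch of the semantics. The metric names used in the usage note are hypothetical, and ignoring stale offsets is an assumption about how an offset would be tracked as a counter.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

/// Counter-style metric for an offset: the value only moves forward.
struct OffsetCounter {
    name: &'static str,
    value: AtomicU64,
}

impl OffsetCounter {
    fn new(name: &'static str) -> Self {
        Self { name, value: AtomicU64::new(0) }
    }

    /// Record an observed offset; smaller (stale) offsets are ignored.
    fn observe(&self, offset: u64) {
        self.value.fetch_max(offset, Ordering::Relaxed);
    }

    /// Render in the Prometheus text exposition format.
    fn render(&self) -> String {
        format!("# TYPE {0} counter\n{0} {1}\n", self.name, self.value.load(Ordering::Relaxed))
    }
}

/// Gauge-style metric (e.g. a subscriber count): may go up and down.
struct Gauge {
    name: &'static str,
    value: AtomicI64,
}

impl Gauge {
    fn new(name: &'static str) -> Self {
        Self { name, value: AtomicI64::new(0) }
    }

    fn inc(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }

    fn dec(&self) {
        self.value.fetch_sub(1, Ordering::Relaxed);
    }

    /// Render in the Prometheus text exposition format.
    fn render(&self) -> String {
        format!("# TYPE {0} gauge\n{0} {1}\n", self.name, self.value.load(Ordering::Relaxed))
    }
}
```

For example, an `OffsetCounter::new("hadron_stream_current_offset")` that observes offsets 10 and then 7 still reports 10, while a `Gauge` for subscribers is incremented on subscribe and decremented on disconnect.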

Add Prometheus metrics covering:
- Number of leadership changes.
- Leadership state (1.0 == leader, anything else is follower).
- Error counter for K8s resource watchers.
- Process metrics.
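
The leadership metrics above could be modeled roughly as follows. The type and field names are hypothetical. Using 0.0 for followers is an assumption (the description only says "anything else"), as is counting a change only on an actual transition.

```rust
/// Gauge value for a leader, as stated in the description above.
const LEADER: f64 = 1.0;
/// Hypothetical follower value; the description only says "anything else".
const FOLLOWER: f64 = 0.0;

/// Minimal stand-in for the two leadership metrics.
/// The default state is 0.0, i.e. follower.
#[derive(Default)]
struct LeadershipMetrics {
    /// Gauge: current leadership state.
    state: f64,
    /// Counter: number of leadership changes observed.
    changes: u64,
}

impl LeadershipMetrics {
    /// Update the gauge, bumping the counter only on a real transition.
    fn set_leader(&mut self, is_leader: bool) {
        let next = if is_leader { LEADER } else { FOLLOWER };
        if next != self.state {
            self.changes += 1;
        }
        self.state = next;
    }

    /// Interpret the gauge the way a dashboard or alert rule would.
    fn is_leader(&self) -> bool {
        self.state == LEADER
    }
}
```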

Add process metrics registry & collection task for Stream & Operator.
@thedodd thedodd self-assigned this Nov 17, 2021
@thedodd thedodd added the A-telemetry Hadron server telemetry label Nov 17, 2021
@thedodd thedodd added this to In progress in Main Nov 17, 2021
@thedodd thedodd removed this from In progress in Main Nov 17, 2021
@thedodd thedodd force-pushed the 110-telemetry branch 4 times, most recently from 148c3cc to 84facb3 Compare November 18, 2021 02:28
Updated Stream component to spawn the process metrics sampler routine.

Updated Operator component to spawn the process metrics sampler routine.
It now also exposes the /metrics & /health endpoints on a plain HTTP (no TLS)
server, which reduces some of the overhead of configuring monitoring.
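
To make the endpoint layout concrete, here is a minimal std-only sketch of a plain-HTTP handler for the two paths. It is not the Operator's actual server code: the function names, the `ok` health body, and the 404/405/400 fallbacks are all assumptions for illustration.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

/// Map an HTTP request line to a status code and response body.
fn route(request_line: &str, metrics_body: &str) -> (u16, String) {
    let mut parts = request_line.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("GET"), Some("/metrics")) => (200, metrics_body.to_string()),
        (Some("GET"), Some("/health")) => (200, "ok\n".to_string()),
        (Some("GET"), Some(_)) => (404, "not found\n".to_string()),
        (Some(_), Some(_)) => (405, "method not allowed\n".to_string()),
        _ => (400, "bad request\n".to_string()),
    }
}

/// Build a complete HTTP/1.1 response for the given status and body.
fn format_response(status: u16, body: &str) -> String {
    let reason = match status {
        200 => "OK",
        404 => "Not Found",
        405 => "Method Not Allowed",
        _ => "Bad Request",
    };
    format!(
        "HTTP/1.1 {} {}\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        reason,
        body.len(),
        body
    )
}

/// Serve one plain-HTTP connection (no TLS): read the request line,
/// route it, and write the response back on the same socket.
fn handle_connection(stream: TcpStream, metrics_body: &str) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    let (status, body) = route(&request_line, metrics_body);
    reader.get_mut().write_all(format_response(status, &body).as_bytes())
}
```

A caller would accept connections from a `std::net::TcpListener` and pass each stream to `handle_connection` along with the rendered metrics text.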

Hadron Core has been updated to compile and expose process metrics
gathering correctly when running on Linux.
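
One way to gate process metrics on the target OS is conditional compilation, sketched below. Which process metrics Hadron samples is not stated here; resident set size read from `/proc/self/status` is only an illustrative choice, and both function names are hypothetical.

```rust
/// Extract the resident set size in bytes from the text of
/// `/proc/<pid>/status`. Returns `None` if the field is absent or malformed.
fn parse_rss_bytes(status: &str) -> Option<u64> {
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    let kib: u64 = line["VmRSS:".len()..]
        .split_whitespace()
        .next()?
        .parse()
        .ok()?;
    // The kernel labels this field "kB" but the unit is 1024 bytes.
    Some(kib * 1024)
}

/// Sample the current process on Linux, where procfs is available.
#[cfg(target_os = "linux")]
fn sample_rss_bytes() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    parse_rss_bytes(&status)
}

/// On other targets the sampler compiles to a no-op.
#[cfg(not(target_os = "linux"))]
fn sample_rss_bytes() -> Option<u64> {
    None
}
```

A sampler task could call `sample_rss_bytes` on an interval and store the result in a gauge, and the crate would still build on non-Linux targets.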
@thedodd thedodd merged commit 808d787 into main Nov 18, 2021
@thedodd thedodd deleted the 110-telemetry branch November 18, 2021 13:43
Development

Successfully merging this pull request may close these issues.

Add Prometheus metrics & endpoint