[ClusterLoader2] Prometheus metrics corrupted #1020

Closed
jprzychodzen opened this issue Feb 5, 2020 · 1 comment · Fixed by #1199
Labels
area/clusterloader priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@jprzychodzen
Contributor

While running tests with ClusterLoader2, the created snapshot sometimes lacks data. This is probably related to taking a snapshot of a disk that is still in use.

For short tests (~15 minutes long), this can leave metrics missing for specific test runs.

@jkaniuk
Contributor

jkaniuk commented Feb 6, 2020

I've just bumped into that as well.
It happens relatively frequently for 100-node tests: roughly 25% of runs, out of a random sample of 8?

Prometheus logs when data is missing entirely:

tsdb msg="last page of the wal is torn, filling it with zeros" segment=/prometheus-db/wal/00000000
tsdb msg="deleting all segments newer than corrupted segment" segment=0
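To check other runs for the same symptom, a saved Prometheus log can be scanned for these two messages. This is only a minimal sketch: the marker strings are copied from the lines above, while `find_wal_corruption` and the idea of feeding it the log as a single string are assumptions made for illustration, not part of ClusterLoader2.

```python
# Message substrings copied from the Prometheus log lines quoted in this issue.
WAL_CORRUPTION_MARKERS = (
    "last page of the wal is torn",
    "deleting all segments newer than corrupted segment",
)


def find_wal_corruption(log_text):
    """Return the log lines that contain one of the WAL corruption markers.

    Hypothetical helper: it only does substring matching on raw log text.
    """
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in WAL_CORRUPTION_MARKERS)
    ]


if __name__ == "__main__":
    sample = (
        'tsdb msg="last page of the wal is torn, filling it with zeros" '
        "segment=/prometheus-db/wal/00000000\n"
        'tsdb msg="deleting all segments newer than corrupted segment" segment=0\n'
    )
    for hit in find_wal_corruption(sample):
        print(hit)
```

An empty result does not prove that the snapshot is complete; it only means that neither of these two messages was logged.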

Missing entirely: [screenshot not preserved]

Looks good: [screenshot not preserved]

~20-40m of data expected:

W0206 08:56:09.385] I0206 08:56:09.380792   13158 prometheus.go:133] Setting up prometheus stack
...
W0206 09:18:25.438] I0206 09:18:25.437107   13158 prometheus.go:178] Tearing down prometheus stack

Another run:

W0206 06:55:49.818] I0206 06:55:49.817762   13305 prometheus.go:166] Prometheus stack set up successfully
...
W0206 07:35:12.182] I0206 07:35:12.174525   13305 prometheus.go:178] Tearing down prometheus stack
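
The expected data window can be derived from the inner klog timestamps in the lines above. The sketch below is one rough way to do that; `klog_time` and `stack_lifetime` are hypothetical helper names. It assumes both lines come from the same day (the year is not part of the klog prefix). Note also that the two runs are measured from different events: "Setting up" in the first, "set up successfully" in the second.

```python
import re
from datetime import datetime

# Matches the inner klog prefix, e.g. "I0206 08:56:09.380792   13158 ".
# The outer "W0206 08:56:09.385]" prefix has only three fractional digits,
# so it is skipped by the six-digit requirement.
KLOG_TS = re.compile(r"[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ")


def klog_time(line):
    """Parse the klog timestamp (month, day, time) from one log line."""
    match = KLOG_TS.search(line)
    if match is None:
        raise ValueError("no klog timestamp found in line")
    return datetime.strptime(match.group(1), "%m%d %H:%M:%S.%f")


def stack_lifetime(start_line, end_line):
    """Return the time elapsed between two log lines as a timedelta."""
    return klog_time(end_line) - klog_time(start_line)


if __name__ == "__main__":
    start = ("W0206 08:56:09.385] I0206 08:56:09.380792   13158 "
             "prometheus.go:133] Setting up prometheus stack")
    end = ("W0206 09:18:25.438] I0206 09:18:25.437107   13158 "
           "prometheus.go:178] Tearing down prometheus stack")
    print(stack_lifetime(start, end))
```

For the two runs quoted here this gives roughly 22 and 39 minutes, consistent with the ~20-40m estimate.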

/area clusterloader
/priority critical-urgent


3 participants