v0.0.71: unexpected error: runtime error: invalid memory address or nil pointer dereference #451
Comments
Do you happen to have a repro case (e.g. query and config)? A panic like this is definitely unexpected and is a bug. |
It seems to happen on these two rules:
These are queries with lots of "or" operators in them, which means they hit the new parallel execution path. That said, running such a query 1000 times might get you one repro. :( As for config, here is what we use for each server in
And we have 7 servers like this that promxy checks.
Another thing: promxy is running as 3 pods, and traffic is load balanced between them. That shouldn't matter, I think, since the whole query is executed on one pod. |
I have spent a bit of time attempting to reproduce this and have been unsuccessful so far. I looked and I don't see an obvious nil dereference in the parallel child execution, but again, I can't repro the issue. Maybe you can reproduce it with some version against the following config (since it's easy to replicate):
promxy:
  server_groups:
    - static_configs:
        - targets:
          - demo.robustperception.io:9090
          - demo.robustperception.io:9090
      http_client:
        dial_timeout: 1s |
I tried to turn on debug log output, but that did not produce any valuable information.
But I cannot find anything else in the log. :( |
The issue here is that the promql engine recovers from the panic and returns it as an error, but does not print the whole error message (the backtrace is lost): https://github.com/jacksontj/prometheus/blob/master/promql/engine.go#L849 If you are up for a custom build, you can run with this commit (https://github.com/jacksontj/promxy/compare/no_recover?expand=1), which will allow it to panic so we can capture the backtrace. |
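To illustrate why the backtrace goes missing, here is a minimal Go sketch of the recover-to-error pattern. It is not the actual code in the promql engine; `safeEval` and its signature are hypothetical names made up for this example. The idea is that a deferred `recover()` turns the panic into an ordinary error, and unless the stack is captured at that point (here with `runtime/debug.Stack()`), only the short message survives.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeEval is a hypothetical stand-in for a query-evaluation entry point.
// It converts a panic into an error and also captures the stack trace,
// so the origin of the panic is not lost.
func safeEval(eval func() (int, error)) (result int, stack []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			// Without this line, only the short panic message would survive.
			stack = debug.Stack()
			err = fmt.Errorf("unexpected error: %v", r)
		}
	}()
	result, err = eval()
	return result, stack, err
}

func main() {
	_, stack, err := safeEval(func() (int, error) {
		// A nil pointer that is dereferenced below, to trigger a panic.
		var series *struct{ points []float64 }
		return len(series.points), nil
	})
	fmt.Println(err)
	fmt.Printf("captured %d bytes of stack trace\n", len(stack))
}
```

Logging the captured stack alongside the error would make a rare panic like this one diagnosable without a custom build.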
Ran a custom build and finally caught it:
Seems like it is coming from somewhere deep inside the Prometheus engine, though? |
That it is. It looks like something wasn't populated with series, which is odd. From your report this is new with the latest promxy, which makes it even more confusing, since how we expand series hasn't changed at all. So just to confirm: this is new only on this release? If it is, maybe I can add some more specific debug messages to see what in the world is going on :/ |
Yes, this is new with 0.0.71. Was not happening with 0.0.70, and started to happen immediately after upgrading promxy to 0.0.71. |
I did some more digging and found a couple race conditions (pushed tmp fixes to that |
I tried the new changes, but unfortunately I get the same panic. :( |
I was able to reproduce some other failure with the Walk (presumably the same issue) after the prom dep upgrade (#460) -- so I have reverted the parallel tree walking and will work to re-enable that in #461. Because of that I'll close this one out as this panic will no longer exist on master as of now. |
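For readers unfamiliar with this class of bug, here is a generic Go sketch of walking a node's children in parallel without a data race. It is not promxy's or Prometheus's code: the `node` type and `walkParallel` function are hypothetical, and the thread does not say what the actual races were. The sketch only shows one common safe pattern, where each goroutine writes to its own slice slot and the caller waits for all of them before reading.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical expression-tree node; it is not promxy's or
// Prometheus's actual AST type.
type node struct {
	name     string
	children []*node
}

// walkParallel visits every child of n concurrently and collects one result
// per child. Each goroutine writes only to its own slice index, so no two
// goroutines touch the same memory and no lock is needed.
func walkParallel(n *node, visit func(*node) string) []string {
	results := make([]string, len(n.children))
	var wg sync.WaitGroup
	for i, c := range n.children {
		wg.Add(1)
		go func(i int, c *node) {
			defer wg.Done()
			results[i] = visit(c)
		}(i, c)
	}
	wg.Wait() // results must not be read before every child has finished
	return results
}

func main() {
	root := &node{name: "or", children: []*node{{name: "a"}, {name: "b"}, {name: "c"}}}
	out := walkParallel(root, func(c *node) string { return "visited " + c.name })
	fmt.Println(out)
}
```

Running a program like this under `go run -race` is one way to check such a walk for unsynchronized access.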
Hello @jacksontj
I recently upgraded from 0.0.70 to 0.0.71 and noticed this error, which promxy returns about once or twice in 24 hours:
This comes from vmalert when it connects to promxy to run alert queries. Simply re-running the same query doesn't cause an error.
Here is a full output:
I looked at logs in promxy, but there were no errors reported by promxy itself.
Could it be due to the parallel execution that was added in version 0.0.71?
And thank you for maintaining and continuing to update promxy!!