-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
422 Unprocessable Entity on random queries #687
Comments
Hmm, that is curious -- haven't seen that error before. Would it be possible to get debug or trace logs? Alternatively a tcpdump would be great (but a bit hard to anonymize). The grafana side error says "error in servergroup ord=0" which really sounds like promxy returned an error (422 or 500 -- unclear since they disagree). If we could get some logs or tcpdumps from promxy we could get some better idea of what happened there. A 422 code means the request is bad -- so maybe there is some encoding or escaping issue with the subqueries being sent to the downstreams (maybe promxy is getting 422s from downstream and returning a 500)? |
@jacksontj I will enable trace logs next week and I will share them here. |
@jacksontj here you have a small portion of logs from Promxy and also Mimir which I can take with a query that has an error. We are using the following versions:
I'm also trying to debug the issue, but if you need anything which can help please let me know Promxy logs: {"level":"debug","msg":"BinaryExpr: count by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) \u003e 2","time":"2024-10-04T10:36:26Z"}
{"level":"debug","msg":"BinaryExpr: min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","time":"2024-10-04T10:36:26Z"}
{"level":"debug","msg":"BinaryExpr (AggregateExpr + Literal): min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","time":"2024-10-04T10:36:26Z"}
{"level":"debug","msg":"AggregateExpr count by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) count","time":"2024-10-04T10:36:26Z"}
{"api":"Query","level":"debug","msg":"https://mimir.dns.com:8080/prometheus","query":"count by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"})","time":"2024-10-04T10:36:26Z","ts":"2024-10-04T10:36:20Z"}
{"level":"trace","msg":"Set ServerGroup custom header X-Scope-OrgID: tenant_id","time":"2024-10-04T10:36:26Z"}
{"api":"Query","level":"debug","msg":"https://mimir.dns.com:8080/prometheus","query":"min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","time":"2024-10-04T10:36:26Z","ts":"2024-10-04T10:36:20Z"}
{"level":"trace","msg":"Set ServerGroup custom header X-Scope-OrgID: tenant_id","time":"2024-10-04T10:36:26Z"}
{"level":"debug","msg":"BinaryExpr: max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) \u003e 0","time":"2024-10-04T10:36:26Z"}
{"level":"debug","msg":"BinaryExpr (AggregateExpr + Literal): max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) \u003e 0","time":"2024-10-04T10:36:26Z"}
{"api":"Query","level":"debug","msg":"https://mimir.dns.com:8080/prometheus","query":"max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) \u003e 0","time":"2024-10-04T10:36:26Z","ts":"2024-10-04T10:36:20Z"}
{"level":"trace","msg":"Set ServerGroup custom header X-Scope-OrgID: tenant_id","time":"2024-10-04T10:36:26Z"}
{"api":"Query","error":"server_error: server error: 500","level":"trace","msg":"https://mimir.dns.com:8080/prometheus","query":"min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","time":"2024-10-04T10:36:26Z","took":1002389,"ts":"2024-10-04T10:36:20Z","value":null,"warnings":null}
{"api":"Query","error":"server_error: server error: 500","level":"trace","msg":"https://mimir.dns.com:8080/prometheus","query":"max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) \u003e 0","time":"2024-10-04T10:36:26Z","took":1043217,"ts":"2024-10-04T10:36:20Z","value":null,"warnings":null}
{"api":"Query","error":null,"level":"trace","msg":"https://mimir.dns.com:8080/prometheus","query":"count by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"})","time":"2024-10-04T10:36:26Z","took":2478969,"ts":"2024-10-04T10:36:20Z","value":[{"metric":{"provider":"openstack","region":"scl","role":"rulers"},"value":[1728038180,"2"]}],"warnings":null}
{"remoteAddr":"10.26.50.231","time":"2024-10-04T10:36:26.284953771Z","method":"POST","path":"/api/v1/query","protocol":"HTTP/1.1","status":422,"responseBytes":191,"duration":0.004576245,"query":"query=%23+Alert+on+ruler+instances+in+microservices+mode+that+have+no+rule+groups+assigned%2C%0Amin+by%28provider%2C+region%2C+role%2C+instance%29+%28cortex_ruler_managers_total%7Bstack%3D~%22a%7Cb%22%2Crole%3D%22rulers%22%7D%29+%3D%3D+0%0A"} Mimir Logs: {"caller":"handler.go:382","component":"query-frontend","err":"rpc error: code = Unavailable desc = transport is closing","estimated_series_count":0,"fetched_chunk_bytes":0,"fetched_chunks_count":0,"fetched_index_bytes":0,"fetched_series_count":0,"file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-002","jurisdiction":"us","length":"1h5m0s","level":"info","message":"query stats","message_key":null,"method":"POST","offset":40475661,"param_end":"2024-10-04T10:41:00Z","param_query":"min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","param_start":"2024-10-04T09:41:00Z","param_step":"30000","partition":0,"path":"/prometheus/api/v1/query_range","provider":"cloud","query_wall_time_seconds":0,"queue_time_seconds":0,"region":"region","response_size_bytes":0,"response_time":"909.115µs","results_cache_hit_bytes":1070,"results_cache_miss_bytes":0,"role":"queriers","route_name":"prometheus_api_v1_query_range","service":"mimir","sharded_queries":0,"source_type":"kafka","split_queries":1,"stack":"stack","status":"failed","status_code":0,"time_since_max_time":"11.06988766s","time_since_min_time":"1h5m11.06988766s","topic":"logs--generic","user":"tenantd_id","user_agent":"Go-http-client/1.1"}
{"caller":"handler.go:382","component":"query-frontend","err":"rpc error: code = Unavailable desc = transport is closing","estimated_series_count":0,"fetched_chunk_bytes":0,"fetched_chunks_count":0,"fetched_index_bytes":0,"fetched_series_count":0,"file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","length":"5m0s","level":"info","message":"query stats","message_key":null,"method":"POST","offset":40467566,"param_query":"max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","param_time":"2024-10-04T10:36:20Z","partition":1,"path":"/prometheus/api/v1/query","provider":"cloud","query_wall_time_seconds":0,"queue_time_seconds":0,"region":"region","response_size_bytes":0,"response_time":"408.708µs","results_cache_hit_bytes":0,"results_cache_miss_bytes":0,"role":"queriers","route_name":"prometheus_api_v1_query","service":"mimir","sharded_queries":0,"source_type":"kafka","split_queries":0,"stack":"stack","status":"failed","status_code":0,"time_since_max_time":"6.282686245s","time_since_min_time":"5m6.282686245s","topic":"logs--generic","user":"tenantd_id","user_agent":"Go-http-client/1.1"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467549,"partition":1,"provider":"cloud","query":"min by (provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":2,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467565,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":4,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467547,"partition":1,"provider":"cloud","query":"min by (provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":1,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467553,"partition":1,"provider":"cloud","query":"min by (provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":4,"user":"tenantd_id"}
{"caller":"handler.go:382","component":"query-frontend","err":"rpc error: code = Unavailable desc = transport is closing","estimated_series_count":0,"fetched_chunk_bytes":0,"fetched_chunks_count":0,"fetched_index_bytes":0,"fetched_series_count":0,"file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","length":"5m0s","level":"info","message":"query stats","message_key":null,"method":"POST","offset":40467554,"param_query":"min by(provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","param_time":"2024-10-04T10:36:20Z","partition":1,"path":"/prometheus/api/v1/query","provider":"cloud","query_wall_time_seconds":0,"queue_time_seconds":0,"region":"region","response_size_bytes":0,"response_time":"484.378µs","results_cache_hit_bytes":0,"results_cache_miss_bytes":0,"role":"queriers","route_name":"prometheus_api_v1_query","service":"mimir","sharded_queries":0,"source_type":"kafka","split_queries":0,"stack":"stack","status":"failed","status_code":0,"time_since_max_time":"6.28200176s","time_since_min_time":"5m6.28200176s","topic":"logs--generic","user":"tenantd_id","user_agent":"Go-http-client/1.1"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467563,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":3,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467559,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":1,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467545,"partition":1,"provider":"cloud","query":"min by (provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":0,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467561,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":2,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467557,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":0,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467551,"partition":1,"provider":"cloud","query":"min by (provider, region, role, instance) (cortex_ruler_managers_total{role=\"rulers\",stack=~\"a|b\"}) == 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":3,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467539,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":3,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467533,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":0,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467537,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":2,"user":"tenantd_id"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467541,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":4,"user":"tenantd_id"}
{"caller":"handler.go:382","component":"query-frontend","err":"rpc error: code = Unavailable desc = transport is closing","estimated_series_count":0,"fetched_chunk_bytes":0,"fetched_chunks_count":0,"fetched_index_bytes":0,"fetched_series_count":0,"file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","length":"5m0s","level":"info","message":"query stats","message_key":null,"method":"POST","offset":40467542,"param_query":"max by(provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","param_time":"2024-10-04T10:36:20Z","partition":1,"path":"/prometheus/api/v1/query","provider":"cloud","query_wall_time_seconds":0,"queue_time_seconds":0,"region":"region","response_size_bytes":0,"response_time":"534.363µs","results_cache_hit_bytes":0,"results_cache_miss_bytes":0,"role":"queriers","route_name":"prometheus_api_v1_query","service":"mimir","sharded_queries":0,"source_type":"kafka","split_queries":0,"stack":"stack","status":"failed","status_code":0,"time_since_max_time":"6.277494383s","time_since_min_time":"5m6.277494383s","topic":"logs--generic","user":"tenantd_id","user_agent":"Go-http-client/1.1"}
{"caller":"spanlogger.go:109","err":"rpc error: code = Unavailable desc = transport is closing","file":"/var/log/mimir/mimir.log","headers":{},"instance":"mimir-queriers-001","jurisdiction":"us","level":"error","message":"error processing request","message_key":null,"offset":40467535,"partition":1,"provider":"cloud","query":"max by (provider, region, role) (cortex_ruler_managers_total{stack=~\"a|b\"}) > 0","query_timestamp":1728038180000,"region":"region","role":"queriers","service":"mimir","source_type":"kafka","stack":"stack","topic":"logs--generic","try":1,"user":"tenantd_id"} |
Thanks for providing the logs! that actually helps a lot. There are a few things I see off here:
From the logs we can see that promxy got a 500 from the downstream mimir for a query:
Which is at first confusing since we got a 422 -- not a 500. But after looking into the prometheus API handler code we can see that the handler will mark any execution failure as Unprocessible -- which is annoying as it causes this confusion. So for #1 -- this is a design choice from prometheus -- where any processing error is a 422 (even though it could be transient.. so IMO its not correct -- but thats what it is doing).
The last line in the promxy log is:
Which has a query with what seems to be an attempted comment (
I'm not familiar enough with mimir to speculate on why the errors -- but those 500s are why promxy is returning 422s (as they generate an error in processing the response). |
Hi,
I'm seeing some random HTTP 422 in some random queries that come from Grafana. I have done 20+ simultaneous curl requests, and randomly I see those HTTP 422. Even with debug logs enabled, I don't see anything wrong.
What can cause this ? Anything can be improved on the configuration?
On Grafana side I see HTTP 500, which is weird since I don't see it on Promxy logs.
The text was updated successfully, but these errors were encountered: