-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PostingsForMatchersCache: don't create new spans, reduce logging verbosity #575
PostingsForMatchersCache: don't create new spans, reduce logging verbosity #575
Conversation
…osity Creating a new span for each promQL matcher for each query can end up being too verbose to navigate in grafana. This PR replaces the span with a trace log. Overall starting a span was also expensive - according to profiles it's about 9% of the CPU of PostingsForMatchers and it's about 25% of the memory allocations. Adding events is about 2% of the CPU and ~5% of the allocations. In addition to that it removes some log lines in `postingsForMatchersPromise` because I think they just log the same thing multiple times. I also removed some calls to AddEvent to amortize the cost of tracing further.
span.AddEvent("postingsForMatchers succeeded", trace.WithAttributes( | ||
attribute.String("cache_key", key), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this event because it should be accounted for in the "returned postings" log line
span.AddEvent("postingsForMatchers failed", trace.WithAttributes( | ||
attribute.String("cache_key", key), | ||
attribute.String("err", err.Error()), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed because this should be accounted for in the RecordError
calls in PostingsForMatchers
span.AddEvent("interrupting wait on postingsForMatchers promise due to context error", trace.WithAttributes( | ||
attribute.String("err", ctx.Err().Error()), | ||
)) | ||
return nil, ctx.Err() | ||
return nil, fmt.Errorf("interrupting wait on postingsForMatchers promise due to context error: %w", ctx.Err()) | ||
case <-p.done: | ||
// Checking context error is necessary for deterministic tests, | ||
// as channel selection order is random | ||
if ctx.Err() != nil { | ||
span.AddEvent("completed postingsForMatchers promise, but context has error", trace.WithAttributes( | ||
attribute.String("err", ctx.Err().Error()), | ||
)) | ||
return nil, ctx.Err() | ||
return nil, fmt.Errorf("completed postingsForMatchers promise, but context has error: %w", ctx.Err()) | ||
} | ||
if p.err != nil { | ||
span.AddEvent("postingsForMatchers promise completed with error", trace.WithAttributes( | ||
attribute.String("err", p.err.Error()), | ||
)) | ||
return nil, p.err | ||
return nil, fmt.Errorf("postingsForMatchers promise completed with error: %w", p.err) | ||
} | ||
span.AddEvent("postingsForMatchers promise completed successfully") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding events means that the error will be duplicated with the RecordError
call in PostingsForMatchers
. It's also more expensive.
"postingsForMatchers promise completed successfully" should be accounted for in "returned postings"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to fix the tests. I also left a question about the deferred event adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
I didn't see these spans for ingester queries in Mimir, so created grafana/mimir#8242 |
Follow-up of #553
Creating a new span for each promQL matcher for each query can end up being too verbose to navigate in grafana. This PR replaces the span with a trace log.
Overall starting a span was also expensive - according to profiles it's about 9% of the CPU of PostingsForMatchers and it's about 25% of the memory allocations. Adding events is about 2% of the CPU and ~5% of the allocations.
CPU
CPU Details
Allocations
In addition to that it removes some log lines in
postingsForMatchersPromise
because I think they just log the same thing multiple times. I also removed some calls to AddEvent to amortize the cost of tracing further.