*: collect execution details and output them in slow query log #7302
Conversation
Collect each execution detail from coprocessor requests into StatementContext, so we can include them in the slow query log.
@zz-jason @shenli @tiancaiamao PTAL
}

// String implements the fmt.Stringer interface.
func (d ExecDetails) String() string {
Should we try implementing the fmt.Formatter interface instead of String() for better performance? IMHO, the ideal way to log a struct is to write it directly into the file system buffer; we shouldn't go struct -> string -> fmt.Sprintf. It seems the Formatter interface is what we want?
This is only called once per slow query; I don't think we need to worry about the performance.
Formatter is not necessary; we can use a strings.Builder for it: https://godoc.org/strings#Builder
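For reference, here is a minimal sketch of the strings.Builder approach, assuming an abbreviated ExecDetails struct; the duration keys (process_time, wait_time, backoff_time) are illustrative guesses, while request_count, total_keys, and processed_keys are the keys shown in this PR's diff:

package stmtctx

import (
	"strconv"
	"strings"
	"time"
)

// ExecDetails holds a subset of the fields discussed in this thread;
// the real struct may carry more.
type ExecDetails struct {
	ProcessTime   time.Duration
	WaitTime      time.Duration
	BackoffTime   time.Duration
	RequestCount  int
	TotalKeys     int64
	ProcessedKeys int64
}

// String implements fmt.Stringer with a strings.Builder, writing each
// non-zero field directly instead of collecting fmt.Sprintf results
// into a slice of parts.
func (d ExecDetails) String() string {
	var b strings.Builder
	writeField := func(key, val string) {
		if b.Len() > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(key)
		b.WriteByte(':')
		b.WriteString(val)
	}
	if d.ProcessTime > 0 {
		writeField("process_time", d.ProcessTime.String())
	}
	if d.WaitTime > 0 {
		writeField("wait_time", d.WaitTime.String())
	}
	if d.BackoffTime > 0 {
		writeField("backoff_time", d.BackoffTime.String())
	}
	if d.RequestCount > 0 {
		writeField("request_count", strconv.Itoa(d.RequestCount))
	}
	if d.TotalKeys > 0 {
		writeField("total_keys", strconv.FormatInt(d.TotalKeys, 10))
	}
	if d.ProcessedKeys > 0 {
		writeField("processed_keys", strconv.FormatInt(d.ProcessedKeys, 10))
	}
	return b.String()
}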
sessionctx/stmtctx/stmtctx.go
sc.mu.execDetails.WaitTime += details.WaitTime
sc.mu.execDetails.BackoffTime += details.BackoffTime
sc.mu.execDetails.RequestCount++
sc.mu.Unlock()
Should we make the execDetails reads/writes lock-free?
No, the lock contention is very low.
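For context, a sketch of the accumulation being discussed, reusing the ExecDetails struct from the sketch above; the StatementContext layout here is an approximation of sessionctx/stmtctx, not the exact code in this PR:

package stmtctx

import "sync"

// StatementContext guards the merged details with a plain mutex; each
// coprocessor response is merged exactly once, so the critical section
// is a handful of additions and contention stays low.
type StatementContext struct {
	mu struct {
		sync.Mutex
		execDetails ExecDetails
	}
}

// MergeExecDetails adds one coprocessor response's details into the
// running totals for the statement.
func (sc *StatementContext) MergeExecDetails(details ExecDetails) {
	sc.mu.Lock()
	sc.mu.execDetails.ProcessTime += details.ProcessTime
	sc.mu.execDetails.WaitTime += details.WaitTime
	sc.mu.execDetails.BackoffTime += details.BackoffTime
	sc.mu.execDetails.TotalKeys += details.TotalKeys
	sc.mu.execDetails.ProcessedKeys += details.ProcessedKeys
	sc.mu.execDetails.RequestCount++
	sc.mu.Unlock()
}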
parts = append(parts, fmt.Sprintf("request_count:%d", d.RequestCount))
}
if d.TotalKeys > 0 {
parts = append(parts, fmt.Sprintf("total_keys:%d", d.TotalKeys))
What do "total_keys" and "processed_keys" mean?
We can explain them in the documentation later.
}
r.feedback.Update(re.result.GetStartKey(), r.selectResp.OutputCounts)
r.partialCount++
sc.MergeExecDetails(re.result.GetExecDetails())
It's better to print execution details for every coprocessor request.
Sometimes there are hundreds of coprocessor requests; printing every one would make the SLOW_QUERY log unreadable.
Sometimes there is data skew in some key ranges; this skew can slow down coprocessor performance and cause other coprocessor requests to wait for processing.
If we merge all the coprocessor requests for a user query, we cannot recognize the data skew.
How about we maintain a max?
The main problem this PR solves is to distinguish culprit queries among SLOW queries.
The total time is enough for this purpose.
LGTM
LGTM
/run-all-tests
/run-all-tests
What have you changed? (mandatory)
Collect each execution detail from coprocessor requests into StatementContext.
Output them in the slow query log.
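For illustration, a slow query log entry carrying the merged details might look roughly like the line below; the overall layout and the duration keys are assumptions, and only request_count and total_keys are taken directly from this PR's diff:

[SLOW_QUERY] ... process_time:2.1s wait_time:800ms backoff_time:100ms request_count:12 total_keys:400000 processed_keys:380000 ...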
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
Manual and unit tests.
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
YES
Does this PR affect tidb-ansible update? (mandatory)
NO
Does this PR need to be added to the release notes? (mandatory)
YES
release note:
include execution details in slow query log
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
Add a few positive/negative examples (optional)