-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
summarize: add "aggregate" sub-command for analyzing sets of profiles. #129
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I think all of the feedback I could give is covered by the FIXMEs 🙂
e79bac9
to
ad97fb3
Compare
ef25949
to
ccbbe4c
Compare
Uhh, please ignore the extra commits, they're from rebasing on the |
Not sure if I should put this into the PR description, but these are the current results with #143's
(Should probably not show the "Largest N variances" entries that are also in the "Smallest N variances") |
Do you think any of them should block merging? I think this could be useful to anyone who wants to reproduce the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think any of them should block merging? I think this could be useful to anyone who wants to reproduce the rdpmc experiments we did.
No, I don't think so. Sorry for the delay on this, I wanted to review it again since it's been a while.
.map(|file| ProfilingData::new(&file)) | ||
.collect::<Result<Vec<_>, _>>()?; | ||
|
||
// FIXME(eddyb) return some kind of serializable data structure from `aggregate_profiles`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be nice to have since it would probably be part of any plans to implement the aggregation feature for perf.rlo.
Co-authored-by: Wesley Wiser <wwiser@gmail.com>
For my ongoing
rdpmc
instruction-counting work I've ended up with a lot of profile results (over 700 and there were hundreds I've already removed), that I don't really have any tools to analyze in-depth.The fundamental problem is that e.g.
summarize summarize
will effectively sum together many small intervals between consecutive timestamps/instruction counts, so the distribution of variance in individual samples (of the clock/instruction counter), or the exact sources of large variances, are lost in the summary.This PR seeks to address that by introducing a
summarize aggregate
command specifically for sets of deterministic runs (identical in all respects except the timestamps themselves). Example output:For comparison, this is my best set of runs so far, for instruction counting (with
rdpmc
measuring bothinstructions:u
and hardware interrupts, to account foriret
overcounting, andcpuid
for serialization):This is not the intended final output, just the simplest analysis I could do, that is still somewhat indicative of the quality of the data. To be truly useful in improving that quality, it has to be able to pinpoint to outliers, not just quantify their existence.EDIT: added some very basic indications of sources of single occurrences (and updated the example output above)
EDIT2: streamlined the single-occurrence output and included the parent (or caller) of the two interval endpoints, when it's relevant (i.e. when the interval is part of it, between one call ending and the other starting, not either callee)