Hardware performance counter support (via rdpmc). #143

Merged 5 commits on Nov 20, 2020

eddyb (Member) commented Nov 3, 2020

Note: this is a companion to rust-lang/rust#78781, and duplicates some information with it for convenience

Credits

I'd like to start by thanking @alyssais, @cuviper, @edef1c, @glandium, @jix, @Mark-Simulacrum, @m-ou-se, @mystor, @nagisa, @puckipedia, and @yorickvP, for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

Summary

This PR adds support for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the rdpmc instruction on x86_64 Linux.
Additionally, as asm! is used, measureme needs to be built with nightly Rust, and with features = ["nightly"], for the support to be enabled.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit x86 (aka i686) would be almost trivial to add and test once we land the initial x86_64 version (as all the CPU detection code can be reused).

These new APIs were (backwards-compatibly) added to measureme:

  • counters module holding Counter and all the counters implemented so far
    • attempting to create a counter may return Err if it is unsupported, but the counters are not #[cfg]'d out at compile-time
  • Counter::by_name creates a Counter based on its name
    • this is only to avoid needing to reimplement this in users of measureme which want to expose the counter names (e.g. rustc via a -Z self-profile-counter flag)
  • Profiler::with_counter taking the Counter to use for event endpoints
    • the existing Profiler::new continues to default to measuring time (i.e. equivalent to Profiler::with_counter(Counter::by_name("wall-time").unwrap()))
    • a Profiler is currently limited to one Counter but long-term we could adjust the binary format to allow including several counters' values in a measureme event
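
As a rough illustration of the "may return Err, but is not #[cfg]'d out" behavior described above, here is a minimal self-contained sketch. It does not use measureme itself: `CounterKind`, `CounterError`, `counter_by_name` and the `hw_supported` parameter are all hypothetical stand-ins, and the real `Counter::by_name` may look different.

```rust
/// Hypothetical stand-in for the named counters (not measureme's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CounterKind {
    WallTime,
    Instructions,
    InstructionsMinusIrqs,
}

/// Hypothetical error type: lookups fail at runtime, not at compile time.
#[derive(Debug, PartialEq, Eq)]
struct CounterError(String);

/// Looks a counter up by name. `hw_supported` models "is this x86_64 Linux
/// with usable performance counters?" as a plain runtime value (assumption).
fn counter_by_name(name: &str, hw_supported: bool) -> Result<CounterKind, CounterError> {
    match name {
        // Time measurement is assumed to work everywhere.
        "wall-time" => Ok(CounterKind::WallTime),
        // Hardware counters exist on every target, but may be unusable.
        "instructions:u" | "instructions-minus-irqs:u" if !hw_supported => {
            Err(CounterError(format!("counter `{name}` is unsupported here")))
        }
        "instructions:u" => Ok(CounterKind::Instructions),
        "instructions-minus-irqs:u" => Ok(CounterKind::InstructionsMinusIrqs),
        _ => Err(CounterError(format!("unknown counter `{name}`"))),
    }
}

fn main() {
    for name in ["wall-time", "instructions:u", "bogus"] {
        println!("{name}: {:?}", counter_by_name(name, false));
    }
}
```

The point of the design is that a caller such as a compiler flag handler can always name any counter, and only learns at runtime whether it is usable.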

Information about counters (name and units) is included in the JSON blob describing the profile data, which should still allow tools that have not been updated to read the new profiles (but they would report e.g. instructions as nanoseconds).
The backwards compatibility aspect was more important before the recent breaking change was introduced (for using just one file instead of 3, per profile), though avoiding breaking changes still seems good practice here.

The named counters so far are:

  • wall-time: the existing time measurement
    • name chosen for consistency with perf.rust-lang.org
    • continues to use std::time::Instant for a nanosecond-precision "monotonic clock"
  • instructions:u: the hardware performance counter usually referred to as "Instructions retired"
    • here "retired" (roughly) means "fully executed"
    • the :u suffix is from the Linux perf tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
      • see Caveats/Subtracting IRQs for why this isn't entirely true and why instructions-minus-irqs:u should be preferred instead
  • instructions-minus-irqs:u: same as instructions:u, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
  • instructions-minus-r0420:u: experimental counter, same as instructions-minus-irqs:u but subtracting an undocumented counter (r0420:u) instead of IRQs
    • the rXXXX notation is again from Linux perf, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't really know what it is
    • this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    • see Challenges/Zen's undocumented 420 counter for details on how this counter was found and what it does
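
To make the relationship between instructions:u and instructions-minus-irqs:u concrete, here is a small sketch of the subtraction. It assumes (as the counter's definition above suggests) that the derived value is simply the raw instruction count minus the interrupt count; `RawSample` and both function names are hypothetical, and the numbers are made up for illustration.

```rust
/// Hypothetical pair of raw readings taken at one point in time.
#[derive(Clone, Copy)]
struct RawSample {
    instructions: u64, // raw "instructions retired" reading
    irqs: u64,         // raw hardware interrupt count
}

/// The derived counter value at one point in time.
fn instructions_minus_irqs(sample: RawSample) -> u64 {
    // Wrapping arithmetic, since hardware counters may wrap around.
    sample.instructions.wrapping_sub(sample.irqs)
}

/// The derived count for an interval between two samples.
fn interval_count(start: RawSample, end: RawSample) -> u64 {
    instructions_minus_irqs(end).wrapping_sub(instructions_minus_irqs(start))
}

fn main() {
    let start = RawSample { instructions: 1_000, irqs: 2 };
    let end = RawSample { instructions: 5_000, irqs: 5 };
    // Raw delta is 4000 instructions, of which 3 are attributed to interrupts.
    println!("interval = {}", interval_count(start, end));
}
```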

Write-up / report

Because of how extensive the full report ended up being, I've kept most of it on hackmd.io, but for convenient access, here are all the sections (with individual links):
(someone suggested I make a backup, so here it is on the wayback machine - I'll need to remember to update that if I have to edit the write-up)

}

impl InstructionsMinusIrqs {
const NAME: &'static str = "instructions-minus-irqs:u";

Maybe use instructions:u as the name here, for the real number of userspace instructions, and use something like instructions:u+int for the number of userspace instructions plus any bogus instructions counted because of interrupt returns or similar things.


If we do that I'd be against using instructions:u as the name for anything, because perf stat does not do any subtracting, and we used the instructions:u name specifically for perf stat familiarity (which is why perf.rust-lang.org also uses it).

Furthermore, we might want to try and get perf.rust-lang.org to record the right counter and also show instructions-minus-irqs:u, so ideally we'd use different names to tell them apart.

But I agree the current setup is not the most user-friendly; it was just the easiest way to keep things clear while we worked on this.

ambiso commented Nov 4, 2020

Hi,

I'm not sure if this is directly relevant to this PR,
and I very well may not be telling anyone anything new.

Still, since I saw the insistence on removing all noise from the measurements, I just wanted to make sure that people are aware of research suggesting that many optimizations performed by LLVM at -O3 are for naught when tested with random stack and binary layouts[1,2,3].
This would suggest that you would actually not want to simply disable ASLR, but test on every possible layout.

If this is the wrong pull request to post this comment, or you are aware of this research, or that it doesn't matter for what you're trying to achieve, please feel free to ignore this comment, move it, or whatever.

Take care,
ambiso

[1] https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf
[2] https://youtu.be/r-TLSBdHe1A?t=948
[3] https://github.com/ccurtsinger/stabilizer

eddyb (Member Author) commented Nov 4, 2020

@ambiso The way ASLR feeds into this is roughly HashMap<*const T, ...> (the equivalent of unordered_map<T*, ...> in C++), i.e. hashing pointer addresses, meaning the choice of addresses affects the order in the hash map, and therefore the performance characteristics, by somewhere around ±0.01%.

I'm not sure how optimizations, or stack layout feeds into it, but we're not (yet) measuring anything like cache effectiveness, the entire problem with "ASLR non-determinism" in our case is our fault for relying on (interned) pointer addresses for cheaper hashing (and avoiding having to store a hash somewhere at all).
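
A toy sketch of why hashing pointer addresses makes iteration order depend on where things get allocated. This is not how HashMap actually places entries: `bucket_of`, the 8-byte object spacing and the 4-bucket table are all simplifying assumptions, chosen only so the effect is deterministic and easy to see.

```rust
/// Toy bucket selection that depends only on the address (assumption:
/// a real hash map hashes the address, but the dependence is the same idea).
fn bucket_of(addr: usize, buckets: usize) -> usize {
    (addr >> 3) % buckets
}

/// Given `count` objects laid out 8 bytes apart starting at `base`, returns
/// the object indices in the order a bucket-by-bucket iteration visits them.
fn iteration_order(base: usize, count: usize, buckets: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..count).collect();
    order.sort_by_key(|&i| bucket_of(base + i * 8, buckets));
    order
}

fn main() {
    // Same three objects, same relative layout, two different base addresses
    // (standing in for two runs with different ASLR choices).
    println!("{:?}", iteration_order(0x1000, 3, 4));
    println!("{:?}", iteration_order(0x1018, 3, 4));
}
```

With the first base address the objects are visited as 0, 1, 2; with the second they come out as 1, 2, 0, so any work that depends on iteration order is done in a different sequence.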

EDIT: one smart way to avoid needing to disable ASLR is to wrap some HashMap accesses in their own measureme intervals, so that they're subtracted out - this would be much more effort but may have potentially fascinating results.
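
A minimal sketch of what "subtracted out" could mean here, assuming nested intervals are recorded with start/end counter values and are fully contained in the outer interval. `Interval` and `self_count` are hypothetical names, not measureme's API, and the values are invented.

```rust
/// Hypothetical recorded interval, in counter units (e.g. instructions).
struct Interval {
    start: u64,
    end: u64,
}

impl Interval {
    fn len(&self) -> u64 {
        self.end - self.start
    }
}

/// Count attributed to `outer` itself, after removing the nested intervals
/// (assumed to be non-overlapping and contained in `outer`).
fn self_count(outer: &Interval, nested: &[Interval]) -> u64 {
    let nested_total: u64 = nested.iter().map(Interval::len).sum();
    outer.len() - nested_total
}

fn main() {
    let query = Interval { start: 100, end: 1_100 };
    // Two address-sensitive hash map accesses, wrapped in their own intervals.
    let map_accesses = [
        Interval { start: 200, end: 260 },
        Interval { start: 500, end: 540 },
    ];
    println!("self count = {}", self_count(&query, &map_accesses));
}
```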

measureme/src/counters.rs
Comment on lines +322 to +325
// FIXME(eddyb) it's probably UB to use regular reads, especially
// from behind `&T`, with the only synchronization being barriers.
// Probably needs atomic reads, and stronger ones at that, for the
// `lock` field, than the fields (which would be `Relaxed`?).

Do you think this should be fixed before merging?


It's not used during actual counter reads, only once at the start, and it's the same as the C code examples, so this is more "theoretical", I think?
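
For reference, a sketch of what the atomic version hinted at by the FIXME might look like: a generic seqlock-style read, with an `Acquire` load of the `lock` field and `Relaxed` loads of the data fields. `SharedPage` is a hypothetical, heavily simplified stand-in (only `lock`, `index` and `offset`), the writer side is included just so the example runs, and none of this is the code in the PR.

```rust
use std::sync::atomic::{fence, AtomicU32, AtomicU64, Ordering};

/// Hypothetical, simplified shared page guarded by a sequence lock.
struct SharedPage {
    lock: AtomicU32, // odd while an update is in progress
    index: AtomicU32,
    offset: AtomicU64,
}

impl SharedPage {
    fn new() -> Self {
        SharedPage {
            lock: AtomicU32::new(0),
            index: AtomicU32::new(0),
            offset: AtomicU64::new(0),
        }
    }

    /// Single-writer update: bump the sequence, write, bump again.
    fn publish(&self, index: u32, offset: u64) {
        let seq = self.lock.load(Ordering::Relaxed);
        self.lock.store(seq.wrapping_add(1), Ordering::Relaxed);
        fence(Ordering::Release);
        self.index.store(index, Ordering::Relaxed);
        self.offset.store(offset, Ordering::Relaxed);
        self.lock.store(seq.wrapping_add(2), Ordering::Release);
    }

    /// Retries until both fields were read under one unchanged sequence.
    fn read_consistent(&self) -> (u32, u64) {
        loop {
            let seq = self.lock.load(Ordering::Acquire);
            if seq & 1 != 0 {
                std::hint::spin_loop(); // update in progress, try again
                continue;
            }
            let index = self.index.load(Ordering::Relaxed);
            let offset = self.offset.load(Ordering::Relaxed);
            fence(Ordering::Acquire);
            if self.lock.load(Ordering::Relaxed) == seq {
                return (index, offset);
            }
        }
    }
}

fn main() {
    let page = SharedPage::new();
    page.publish(3, 42);
    let (index, offset) = page.read_consistent();
    println!("index = {index}, offset = {offset}");
}
```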


/// The max instant timestamp we can represent with the 48 bits available.
/// The max instant counter value we can represent with the 48 bits available.
// FIXME(eddyb) s/TIMESTAMP/COUNT/

This is a note for me to create issues to clean this up at the next major version.
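
For context on the 48-bit limit mentioned in that doc comment, a small sketch of what the bound works out to. The split into a 32-bit lower part and a 16-bit upper part is an assumption made for illustration, and `MAX_48_BIT`, `split_48` and `join_48` are hypothetical names rather than measureme's.

```rust
/// Largest value that fits in 48 bits.
const MAX_48_BIT: u64 = (1u64 << 48) - 1;

/// Splits a value into (lower 32 bits, upper 16 bits), or returns `None`
/// if it does not fit in 48 bits. The split itself is an assumption.
fn split_48(value: u64) -> Option<(u32, u16)> {
    if value > MAX_48_BIT {
        return None;
    }
    Some((value as u32, (value >> 32) as u16))
}

/// Inverse of `split_48`.
fn join_48(lower: u32, upper: u16) -> u64 {
    (u64::from(upper) << 32) | u64::from(lower)
}

fn main() {
    println!("max 48-bit value: {MAX_48_BIT}");
    // Interpreted as nanoseconds, the limit is a bit over three days.
    println!("as whole seconds: {}", MAX_48_BIT / 1_000_000_000);
    let (lower, upper) = split_48(0x1234_5678_9ABC).unwrap();
    println!("round trip: {:#x}", join_48(lower, upper));
}
```

The same bound applies whether the value is a timestamp or an instruction count, which is presumably why the rename from TIMESTAMP to COUNT is only cosmetic.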

wesleywiser merged commit 1c8c294 into rust-lang:master on Nov 20, 2020
eddyb deleted the rdpmc branch on November 20, 2020 12:47
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 14, 2022
Integrate measureme's hardware performance counter support.

*Note: this is a companion to rust-lang/measureme#143, and duplicates some information with it for convenience*

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

## Credits

I'd like to start by thanking `@alyssais`, `@cuviper`, `@edef1c`, `@glandium`, `@jix`, `@Mark-Simulacrum`, `@m-ou-se`, `@mystor`, `@nagisa`, `@puckipedia`, and `@yorickvP`, for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

## Summary

This PR adds support to `-Z self-profile` for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the `rdpmc` instruction on `x86_64` Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit `x86` (aka `i686`) would be almost trivial to add and test once we land the initial `x86_64` version (as all the CPU detection code can be reused).

A new flag `-Z self-profile-counter` was added, to control which of the named `measureme` counters is used, and which defaults to `wall-time`, in order to keep `-Z self-profile`'s current functionality unchanged (at least for now).

The named counters so far are:
* `wall-time`: the existing time measurement
    * name chosen for consistency with `perf.rust-lang.org`
    * continues to use `std::time::Instant` for a nanosecond-precision "monotonic clock"
* `instructions:u`: the hardware performance counter usually referred to as "Instructions retired"
    * here "retired" (roughly) means "fully executed"
    * the `:u` suffix is from the Linux `perf` tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
        * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this isn't entirely true and why `instructions-minus-irqs:u` should be preferred instead*
* `instructions-minus-irqs:u`: same as `instructions:u`, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
    * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this should be preferred over `instructions:u`*
* `instructions-minus-r0420:u`: experimental counter, same as `instructions-minus-irqs:u` but subtracting an undocumented counter (`r0420:u`) instead of IRQs
    * the `rXXXX` notation is again from Linux `perf`, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't *really* know what it is
    * this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    * *see [Challenges/Zen's undocumented 420 counter](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter) for details on how this counter was found and what it does*

---

There are also some additional commits:
* ~~see [Challenges/Rebasing *shouldn't* affect the results, right?](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) for details on the changes to `rustc_parse` and `rustc_trait_selection` (the latter far more dubious, and probably shouldn't be merged, or not as-is)~~
  *  **EDIT**: the effects of these are no longer quantifiable; the PR includes reverts for them
* ~~see [Challenges/`jemalloc`: purging will commence in ten seconds](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) for details on the `jemalloc` change~~
  * this is also separately found in rust-lang#77162, and we probably want to avoid doing it by default; ideally we'd use the runtime control API `jemalloc` offers (assuming that can stop the timer that's already running, which I'm not sure about)
  * **EDIT**: until we can do this based on `-Z` flags, this commit has also been reverted
* the `proc_macro` change was to avoid randomized hashing and therefore ASLR-like effects

---

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

#### Write-up / report

Because of how extensive the full report ended up being, I've kept most of it [on `hackmd.io`](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view), but for convenient access, here are all the sections (with individual links):
<sup>(someone suggested I make a backup, so [here it is on the wayback machine](http://web.archive.org/web/20201127164748/https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view) - I'll need to remember to update that if I have to edit the write-up)</sup>

* [**Motivation**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Motivation)

* [**Results**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Results)
    * [**Overhead**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Overhead)
    *Preview (see the report itself for more details):*

    |Counter|Total<br>`instructions-minus-irqs:u`|Overhead from "Baseline"<br>(for all 1903881<br>counter reads)|Overhead from "Baseline"<br>(per each counter read)|
    |-|-|-|-|
    |Baseline|63637621286 ±6|||
    |`instructions:u`|63658815885 ±2|&nbsp;&nbsp;+21194599 ±8|&nbsp;&nbsp;+11|
    |`instructions-minus-irqs:u`|63680307361 ±13|&nbsp;&nbsp;+42686075 ±19|&nbsp;&nbsp;+22|
    |`wall-time`|63951958376 ±10275|+314337090 ±10281|+165|

    * [**"Macro" noise (self time)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Macro”-noise-(self-time))
    *Preview (see the report itself for more details):*

    || `wall-time` (ns) | `instructions:u` | `instructions-minus-irqs:u`
    -: | -: | -: | -:
    `typeck` | 5478261360 ±283933373 (±~5.2%) | 17350144522 ±6392 (±~0.00004%) | 17351035832.5 ±4.5 (±~0.00000003%)
    `expand_crate` | 2342096719 ±110465856 (±~4.7%) | 8263777916 ±2937 (±~0.00004%) | 8263708389 ±0 (±~0%)
    `mir_borrowck` | 2216149671 ±119458444 (±~5.4%) | 8340920100 ±2794 (±~0.00003%) | 8341613983.5 ±2.5 (±~0.00000003%)
    `mir_built` | 1269059734 ±91514604 (±~7.2%) | 4454959122 ±1618 (±~0.00004%) | 4455303811 ±1 (±~0.00000002%)
    `resolve_crate` | 942154987.5 ±53068423.5 (±~5.6%) | 3951197709 ±39 (±~0.000001%) | 3951196865 ±0 (±~0%)

    * [**"Micro" noise (individual sampling intervals)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Micro”-noise-(individual-sampling-intervals))

* [**Caveats**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Caveats)
    * [**Disabling ASLR**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Disabling-ASLR)
    * [**Non-deterministic proc macros**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Non-deterministic-proc-macros)
    * [**Subtracting IRQs**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs)
    * [**Lack of support for multiple threads**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Lack-of-support-for-multiple-threads)

* [**Challenges**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Challenges)
    * [**How do we even read hardware performance counters?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#How-do-we-even-read-hardware-performance-counters)
    * [**ASLR: it's free entropy**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#ASLR-it’s-free-entropy)
    * [**The serializing instruction**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#The-serializing-instruction)
    * [**Getting constantly interrupted**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Getting-constantly-interrupted)
    * [**AMD patented time-travel and dubbed it `SpecLockMap`<br><sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;or: "how we accidentally unlocked `rr` on AMD Zen"</sup>**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#AMD-patented-time-travel-and-dubbed-it-SpecLockMapnbspnbspnbspnbspnbspnbspnbspnbspor-“how-we-accidentally-unlocked-rr-on-AMD-Zen”)
    * [**`jemalloc`: purging will commence in ten seconds**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds)
    * [**Rebasing *shouldn't* affect the results, right?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right)
    * [**Epilogue: Zen's undocumented 420 counter**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter)
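
The figures in the two preview tables above can be cross-checked with a little arithmetic. The sketch below recomputes the overhead columns from the totals and the read count (1903881), and the relative noise from a value and its ± spread; the constants are copied from the tables, while the function names are made up for this example.

```rust
/// Number of counter reads, from the overhead table header.
const COUNTER_READS: u64 = 1_903_881;
/// "Baseline" total, from the overhead table.
const BASELINE_TOTAL: u64 = 63_637_621_286;

/// Returns (overhead for all reads, overhead per read rounded to nearest).
fn overhead(total: u64) -> (u64, u64) {
    let extra = total - BASELINE_TOTAL;
    let per_read = (extra + COUNTER_READS / 2) / COUNTER_READS;
    (extra, per_read)
}

/// Relative noise in percent, given a value and its ± spread.
fn relative_noise_percent(value: f64, spread: f64) -> f64 {
    spread / value * 100.0
}

fn main() {
    let rows = [
        ("instructions:u", 63_658_815_885u64),
        ("instructions-minus-irqs:u", 63_680_307_361),
        ("wall-time", 63_951_958_376),
    ];
    for (name, total) in rows {
        let (extra, per_read) = overhead(total);
        println!("{name}: +{extra} total, +{per_read} per read");
    }
    // `typeck` under `wall-time`: 5478261360 ±283933373 ns.
    let noise = relative_noise_percent(5_478_261_360.0, 283_933_373.0);
    println!("typeck wall-time noise: ±~{noise:.1}%");
}
```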
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Integrate measureme's hardware performance counter support.
