Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf_hooks.PerformanceObserver is very slow #464

Closed
rbuckton opened this issue Jan 27, 2021 · 8 comments
Closed

perf_hooks.PerformanceObserver is very slow #464

rbuckton opened this issue Jan 27, 2021 · 8 comments
Labels

Comments

@rbuckton
Copy link

rbuckton commented Jan 27, 2021

Back in October of 2020, TypeScript switched (1, 2) from using our own performance measurement API to using perf_hooks.performance and perf_hooks.PerformanceObserver. Since then we have had some users reporting that building with TypeScript using our --diagnostics or --extendedDiagnostics flags (which enables on our performance measurement functionality) has caused build time to regress by an additional 20% in some cases prior to this change.

We had two goals with our original change:

  • Reduce the size of the TS compiler by leveraging built-in functionality in NodeJS/browser
  • Leverage functionality like --cpu-prof with the hopes that we could generate cpu profiles with user timings information.

However, we may be forced to revert to our previous custom implementation due to the significant overhead incurred using PerformanceObserver.

Our expectation would be that using performance measurement APIs shouldn't significantly impact the performance of observed code (though we are aware that its impossible for performance measurement to be completely free).

On a side note, I spoke with @devsnek offline and they suggested we try our scenario using a recent NodeJS 15 build and the --turbo-fast-api-calls flag. The table below reflects the results of running the compiler with and without this flag on different NodeJS versions when compiling ant-design/antd:

TypeScript NodeJS Turbo 1 Emit Time 2 Total Time 3
4.0.5 4 v12.13.0 No 0.760s 9.569s
4.0.5 v14.15.4 No 0.810s 9.441s
4.0.5 v14.15.4 Yes 0.780s 9.477s
4.0.5 v16.0.0-nightly2021012613ac5fbc57 No 0.790s 9.568s
4.0.5 v16.0.0-nightly2021012613ac5fbc57 Yes 0.780s 9.649s
4.1.3 5 v12.13.0 No 2.730s 11.252s
4.1.3 v14.15.4 No 3.090s 11.488s
4.1.3 v14.15.4 Yes 3.090s 11.532s
4.1.3 v16.0.0-nightly2021012613ac5fbc57 No 2.660s 11.184s
4.1.3 v16.0.0-nightly2021012613ac5fbc57 Yes 2.680s 11.226s
  1. Indicates whether --turbo-fast-api-calls was passed to NodeJS
  2. Emit Time is a subset of Total Time and shows where we're seeing the largest impact
  3. Median of 9 runs on an Ubuntu Server machine running tsc with --extendedDiagnostics
  4. typescript@4.0.5 used our custom performance measurement API
  5. typescript@4.1.3 uses perf_hooks.performance and perf_hooks.PerformanceObserver

I'd like to point out several take-aways from the table above:

  • Using perf_hooks.PerformanceObserver is generally about 17.5% slower in the above benchmarks than our prior custom measurement API.
  • Using --turbo-fast-api-calls actually makes things worse rather than better.

In the event I'm doing something wrong with our use of PerformanceObserver, you can find the source for our performance wrapper (which essentially just forwards calls to mark and measure and uses a PerformanceObserver to capture information) here: https://github.com/microsoft/TypeScript/blob/cdd11e96ad3dd039da76c1506e35bc7e74dd57f1/src/compiler/performance.ts

@rbuckton rbuckton changed the title NodeJS perf_hooks.PerformanceObserver is very slow perf_hooks.PerformanceObserver is very slow Jan 27, 2021
@rbuckton
Copy link
Author

rbuckton commented Jan 27, 2021

Also note, when running TypeScript without --extendedDiagnostics for the cases above, the total compile time numbers are fairly stable across each scenario:

TypeScript NodeJS Turbo 1 Total Time 2
4.0.5 3 v12.13.0 No 9.416s
4.0.5 v14.15.4 Yes 9.339s
4.0.5 v14.15.4 No 9.336s
4.0.5 v16.0.0-nightly2021012613ac5fbc57 Yes 9.514s
4.0.5 v16.0.0-nightly2021012613ac5fbc57 No 9.456s
4.1.3 4 v12.13.0 No 8.968s
4.1.3 v14.15.4 Yes 8.915s
4.1.3 v14.15.4 No 8.870s
4.1.3 v16.0.0-nightly2021012613ac5fbc57 Yes 9.008s
4.1.3 v16.0.0-nightly2021012613ac5fbc57 No 9.062s
  1. Indicates whether --turbo-fast-api-calls was passed to NodeJS
  2. Median of 9 runs on an Ubuntu Server machine without --extendedDiagnostics
  3. typescript@4.0.5 used our custom performance measurement API
  4. typescript@4.1.3 uses perf_hooks.performance and perf_hooks.PerformanceObserver

Another note is that the difference between using and not using --extendedDiagnostics in TS 4.0.5 (prior to our switch to PerformanceObserver) was only an increase in about ~150ms to ~400ms, rather than the ~2,000ms increase we've seen with PerformanceObserver.

@jasnell
Copy link
Member

jasnell commented Jan 27, 2021

Yep, the implementation could definitely use a performance overhaul. I've had it on my list for a while but haven't been able to prioritize it. There are a couple of key challenges on it, the most significant of which is just the cost of creating the PerformanceEntry instances and passing those back to the JavaScript side. I can certainly start to look into it but it would be helpful to have a bit more information on which PerformanceEntry types you are observing and what your target benchmarks would be.

@rbuckton
Copy link
Author

We use marks for two purposes:

  • As a start/end entry for use with performance.measure.
  • As a performance counter (i.e., this mark was encountered n times).

We use measures to calculate the total amount of time we spend in various operations, in aggregate. For example,
one of our measurements is I/O Read Time, in which we set a starting mark before we read from the file system, and an ending mark once the read has completed. From those two points we add a measure for the entire I/O Read operation. For a large project this could be hundreds of files. In the end, we present to the user a list of diagnostics about their compilation. For a measurement like I/O Read Time, this means aggregating the durations of every related measure.

There were a few things we've noticed when running microbenchmarks for performance.mark while investigating this:

  • About 10% of the time is spent computing an isolate-specific marker name
  • About 65% of the time is spent in v8::Object::DefineOwnProperty

@amcasey may be able to provide additional details, as he's been the one investigating the performance impact.

@jasnell
Copy link
Member

jasnell commented Jan 28, 2021

Fantastic. That makes sense. I've already got some ideas on improving things.

@jasnell
Copy link
Member

jasnell commented Jan 30, 2021

Ok, so nodejs/node#37136 has a complete rework of the user timing implementation that improves performance significantly. It is, however, semver-major. Assuming we're able to get it landed in the main repo I'll see if there's a way to backport a subset of the changes in a semver-minor PR.

jasnell added a commit to nodejs/node that referenced this issue Feb 22, 2021
* Update the user timing implementation to conform to
  User Timing Level 3.
* Reimplement user timing and timerify with pure JavaScript
  implementations
* Simplify the C++ implementation for gc and http2 perf
* Runtime deprecate additional perf entry properties
  in favor of the standard detail argument
* Disable the `buffered` option on PerformanceObserver,
  all entries are queued and dispatched on setImmediate.
  Only entries with active observers are buffered.
* This does remove the user timing and timerify
  trace events. Because the trace_events are still
  considered experimental, those can be removed without
  a deprecation cycle. They are removed to improve
  performance and reduce complexity.

Old: `perf_hooks/usertiming.js n=100000: 92,378.01249733355`
New: perf_hooks/usertiming.js n=100000: 270,393.5280638482`

PR-URL: #37136
Refs: nodejs/diagnostics#464
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
@github-actions
Copy link

github-actions bot commented May 1, 2021

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@rbuckton
Copy link
Author

I neglected to look into the solution in nodejs/node#37136. Unfortunately, it depends on setImmediate, which means that while the solution may improve performance, it means that PerformanceObserver is still completely unusable for us since the TypeScript command line compiler (tsc.js) runs to completion synchronously, so we will never be able to observe the buffered events.

@jasnell
Copy link
Member

jasnell commented Jun 16, 2021

Yeah I've been thinking about options there. I prefer deferring the observer by default but I'm thinking about adding an option to force it sync or have it use the microtaskqueue instead.

legendecas added a commit to nodejs/node that referenced this issue Jul 25, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after #37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: #39297
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit to nodejs/node that referenced this issue Jul 26, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after #37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: #39297
Reviewed-By: James M Snell <jasnell@gmail.com>
richardlau pushed a commit to nodejs/node that referenced this issue Jul 29, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after #37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: #39297
Reviewed-By: James M Snell <jasnell@gmail.com>
BethGriggs pushed a commit to nodejs/node that referenced this issue Jul 29, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after #37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: #39297
Reviewed-By: James M Snell <jasnell@gmail.com>
ronag pushed a commit to nxtedition/node that referenced this issue Aug 1, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after nodejs#37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: nodejs#39297
Reviewed-By: James M Snell <jasnell@gmail.com>
danielleadams pushed a commit to nodejs/node that referenced this issue Aug 16, 2021
All API introduced in this PR are compliant with web
[performance-timeline](https://w3c.github.io/performance-timeline)
spec. "performance-timeline" is listed as supported web spec in the doc
https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis.

Changes summary:
1. Add new supported wpt test subsets: user-timing and
  performance-timeline.
2. Add support for `Performance.getEntries`,
  `Performance.getEntriesByName` and `Performance.getEntriesByType`
  to synchronously fetch buffered performance entries. This means
  the user should invoke `Performance.clearMarks` and
  `Performance.clearMeasures` to clear buffered entries to prevent from
  those entries been kept alive forever.
3. Add support (again after #37136)
  for `buffered` flags for `PerformanceObserver`.
3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance
  issues.
4. Only user-created performance entries will be buffered globally. This
  behavior should be compliant with
  https://w3c.github.io/timing-entrytypes-registry/#registry.

With the new ability to fetch user-created performance entries
synchronously, the issues raised in
nodejs/diagnostics#464 (comment)
could also be fixed.

PR-URL: #39297
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants