perf_hooks.PerformanceObserver is very slow #464

rbuckton · 2021-01-27T21:17:12Z

Back in October of 2020, TypeScript switched (1, 2) from using our own performance measurement API to using perf_hooks.performance and perf_hooks.PerformanceObserver. Since then we have had some users reporting that building with TypeScript using our --diagnostics or --extendedDiagnostics flags (which enables on our performance measurement functionality) has caused build time to regress by an additional 20% in some cases prior to this change.

We had two goals with our original change:

Reduce the size of the TS compiler by leveraging built-in functionality in NodeJS/browser
Leverage functionality like --cpu-prof with the hopes that we could generate cpu profiles with user timings information.

However, we may be forced to revert to our previous custom implementation due to the significant overhead incurred using PerformanceObserver.

Our expectation would be that using performance measurement APIs shouldn't significantly impact the performance of observed code (though we are aware that its impossible for performance measurement to be completely free).

On a side note, I spoke with @devsnek offline and they suggested we try our scenario using a recent NodeJS 15 build and the --turbo-fast-api-calls flag. The table below reflects the results of running the compiler with and without this flag on different NodeJS versions when compiling ant-design/antd:

TypeScript	NodeJS	Turbo ¹	Emit Time ²	Total Time ³
4.0.5 ⁴	v12.13.0	No	0.760s	9.569s
4.0.5	v14.15.4	No	0.810s	9.441s
4.0.5	v14.15.4	Yes	0.780s	9.477s
4.0.5	v16.0.0-nightly2021012613ac5fbc57	No	0.790s	9.568s
4.0.5	v16.0.0-nightly2021012613ac5fbc57	Yes	0.780s	9.649s
4.1.3 ⁵	v12.13.0	No	2.730s	11.252s
4.1.3	v14.15.4	No	3.090s	11.488s
4.1.3	v14.15.4	Yes	3.090s	11.532s
4.1.3	v16.0.0-nightly2021012613ac5fbc57	No	2.660s	11.184s
4.1.3	v16.0.0-nightly2021012613ac5fbc57	Yes	2.680s	11.226s

Indicates whether --turbo-fast-api-calls was passed to NodeJS
Emit Time is a subset of Total Time and shows where we're seeing the largest impact
Median of 9 runs on an Ubuntu Server machine running tsc with --extendedDiagnostics
typescript@4.0.5 used our custom performance measurement API
typescript@4.1.3 uses perf_hooks.performance and perf_hooks.PerformanceObserver

I'd like to point out several take-aways from the table above:

Using perf_hooks.PerformanceObserver is generally about 17.5% slower in the above benchmarks than our prior custom measurement API.
Using --turbo-fast-api-calls actually makes things worse rather than better.

In the event I'm doing something wrong with our use of PerformanceObserver, you can find the source for our performance wrapper (which essentially just forwards calls to mark and measure and uses a PerformanceObserver to capture information) here: https://github.com/microsoft/TypeScript/blob/cdd11e96ad3dd039da76c1506e35bc7e74dd57f1/src/compiler/performance.ts

The text was updated successfully, but these errors were encountered:

rbuckton · 2021-01-27T21:24:29Z

Also note, when running TypeScript without --extendedDiagnostics for the cases above, the total compile time numbers are fairly stable across each scenario:

TypeScript	NodeJS	Turbo ¹	Total Time ²
4.0.5 ³	v12.13.0	No	9.416s
4.0.5	v14.15.4	Yes	9.339s
4.0.5	v14.15.4	No	9.336s
4.0.5	v16.0.0-nightly2021012613ac5fbc57	Yes	9.514s
4.0.5	v16.0.0-nightly2021012613ac5fbc57	No	9.456s
4.1.3 ⁴	v12.13.0	No	8.968s
4.1.3	v14.15.4	Yes	8.915s
4.1.3	v14.15.4	No	8.870s
4.1.3	v16.0.0-nightly2021012613ac5fbc57	Yes	9.008s
4.1.3	v16.0.0-nightly2021012613ac5fbc57	No	9.062s

Indicates whether --turbo-fast-api-calls was passed to NodeJS
Median of 9 runs on an Ubuntu Server machine without --extendedDiagnostics
typescript@4.0.5 used our custom performance measurement API
typescript@4.1.3 uses perf_hooks.performance and perf_hooks.PerformanceObserver

Another note is that the difference between using and not using --extendedDiagnostics in TS 4.0.5 (prior to our switch to PerformanceObserver) was only an increase in about ~150ms to ~400ms, rather than the ~2,000ms increase we've seen with PerformanceObserver.

jasnell · 2021-01-27T22:08:23Z

Yep, the implementation could definitely use a performance overhaul. I've had it on my list for a while but haven't been able to prioritize it. There are a couple of key challenges on it, the most significant of which is just the cost of creating the PerformanceEntry instances and passing those back to the JavaScript side. I can certainly start to look into it but it would be helpful to have a bit more information on which PerformanceEntry types you are observing and what your target benchmarks would be.

rbuckton · 2021-01-28T01:45:21Z

We use marks for two purposes:

As a start/end entry for use with performance.measure.
As a performance counter (i.e., this mark was encountered n times).

We use measures to calculate the total amount of time we spend in various operations, in aggregate. For example,
one of our measurements is I/O Read Time, in which we set a starting mark before we read from the file system, and an ending mark once the read has completed. From those two points we add a measure for the entire I/O Read operation. For a large project this could be hundreds of files. In the end, we present to the user a list of diagnostics about their compilation. For a measurement like I/O Read Time, this means aggregating the durations of every related measure.

There were a few things we've noticed when running microbenchmarks for performance.mark while investigating this:

About 10% of the time is spent computing an isolate-specific marker name
About 65% of the time is spent in v8::Object::DefineOwnProperty

@amcasey may be able to provide additional details, as he's been the one investigating the performance impact.

jasnell · 2021-01-28T03:26:28Z

Fantastic. That makes sense. I've already got some ideas on improving things.

jasnell · 2021-01-30T04:20:53Z

Ok, so nodejs/node#37136 has a complete rework of the user timing implementation that improves performance significantly. It is, however, semver-major. Assuming we're able to get it landed in the main repo I'll see if there's a way to backport a subset of the changes in a semver-minor PR.

* Update the user timing implementation to conform to User Timing Level 3. * Reimplement user timing and timerify with pure JavaScript implementations * Simplify the C++ implementation for gc and http2 perf * Runtime deprecate additional perf entry properties in favor of the standard detail argument * Disable the `buffered` option on PerformanceObserver, all entries are queued and dispatched on setImmediate. Only entries with active observers are buffered. * This does remove the user timing and timerify trace events. Because the trace_events are still considered experimental, those can be removed without a deprecation cycle. They are removed to improve performance and reduce complexity. Old: `perf_hooks/usertiming.js n=100000: 92,378.01249733355` New: perf_hooks/usertiming.js n=100000: 270,393.5280638482` PR-URL: #37136 Refs: nodejs/diagnostics#464 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Michaël Zasso <targos@protonmail.com>

github-actions · 2021-05-01T01:13:14Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

rbuckton · 2021-06-16T00:27:40Z

I neglected to look into the solution in nodejs/node#37136. Unfortunately, it depends on setImmediate, which means that while the solution may improve performance, it means that PerformanceObserver is still completely unusable for us since the TypeScript command line compiler (tsc.js) runs to completion synchronously, so we will never be able to observe the buffered events.

jasnell · 2021-06-16T01:58:34Z

Yeah I've been thinking about options there. I prefer deferring the observer by default but I'm thinking about adding an option to force it sync or have it use the microtaskqueue instead.

All API introduced in this PR are compliant with web [performance-timeline](https://w3c.github.io/performance-timeline) spec. "performance-timeline" is listed as supported web spec in the doc https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis. Changes summary: 1. Add new supported wpt test subsets: user-timing and performance-timeline. 2. Add support for `Performance.getEntries`, `Performance.getEntriesByName` and `Performance.getEntriesByType` to synchronously fetch buffered performance entries. This means the user should invoke `Performance.clearMarks` and `Performance.clearMeasures` to clear buffered entries to prevent from those entries been kept alive forever. 3. Add support (again after #37136) for `buffered` flags for `PerformanceObserver`. 3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance issues. 4. Only user-created performance entries will be buffered globally. This behavior should be compliant with https://w3c.github.io/timing-entrytypes-registry/#registry. With the new ability to fetch user-created performance entries synchronously, the issues raised in nodejs/diagnostics#464 (comment) could also be fixed. PR-URL: #39297 Reviewed-By: James M Snell <jasnell@gmail.com>

All API introduced in this PR are compliant with web [performance-timeline](https://w3c.github.io/performance-timeline) spec. "performance-timeline" is listed as supported web spec in the doc https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis. Changes summary: 1. Add new supported wpt test subsets: user-timing and performance-timeline. 2. Add support for `Performance.getEntries`, `Performance.getEntriesByName` and `Performance.getEntriesByType` to synchronously fetch buffered performance entries. This means the user should invoke `Performance.clearMarks` and `Performance.clearMeasures` to clear buffered entries to prevent from those entries been kept alive forever. 3. Add support (again after nodejs#37136) for `buffered` flags for `PerformanceObserver`. 3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance issues. 4. Only user-created performance entries will be buffered globally. This behavior should be compliant with https://w3c.github.io/timing-entrytypes-registry/#registry. With the new ability to fetch user-created performance entries synchronously, the issues raised in nodejs/diagnostics#464 (comment) could also be fixed. PR-URL: nodejs#39297 Reviewed-By: James M Snell <jasnell@gmail.com>

All API introduced in this PR are compliant with web [performance-timeline](https://w3c.github.io/performance-timeline) spec. "performance-timeline" is listed as supported web spec in the doc https://nodejs.org/docs/latest/api/perf_hooks.html#perf_hooks_performance_measurement_apis. Changes summary: 1. Add new supported wpt test subsets: user-timing and performance-timeline. 2. Add support for `Performance.getEntries`, `Performance.getEntriesByName` and `Performance.getEntriesByType` to synchronously fetch buffered performance entries. This means the user should invoke `Performance.clearMarks` and `Performance.clearMeasures` to clear buffered entries to prevent from those entries been kept alive forever. 3. Add support (again after #37136) for `buffered` flags for `PerformanceObserver`. 3. Fixes `PerformanceMark` and `PerformanceMeasure` wpt compliance issues. 4. Only user-created performance entries will be buffered globally. This behavior should be compliant with https://w3c.github.io/timing-entrytypes-registry/#registry. With the new ability to fetch user-created performance entries synchronously, the issues raised in nodejs/diagnostics#464 (comment) could also be fixed. PR-URL: #39297 Reviewed-By: James M Snell <jasnell@gmail.com>

rbuckton changed the title ~~NodeJS perf_hooks.PerformanceObserver is very slow~~ perf_hooks.PerformanceObserver is very slow Jan 27, 2021

jasnell mentioned this issue Jan 30, 2021

perf_hooks: complete overhaul of the implementation nodejs/node#37136

Closed

rbuckton mentioned this issue Feb 1, 2021

Switched 'ts.performance' to a mixed mode only uses native performance APIs when necessary microsoft/TypeScript#42586

Merged

github-actions bot added the stale label May 1, 2021

github-actions bot closed this as completed May 15, 2021

legendecas mentioned this issue Jul 7, 2021

perf_hooks: add ability to fetch performance entries synchronously nodejs/node#39297

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf_hooks.PerformanceObserver is very slow #464

perf_hooks.PerformanceObserver is very slow #464

rbuckton commented Jan 27, 2021 •

edited

Loading

rbuckton commented Jan 27, 2021 •

edited

Loading

jasnell commented Jan 27, 2021

rbuckton commented Jan 28, 2021

jasnell commented Jan 28, 2021

jasnell commented Jan 30, 2021

github-actions bot commented May 1, 2021

rbuckton commented Jun 16, 2021

jasnell commented Jun 16, 2021

perf_hooks.PerformanceObserver is very slow #464

perf_hooks.PerformanceObserver is very slow #464

Comments

rbuckton commented Jan 27, 2021 • edited Loading

rbuckton commented Jan 27, 2021 • edited Loading

jasnell commented Jan 27, 2021

rbuckton commented Jan 28, 2021

jasnell commented Jan 28, 2021

jasnell commented Jan 30, 2021

github-actions bot commented May 1, 2021

rbuckton commented Jun 16, 2021

jasnell commented Jun 16, 2021

rbuckton commented Jan 27, 2021 •

edited

Loading

rbuckton commented Jan 27, 2021 •

edited

Loading