defaultMetrics should be collected synchronously #180

brian-brazil · 2018-03-26T08:43:06Z

As noted in https://prometheus.io/docs/instrumenting/writing_exporters/#scheduling collection should be synchronous with a scrape, and not get updated in a background thread. The defaultMetrics should be called at a scrape, not with a setInterval.

SimenB · 2018-03-26T08:56:00Z

I didn't know about that, thanks for pointing it out!

Changing should be easy enough, we just have to either make .metrics() return a promise or add a separate method to update the registry whilst keeping metrics() synchronous. The update itself still has to be async since we read from /proc (sync IO is a big no-no in Node because of the single threaded model).

I have no preference to either solution, it'll be pretty breaking no matter what (as we expose methods to tweak the interval, which will have to be removed)

mirague · 2019-02-05T10:37:28Z

Could the synchronous approach not be an addition (backwards compatibility), with being able to disable the background polling?

siimon · 2019-02-06T22:05:35Z

Why would it be a breaking change? Sure the response will be different for each time you make a request but I find it hard to classify that as a breaking change.

If it is a breaking change I still think it should just be implemented and not get unnecessary technical debt.

sam-github · 2020-01-29T18:50:20Z

It looks pretty breaking to me. Typical use looks like:

  var prom = require('prom-client')
  var register = prom.register;
  prom.collectDefaultMetrics({timestamps: false});
  app.get('/metrics', (req, res) => {
    res.set('Content-Type', register.contentType);
    res.end(register.metrics());
  });

If .metrics() kicks off an async read of /proc/whatever, then what is it going to return? The state as-of the last time .metrics() was called? I guess that's possible, but not fabulous.

Some options I can think of, all of which have downsides:

Always return current known data from .metrics(), but kick off a recollect -- so data is always a little bit stale. Then again, with current interval based approach, metrics are always a bit stale anyhow ATM, depending on how the prometheus scrape interval ends up interacting with the internal scrape interval.
Add a new async API .collect((metrics) => res.end(metrics)) so that collection is async -- I like this API, but it does leave all current users with stale metrics, and it might also cause cruft in that there will be extra code to support both async and sync APIs
Other options?

(.collect could be called .metrics and be sync or async depending on whether it receives a cb)

I'm not sure how accepting of a semver-major you all are. Making .metrics() async and cleaning out the internals is best for maintainers, but it requires all users to update their code.

zbjornson · 2020-01-29T18:53:12Z

I'm in favor of #2 also. Rather only provide one API and have it be the Right Way. 🚀

I think this "pull" interface cascades all the way down. I haven't looked exactly, but I was envisioning that all of the metric types have some new method like collect(), which is called on every scrape.

sam-github · 2020-01-29T20:17:59Z

So, did a bit of poking around

The update itself still has to be async since we read from /proc (sync IO is a big no-no in Node because of the single threaded model).

That seems pretty reasonable on the face of it.... but prom-client calls https://nodejs.org/api/process.html#process_process_memoryusage on non-linux, which calls https://github.com/nodejs/node/blob/d65e6a50176e4847738a77c3579fe52c2735fd8b/deps/uv/src/unix/linux-core.c#L506-L525 ... which is sync.

Maybe a one-shot sync call to read /proc isn't such a big deal? Its basically a memory read, the data is coming from kernel memory, its not going to trigger pulling data in from a spinning disk, or craziness like that.

SimenB · 2020-01-29T20:22:48Z

@sam-github you have way more knowledge about node core than I, so if you think it's fine let's go for it!

* chore: remove support for node < 10.x * chore: simplify mem usage try/catch * fix: make example/server use PORT * chore: add async notes to loop lag metric * fix: sync collection of linux vm metrics This allows the metrics to be collected at scrape time, rather than on an interval timer. * fix: remove timestamp support * fix: sync collection of linux max fd limits * fix: sync collection of linux fd count * fix: always set start time * fix: always set version labels * fix: collect metrics on scrape, not timeout Only the event loop "lag" is still async, see the in-src notes. Fixes: #180

SimenB added bug help wanted labels Mar 26, 2018

SimenB mentioned this issue Mar 28, 2018

ensure that only registered cluster workers are asked to report metrics #182

Open

ssube mentioned this issue Jan 4, 2019

nodejs_version_info empty after registry reset #238

Closed

zbjornson mentioned this issue Dec 16, 2019

Why are default metrics collected on an interval? #307

Closed

sam-github mentioned this issue Feb 1, 2020

fix: collect metrics on scrape, not timeout #329

Merged

SimenB closed this as completed in #329 Feb 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

defaultMetrics should be collected synchronously #180

defaultMetrics should be collected synchronously #180

brian-brazil commented Mar 26, 2018

SimenB commented Mar 26, 2018 •

edited

Loading

mirague commented Feb 5, 2019

siimon commented Feb 6, 2019

sam-github commented Jan 29, 2020

zbjornson commented Jan 29, 2020 •

edited

Loading

sam-github commented Jan 29, 2020

SimenB commented Jan 29, 2020

defaultMetrics should be collected synchronously #180

defaultMetrics should be collected synchronously #180

Comments

brian-brazil commented Mar 26, 2018

SimenB commented Mar 26, 2018 • edited Loading

mirague commented Feb 5, 2019

siimon commented Feb 6, 2019

sam-github commented Jan 29, 2020

zbjornson commented Jan 29, 2020 • edited Loading

sam-github commented Jan 29, 2020

SimenB commented Jan 29, 2020

SimenB commented Mar 26, 2018 •

edited

Loading

zbjornson commented Jan 29, 2020 •

edited

Loading