Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

defaultMetrics should be collected synchronously #180

Closed
brian-brazil opened this issue Mar 26, 2018 · 7 comments · Fixed by #329
Closed

defaultMetrics should be collected synchronously #180

brian-brazil opened this issue Mar 26, 2018 · 7 comments · Fixed by #329

Comments

@brian-brazil
Copy link

As noted in https://prometheus.io/docs/instrumenting/writing_exporters/#scheduling collection should be synchronous with a scrape, and not get updated in a background thread. The defaultMetrics should be called at a scrape, not with a setInterval.

@SimenB
Copy link
Collaborator

SimenB commented Mar 26, 2018

I didn't know about that, thanks for pointing it out!

Changing should be easy enough, we just have to either make .metrics() return a promise or add a separate method to update the registry whilst keeping metrics() synchronous. The update itself still has to be async since we read from /proc (sync IO is a big no-no in Node because of the single threaded model).

I have no preference to either solution, it'll be pretty breaking no matter what (as we expose methods to tweak the interval, which will have to be removed)

@mirague
Copy link

mirague commented Feb 5, 2019

Could the synchronous approach not be an addition (backwards compatibility), with being able to disable the background polling?

@siimon
Copy link
Owner

siimon commented Feb 6, 2019

Why would it be a breaking change? Sure the response will be different for each time you make a request but I find it hard to classify that as a breaking change.

If it is a breaking change I still think it should just be implemented and not get unnecessary technical debt.

@sam-github
Copy link
Contributor

It looks pretty breaking to me. Typical use looks like:

  var prom = require('prom-client')
  var register = prom.register;
  prom.collectDefaultMetrics({timestamps: false});
  app.get('/metrics', (req, res) => {
    res.set('Content-Type', register.contentType);
    res.end(register.metrics());
  });

If .metrics() kicks off an async read of /proc/whatever, then what is it going to return? The state as-of the last time .metrics() was called? I guess that's possible, but not fabulous.

Some options I can think of, all of which have downsides:

  1. Always return current known data from .metrics(), but kick off a recollect -- so data is always a little bit stale. Then again, with current interval based approach, metrics are always a bit stale anyhow ATM, depending on how the prometheus scrape interval ends up interacting with the internal scrape interval.
  2. Add a new async API .collect((metrics) => res.end(metrics)) so that collection is async -- I like this API, but it does leave all current users with stale metrics, and it might also cause cruft in that there will be extra code to support both async and sync APIs
  3. Other options?

(.collect could be called .metrics and be sync or async depending on whether it receives a cb)

I'm not sure how accepting of a semver-major you all are. Making .metrics() async and cleaning out the internals is best for maintainers, but it requires all users to update their code.

@zbjornson
Copy link
Collaborator

zbjornson commented Jan 29, 2020

I'm in favor of #2 also. Rather only provide one API and have it be the Right Way. 🚀

I think this "pull" interface cascades all the way down. I haven't looked exactly, but I was envisioning that all of the metric types have some new method like collect(), which is called on every scrape.

@sam-github
Copy link
Contributor

So, did a bit of poking around

The update itself still has to be async since we read from /proc (sync IO is a big no-no in Node because of the single threaded model).

That seems pretty reasonable on the face of it.... but prom-client calls https://nodejs.org/api/process.html#process_process_memoryusage on non-linux, which calls https://github.com/nodejs/node/blob/d65e6a50176e4847738a77c3579fe52c2735fd8b/deps/uv/src/unix/linux-core.c#L506-L525 ... which is sync.

Maybe a one-shot sync call to read /proc isn't such a big deal? Its basically a memory read, the data is coming from kernel memory, its not going to trigger pulling data in from a spinning disk, or craziness like that.

@SimenB
Copy link
Collaborator

SimenB commented Jan 29, 2020

@sam-github you have way more knowledge about node core than I, so if you think it's fine let's go for it!

SimenB pushed a commit that referenced this issue Feb 12, 2020
* chore: remove support for node < 10.x

* chore: simplify mem usage try/catch

* fix: make example/server use PORT

* chore: add async notes to loop lag metric

* fix: sync collection of linux vm metrics

This allows the metrics to be collected at scrape time, rather than on
an interval timer.

* fix: remove timestamp support

* fix: sync collection of linux max fd limits

* fix: sync collection of linux fd count

* fix: always set start time

* fix: always set version labels

* fix: collect metrics on scrape, not timeout

Only the event loop "lag" is still async, see the in-src notes.

Fixes: #180
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants