fix: collect metrics on scrape, not timeout (#329)
* chore: remove support for node < 10.x

* chore: simplify mem usage try/catch

* fix: make example/server use PORT

* chore: add async notes to loop lag metric

* fix: sync collection of linux vm metrics

This allows the metrics to be collected at scrape time, rather than on
an interval timer.

* fix: remove timestamp support

* fix: sync collection of linux max fd limits

* fix: sync collection of linux fd count

* fix: always set start time

* fix: always set version labels

* fix: collect metrics on scrape, not timeout

Only the event loop "lag" is still async, see the in-src notes.

Fixes: #180
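
The behavior change described in this commit message can be sketched as follows. This is a minimal, self-contained illustration and not prom-client's actual implementation: `MiniRegistry`, `set`, and `scrape` are hypothetical names chosen for the sketch, and only the idea of registering a collector callback that runs at scrape time is taken from the diff.

```javascript
'use strict';

// Hypothetical mini registry, for illustration only; not prom-client's API.
class MiniRegistry {
  constructor() {
    this.collectors = [];
    this.values = new Map();
  }

  // Register a callback that refreshes metric values on demand.
  registerCollector(fn) {
    this.collectors.push(fn);
  }

  set(name, value) {
    this.values.set(name, value);
  }

  // Called by the /metrics handler: refresh first, then serialize.
  scrape() {
    this.collectors.forEach(collect => collect());
    return [...this.values]
      .map(([name, value]) => `${name} ${value}`)
      .join('\n');
  }
}

const registry = new MiniRegistry();
let scrapes = 0;

// The collector runs synchronously at scrape time, so no timer is needed
// and the reported value is never older than the scrape itself.
registry.registerCollector(() => {
  scrapes += 1;
  registry.set('demo_scrapes_total', scrapes);
  registry.set('demo_uptime_seconds', process.uptime());
});

const first = registry.scrape();
const second = registry.scrape();
console.log(first);
console.log(second);
```

Because values are refreshed inside `scrape()`, there is no interval to configure or clear, which is presumably why the `timeout` option and the returned `Timeout` object could be dropped.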
sam-github authored Feb 12, 2020
1 parent c63689b commit 5aca6a9
Showing 20 changed files with 154 additions and 208 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -12,6 +12,9 @@ project adheres to [Semantic Versioning](http://semver.org/).
- Dropped support for end-of-life Node.js versions 6.x and 8.x
- Dropped the previously deprecated support for positional parameters in
constructors, only the config object forms remain.
- Default metrics are collected on scrape of metrics endpoint, not on an
interval. The `timeout` option to `collectDefaultMetrics(conf)` is no longer
supported or needed, and the function no longer returns a `Timeout` object.

### Changed

24 changes: 5 additions & 19 deletions README.md
@@ -51,24 +51,13 @@ In addition, some Node-specific metrics are included, such as event loop lag,
active handles, GC and Node.js version. See what metrics there are in
[lib/metrics](lib/metrics).

`collectDefaultMetrics` takes 1 options object with following entries:
`collectDefaultMetrics` optionally accepts a config object with following entries:

- `timeout` for how often the probe should be fired. Default: 10 seconds.
- `prefix` an optional prefix for metric names.
- `registry` to which metrics should be registered.
- `prefix` an optional prefix for metric names. Default: no prefix.
- `registry` to which metrics should be registered. Default: the global default registry.
- `gcDurationBuckets` with custom buckets for GC duration histogram. Default buckets of GC duration histogram are `[0.001, 0.01, 0.1, 1, 2, 5]` (in seconds).
- `eventLoopMonitoringPrecision` with sampling rate in milliseconds. Must be greater than zero. Default: 10.

By default probes are launched every 10 seconds, but this can be modified like this:

```js
const client = require('prom-client');

const collectDefaultMetrics = client.collectDefaultMetrics;

// Probe every 5th second.
collectDefaultMetrics({ timeout: 5000 });
```

To register metrics to another registry, pass it in as `register`:

@@ -78,7 +67,6 @@ const client = require('prom-client');
const collectDefaultMetrics = client.collectDefaultMetrics;
const Registry = client.Registry;
const register = new Registry();

collectDefaultMetrics({ register });
```

@@ -96,11 +84,9 @@ To prefix metric names with your own arbitrary string, pass in a `prefix`:

```js
const client = require('prom-client');

const collectDefaultMetrics = client.collectDefaultMetrics;

// Probe every 5th second.
collectDefaultMetrics({ prefix: 'my_application_' });
const prefix = 'my_application_';
collectDefaultMetrics({ prefix });
```

To disable metric timestamps set `timestamps` to `false` (You can find the list of metrics that support this feature in `test/defaultMetricsTest.js`):
9 changes: 6 additions & 3 deletions example/server.js
@@ -79,11 +79,14 @@ server.get('/metrics/counter', (req, res) => {
res.end(register.getSingleMetricAsString('test_counter'));
});

//Enable collection of default metrics
// Enable collection of default metrics
require('../').collectDefaultMetrics({
timeout: 10000,
gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5] // These are the default buckets.
});

console.log('Server listening to 3000, metrics exposed on /metrics endpoint');
server.listen(3000);
const port = process.env.PORT || 3000;
console.log(
`Server listening to ${port}, metrics exposed on /metrics endpoint`
);
server.listen(port);
4 changes: 1 addition & 3 deletions index.d.ts
@@ -590,7 +590,6 @@ export function exponentialBuckets(
): number[];

export interface DefaultMetricsCollectorConfiguration {
timeout?: number;
timestamps?: boolean;
register?: Registry;
prefix?: string;
@@ -601,11 +600,10 @@ export interface DefaultMetricsCollectorConfiguration {
/**
* Configure default metrics
* @param config Configuration object for default metrics collector
* @return The setInterval number
*/
export function collectDefaultMetrics(
config?: DefaultMetricsCollectorConfiguration
): ReturnType<typeof setInterval>;
): void;

export interface defaultMetrics {
/**
53 changes: 27 additions & 26 deletions lib/defaultMetrics.js
@@ -33,12 +33,7 @@ const metrics = {
};
const metricsList = Object.keys(metrics);

let existingInterval = null;
// This is used to ensure the program throws on duplicate metrics during first run
// We might want to consider not supporting running the default metrics function more than once
let init = true;

module.exports = function startDefaultMetrics(config) {
module.exports = function collectDefaultMetrics(config) {
if (config !== null && config !== undefined && !isObject(config)) {
throw new Error('config must be null, undefined, or an object');
}
@@ -52,33 +47,39 @@ module.exports = function startDefaultMetrics(config) {
config
);

if (existingInterval !== null) {
clearInterval(existingInterval);
}
const registry = config.register || globalRegistry;
const last = registry
.collectors()
.find(collector => collector._source === metrics);

const initialisedMetrics = metricsList.map(metric => {
const defaultMetric = metrics[metric];
if (!init) {
defaultMetric.metricNames.map(
globalRegistry.removeSingleMetric,
globalRegistry
);
}
if (last) {
throw new Error(
'Cannot add the default metrics twice to the same registry'
);
}

return defaultMetric(config.register, config);
const scrapers = metricsList.map(key => {
const metric = metrics[key];
return metric(config.register, config);
});

function updateAllMetrics() {
initialisedMetrics.forEach(metric => metric.call());
// Ideally the library would be based around a concept of collectors and
// async callbacks, but in the short-term, trigger scraping of the
// current metric value synchronously.
// - // https://prometheus.io/docs/instrumenting/writing_clientlibs/#overall-structure
function defaultMetricCollector() {
scrapers.forEach(scraper => scraper());
}

updateAllMetrics();

existingInterval = setInterval(updateAllMetrics, config.timeout).unref();

init = false;
// defaultMetricCollector has to be dynamic, because the scrapers are in
// its closure, but we still want to identify a default collector, so
// tag it with a value known only to this module (the const metric array
// value) so we can find it later.
defaultMetricCollector._source = metrics;
registry.registerCollector(defaultMetricCollector);

return existingInterval;
// Because the tests expect an immediate collection.
defaultMetricCollector();
};

module.exports.metricsList = metricsList;
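
The duplicate-registration check in this file relies on tagging the collector function with a value that only the module knows. A minimal sketch of that pattern, assuming a hypothetical `createRegistry` stand-in and a hypothetical `addDefaultCollector` helper (neither is part of prom-client), could look like this:

```javascript
'use strict';

// Module-private marker; only code in this module can compare against it.
const SOURCE = {};

// Hypothetical stand-in for a registry that stores collector functions.
function createRegistry() {
  const collectors = [];
  return {
    collectors: () => collectors,
    registerCollector: fn => collectors.push(fn)
  };
}

// Hypothetical helper mirroring the check-then-tag-then-register flow.
function addDefaultCollector(registry, scrapers) {
  const existing = registry.collectors().find(c => c._source === SOURCE);
  if (existing) {
    throw new Error(
      'Cannot add the default metrics twice to the same registry'
    );
  }
  // A fresh closure is created on every call, so comparing function
  // identity would not detect a duplicate; the tag identifies it instead.
  function collector() {
    scrapers.forEach(scraper => scraper());
  }
  collector._source = SOURCE;
  registry.registerCollector(collector);
}

const registry = createRegistry();
let calls = 0;
addDefaultCollector(registry, [() => { calls += 1; }]);

let duplicateError = null;
try {
  addDefaultCollector(registry, []);
} catch (err) {
  duplicateError = err;
}
console.log(duplicateError.message);
```

In the diff the tag value is the module's own `metrics` object rather than a dedicated marker; the sketch uses a separate `SOURCE` object only to keep the example short.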
5 changes: 5 additions & 0 deletions lib/metrics/eventLoopLag.js
@@ -1,4 +1,5 @@
'use strict';

const Gauge = require('../gauge');

// Check if perf_hooks module is available
@@ -10,7 +11,11 @@ try {
// node version is too old
}

// Reported always, but because legacy lag_seconds is collected async, the value
// will always be stale by one scrape interval.
const NODEJS_EVENTLOOP_LAG = 'nodejs_eventloop_lag_seconds';

// Reported only when perf_hooks is available.
const NODEJS_EVENTLOOP_LAG_MIN = 'nodejs_eventloop_lag_min_seconds';
const NODEJS_EVENTLOOP_LAG_MAX = 'nodejs_eventloop_lag_max_seconds';
const NODEJS_EVENTLOOP_LAG_MEAN = 'nodejs_eventloop_lag_mean_seconds';
21 changes: 8 additions & 13 deletions lib/metrics/heapSizeAndUsed.js
@@ -17,27 +17,22 @@ module.exports = (registry, config = {}) => {

const heapSizeTotal = new Gauge({
name: namePrefix + NODEJS_HEAP_SIZE_TOTAL,
help: 'Process heap size from node.js in bytes.',
help: 'Process heap size from Node.js in bytes.',
registers
});
const heapSizeUsed = new Gauge({
name: namePrefix + NODEJS_HEAP_SIZE_USED,
help: 'Process heap size used from node.js in bytes.',
help: 'Process heap size used from Node.js in bytes.',
registers
});
const externalMemUsed = new Gauge({
name: namePrefix + NODEJS_EXTERNAL_MEMORY,
help: 'Node.js external memory size in bytes.',
registers
});
let externalMemUsed;

const usage = safeMemoryUsage();
if (usage && usage.external) {
externalMemUsed = new Gauge({
name: namePrefix + NODEJS_EXTERNAL_MEMORY,
help: 'Nodejs external memory size in bytes.',
registers
});
}

return () => {
// process.memoryUsage() can throw EMFILE errors, see #67
// process.memoryUsage() can throw on some platforms, see #67
const memUsage = safeMemoryUsage();
if (memUsage) {
if (config.timestamps) {
19 changes: 2 additions & 17 deletions lib/metrics/heapSpacesSizeAndUsed.js
@@ -1,31 +1,16 @@
'use strict';

const Gauge = require('../gauge');
let v8;

try {
v8 = require('v8');
} catch (e) {
// node version is too old
// probably we can use v8-heap-space-statistics for >=node-4.0.0 and <node-6.0.0
}
const v8 = require('v8');

const METRICS = ['total', 'used', 'available'];

const NODEJS_HEAP_SIZE = {};

METRICS.forEach(metricType => {
NODEJS_HEAP_SIZE[metricType] = `nodejs_heap_space_size_${metricType}_bytes`;
});

module.exports = (registry, config = {}) => {
if (
typeof v8 === 'undefined' ||
typeof v8.getHeapSpaceStatistics !== 'function'
) {
return () => {};
}

const registers = registry ? [registry] : undefined;
const namePrefix = config.prefix ? config.prefix : '';

@@ -34,7 +19,7 @@ module.exports = (registry, config = {}) => {
METRICS.forEach(metricType => {
gauges[metricType] = new Gauge({
name: namePrefix + NODEJS_HEAP_SIZE[metricType],
help: `Process heap space size ${metricType} from node.js in bytes.`,
help: `Process heap space size ${metricType} from Node.js in bytes.`,
labelNames: ['space'],
registers
});
8 changes: 2 additions & 6 deletions lib/metrics/helpers/processMetricsHelpers.js
@@ -19,14 +19,10 @@ function aggregateByObjectName(list) {
return data;
}

function updateMetrics(gauge, data, includeTimestamp) {
function updateMetrics(gauge, data) {
gauge.reset();
for (const key in data) {
if (includeTimestamp) {
gauge.set({ type: key }, data[key], Date.now());
} else {
gauge.set({ type: key }, data[key]);
}
gauge.set({ type: key }, data[key]);
}
}

7 changes: 2 additions & 5 deletions lib/metrics/helpers/safeMemoryUsage.js
@@ -1,14 +1,11 @@
'use strict';

function safeMemoryUsage() {
let memoryUsage;
try {
memoryUsage = process.memoryUsage();
return process.memoryUsage();
} catch (ex) {
// empty
return;
}

return memoryUsage;
}

module.exports = safeMemoryUsage;
25 changes: 15 additions & 10 deletions lib/metrics/osMemoryHeapLinux.js
@@ -51,18 +51,23 @@ module.exports = (registry, config = {}) => {
registers
});

// Sync I/O is often problematic, but /proc isn't really I/O, it's a
// virtual filesystem that maps directly to in-kernel data structures
// and never blocks.
//
// Node.js/libuv do this already for process.memoryUsage(), see:
// - https://github.com/libuv/libuv/blob/a629688008694ed8022269e66826d4d6ec688b83/src/unix/linux-core.c#L506-L523
return () => {
fs.readFile('/proc/self/status', 'utf8', (err, status) => {
if (err) {
return;
}
const now = Date.now();
const structuredOutput = structureOutput(status);
try {
const stat = fs.readFileSync('/proc/self/status', 'utf8');
const structuredOutput = structureOutput(stat);

residentMemGauge.set(structuredOutput.VmRSS, now);
virtualMemGauge.set(structuredOutput.VmSize, now);
heapSizeMemGauge.set(structuredOutput.VmData, now);
});
residentMemGauge.set(structuredOutput.VmRSS);
virtualMemGauge.set(structuredOutput.VmSize);
heapSizeMemGauge.set(structuredOutput.VmData);
} catch (er) {
return;
}
};
};
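
The synchronous read of `/proc/self/status` introduced in this hunk can be sketched in isolation. The parser below is a hypothetical stand-in: the module's own `structureOutput()` is not shown in this diff and may behave differently, and the kB-to-bytes conversion is an assumption of the sketch.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical parser; the commit's own structureOutput() is not shown in
// this diff and may differ. Keeps only "Vm*" fields, converted to bytes.
function parseStatus(text) {
  const result = {};
  for (const line of text.split('\n')) {
    const match = /^(Vm\w+):\s+(\d+)\s+kB$/.exec(line.trim());
    if (match) {
      result[match[1]] = Number(match[2]) * 1024; // kB -> bytes
    }
  }
  return result;
}

// Synchronous read at scrape time. Returns undefined when the file cannot
// be read, for example on platforms without /proc.
function readVmStats() {
  try {
    return parseStatus(fs.readFileSync('/proc/self/status', 'utf8'));
  } catch (err) {
    return undefined;
  }
}

// A made-up sample in the /proc/<pid>/status layout, so the parser can be
// exercised on any platform.
const sample =
  'Name:\tnode\nVmSize:\t  614400 kB\nVmRSS:\t   40960 kB\nVmData:\t  102400 kB\n';
console.log(parseStatus(sample));
console.log(readVmStats());
```

Swallowing the read error mirrors the `try`/`catch` in the hunk: a failed read simply leaves the gauges unchanged for that scrape.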

5 changes: 0 additions & 5 deletions lib/metrics/processCpuTotal.js
@@ -6,11 +6,6 @@ const PROCESS_CPU_SYSTEM_SECONDS = 'process_cpu_system_seconds_total';
const PROCESS_CPU_SECONDS = 'process_cpu_seconds_total';

module.exports = (registry, config = {}) => {
// Don't do anything if the function doesn't exist (introduced in node@6.1.0)
if (typeof process.cpuUsage !== 'function') {
return () => {};
}

const registers = registry ? [registry] : undefined;
const namePrefix = config.prefix ? config.prefix : '';

18 changes: 5 additions & 13 deletions lib/metrics/processHandles.js
@@ -28,19 +28,11 @@ module.exports = (registry, config = {}) => {
registers: registry ? [registry] : undefined
});

const updater = config.timestamps
? () => {
const handles = process._getActiveHandles();
updateMetrics(gauge, aggregateByObjectName(handles), true);
totalGauge.set(handles.length, Date.now());
}
: () => {
const handles = process._getActiveHandles();
updateMetrics(gauge, aggregateByObjectName(handles), false);
totalGauge.set(handles.length);
};

return updater;
return () => {
const handles = process._getActiveHandles();
updateMetrics(gauge, aggregateByObjectName(handles));
totalGauge.set(handles.length);
};
};

module.exports.metricNames = [