Add support for Prometheus #150

Owner

@danielealbano danielealbano commented Jul 7, 2022

This PR implements a new module in cachegrand to support Prometheus, a well-known and widely used monitoring system and time series database.

The new module relies on http_parser, from the Node.js project, to process HTTP requests. It serves the metrics under the /metrics endpoint and returns a generic 404 Not Found error page for all other URLs.
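The routing described above can be sketched as follows. This is only an illustration, not the module's code: the function name `metrics_http_status_for_url` is hypothetical, and ignoring a trailing query string is an assumption the PR does not confirm. The function takes a pointer and a length because http_parser hands URL data to its callbacks as a buffer that is not null-terminated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the cachegrand sources): returns the HTTP
 * status the module would answer with for a given request URL.
 * Only the exact path "/metrics" is served; everything else gets a 404.
 * Assumption: anything after '?' is treated as a query string and ignored. */
int metrics_http_status_for_url(const char *url, size_t url_len) {
    static const char metrics_path[] = "/metrics";
    const size_t metrics_path_len = sizeof(metrics_path) - 1;

    assert(url != NULL);

    /* Cut the URL at the first '?' so only the path is compared */
    const char *query = memchr(url, '?', url_len);
    size_t path_len = query != NULL ? (size_t)(query - url) : url_len;

    if (path_len == metrics_path_len &&
        memcmp(url, metrics_path, metrics_path_len) == 0) {
        return 200;
    }

    return 404;
}
```

In a real integration a function like this would be called with the data collected by the parser's URL callback once the request is complete.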

The error page is statically built in and can't be modified. For now it doesn't make much sense to expose this kind of configuration, as this module doesn't provide a fully fledged HTTP web server (e.g. one able to serve static content from disk).

The module exposes all the internal statistics provided by cachegrand. For the occasion, these have been expanded to include metrics for the amount of data sent and received, in bytes as well as in packets, and to provide an uptime.

Two groups of metrics are offered: total counters and per-minute counters. Scraping the data once a minute is therefore enough to avoid losing any data.

The module supports only basic counters; it doesn't support any other metric type (e.g. percentiles).

To give as much flexibility as possible, the module searches for environment variables prefixed with CACHEGRAND_METRIC_ENV_ and uses them as labels, so an environment variable set as CACHEGRAND_METRIC_ENV_ORGANIZATION=example.org becomes a label called organization with the value example.org.
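A minimal sketch of that mapping, assuming the label name is simply the part of the variable name after the prefix, lowercased (as the ORGANIZATION example suggests). The helper name `metric_label_from_env_entry` and its signature are hypothetical and not taken from the cachegrand sources.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define METRIC_ENV_PREFIX "CACHEGRAND_METRIC_ENV_"

/* Hypothetical helper: parses one "NAME=value" environment entry.
 * If NAME starts with CACHEGRAND_METRIC_ENV_, writes the lowercased remainder
 * of the name into `name`, the value into `value`, and returns true.
 * Returns false for unrelated entries, malformed entries, or when the
 * output buffers are too small. */
bool metric_label_from_env_entry(
        const char *entry,
        char *name, size_t name_size,
        char *value, size_t value_size) {
    const size_t prefix_len = strlen(METRIC_ENV_PREFIX);

    assert(entry != NULL && name != NULL && value != NULL);

    if (strncmp(entry, METRIC_ENV_PREFIX, prefix_len) != 0) {
        return false;
    }

    const char *key = entry + prefix_len;
    const char *equal_sign = strchr(key, '=');
    if (equal_sign == NULL || equal_sign == key) {
        return false; /* no '=' or empty label name */
    }

    size_t key_len = (size_t)(equal_sign - key);
    size_t value_len = strlen(equal_sign + 1);
    if (key_len >= name_size || value_len >= value_size) {
        return false; /* would not fit, including the terminator */
    }

    for (size_t i = 0; i < key_len; i++) {
        name[i] = (char)tolower((unsigned char)key[i]);
    }
    name[key_len] = '\0';

    memcpy(value, equal_sign + 1, value_len + 1); /* copies the terminator too */
    return true;
}
```

A caller could run this over every entry of the process environment and keep the pairs for which it returns true.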

The example configuration has been updated to include a commented-out example of the parameters necessary to enable the module, the Prometheus documentation has been updated to provide as much information as possible, and some basic tests are included in the PR as well.

Below is an example of the metrics exposed, with two labels provided via environment variables:

cachegrand_network_total_received_packets{tenant="test_tenant",org="test"} 0
cachegrand_network_total_received_data{tenant="test_tenant",org="test"} 0
cachegrand_network_total_sent_packets{tenant="test_tenant",org="test"} 0
cachegrand_network_total_sent_data{tenant="test_tenant",org="test"} 0
cachegrand_network_total_accepted_connections{tenant="test_tenant",org="test"} 0
cachegrand_network_total_active_connections{tenant="test_tenant",org="test"} 0
cachegrand_storage_total_written_data{tenant="test_tenant",org="test"} 0
cachegrand_storage_total_write_iops{tenant="test_tenant",org="test"} 0
cachegrand_storage_total_read_data{tenant="test_tenant",org="test"} 0
cachegrand_storage_total_read_iops{tenant="test_tenant",org="test"} 0
cachegrand_storage_total_open_files{tenant="test_tenant",org="test"} 0
cachegrand_network_per_minute_received_packets{tenant="test_tenant",org="test"} 0
cachegrand_network_per_minute_received_data{tenant="test_tenant",org="test"} 0
cachegrand_network_per_minute_sent_packets{tenant="test_tenant",org="test"} 0
cachegrand_network_per_minute_sent_data{tenant="test_tenant",org="test"} 0
cachegrand_network_per_minute_accepted_connections{tenant="test_tenant",org="test"} 0
cachegrand_storage_per_minute_written_data{tenant="test_tenant",org="test"} 0
cachegrand_storage_per_minute_write_iops{tenant="test_tenant",org="test"} 0
cachegrand_storage_per_minute_read_data{tenant="test_tenant",org="test"} 0
cachegrand_storage_per_minute_read_iops{tenant="test_tenant",org="test"} 0
cachegrand_uptime{tenant="test_tenant",org="test"} 3
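
Lines in the format shown above could be produced by a small formatting helper. The following is a sketch under stated assumptions, not the module's implementation: the name `metric_format_line` is hypothetical, the labels are assumed to be already rendered as a `key="value",...` string, and values are assumed to be unsigned 64-bit counters.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one sample line of the Prometheus text format,
 * e.g. `cachegrand_uptime{tenant="test_tenant",org="test"} 3\n`.
 * `labels` is an already rendered label list and may be NULL or empty, in
 * which case the braces are omitted.
 * Returns the number of characters written (excluding the terminator), or -1
 * if the line does not fit in the buffer. */
int metric_format_line(
        char *buffer, size_t buffer_size,
        const char *name, const char *labels, uint64_t value) {
    int written;

    assert(buffer != NULL && name != NULL);

    if (labels != NULL && strlen(labels) > 0) {
        /* PRIu64 keeps the format specifier matched to uint64_t on every platform */
        written = snprintf(buffer, buffer_size, "%s{%s} %" PRIu64 "\n", name, labels, value);
    } else {
        written = snprintf(buffer, buffer_size, "%s %" PRIu64 "\n", name, value);
    }

    if (written < 0 || (size_t)written >= buffer_size) {
        return -1; /* encoding error or truncated output */
    }

    return written;
}
```

Checking the return value of snprintf against the buffer size is what lets the caller detect a truncated line instead of sending a partial metric.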

Closes #149

…o on the received and sent data and to use the wall clock time for the last update timestamp
…-to-support-the-basic-scraping-of-the-metrics-via-the-text-protocol
@danielealbano danielealbano self-assigned this Jul 7, 2022
@danielealbano danielealbano added the enhancement New feature or request label Jul 7, 2022
@danielealbano danielealbano added this to the v0.2 milestone Jul 7, 2022
@codecov

codecov bot commented Jul 7, 2022

Codecov Report

Merging #150 (689b70a) into main (9bea127) will increase coverage by 0.54%.
The diff coverage is 84.13%.

@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
+ Coverage   80.18%   80.72%   +0.54%     
==========================================
  Files          87       88       +1     
  Lines        5193     5515     +322     
==========================================
+ Hits         4164     4452     +288     
- Misses       1029     1063      +34     
Impacted Files Coverage Δ
src/config_cyaml_schema.c 100.00% <ø> (ø)
src/slab_allocator.c 95.01% <33.33%> (+2.32%) ⬆️
src/program.c 29.24% <34.78%> (+1.60%) ⬆️
src/clock.c 83.33% <66.66%> (-5.56%) ⬇️
.../protocol/prometheus/network_protocol_prometheus.c 84.42% <84.42%> (ø)
src/network/network.c 69.44% <100.00%> (+1.79%) ⬆️
...rc/network/protocol/redis/network_protocol_redis.c 80.69% <100.00%> (ø)
src/storage/storage.c 93.93% <100.00%> (ø)
src/worker/network/worker_network_op.c 86.13% <100.00%> (+1.19%) ⬆️
src/worker/worker.c 81.36% <100.00%> (+0.23%) ⬆️
... and 3 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9bea127...689b70a. Read the comment docs.

@justinholmes justinholmes merged commit 62ac35a into main Jul 7, 2022
@justinholmes justinholmes deleted the 149-add-an-module-for-prometheus-to-support-the-basic-scraping-of-the-metrics-via-the-text-protocol branch July 7, 2022 20:48
@lgtm-com
Contributor

lgtm-com bot commented Jul 7, 2022

This pull request introduces 1 alert when merging 689b70a into dc16335 - view on LGTM.com

new alerts:

  • 1 for Wrong type of arguments to formatting function

Successfully merging this pull request may close these issues.

Add an module for Prometheus to support the basic scraping of the metrics via the text protocol
2 participants