
RFD 150 Operationalizing Prometheus discussion #120

Open · trentm opened this issue Oct 30, 2018 · 20 comments

@trentm (Contributor) commented Oct 30, 2018

This is for discussion of RFD 150, which covers operationalizing Prometheus and Grafana in Triton and Manta.

@bahamat (Member) commented Oct 31, 2018

Do we need authentication on prometheus since only grafana will talk to it? If yes, we could just do digest auth with a very long random string that config-agent updates in both grafana and prometheus.
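
If we go that route, a minimal sketch of the grafana side via datasource provisioning (Grafana 5+) could look like the following. Note that grafana's provisioned prometheus datasource speaks basic auth, so it might end up being basic rather than digest; the address, file path, and secret placeholder below are just illustrative of what config-agent would render:

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml (illustrative path)
apiVersion: 1
datasources:
  - name: Triton Prometheus
    type: prometheus
    access: proxy
    url: "http://<prometheus-admin-ip>:9090"
    basicAuth: true
    basicAuthUser: grafana
    secureJsonData:
      # long random string rendered by config-agent into both grafana and prometheus
      basicAuthPassword: "<PROMETHEUS_AUTH_SECRET>"
```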

Prometheus should use @arekinath's cmon-certgen. If the datacenter-local sdc key that is already in SAPI is used for signing the cmon cert, there will be no need to add an additional key to the admin/prometheus accounts, and prometheus will operate with reduced privileges (setting aside the fact that it's already on the admin network).

@askfongjojo (Contributor) commented Nov 2, 2018

> Do we need authentication on prometheus since only grafana will talk to it?

Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

I have one comment/request regarding the retention period of the prometheus data. I believe it is something configurable in the deployment. Is there any future plan to support archival/retrieval of the older data? I understand that we cannot retain all the data indefinitely, but there are many times I wish I could see data from recent months.
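
(For what it's worth, my understanding is that retention is just a startup flag on prometheus itself, so it should be exposable as a deployment-level tunable; the config path and the 90d value below are only examples:)

```sh
# Prometheus 2.x; the flag is renamed to --storage.tsdb.retention.time in later releases.
prometheus \
    --config.file=/path/to/prometheus.yml \
    --storage.tsdb.retention=90d
```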

@bahamat (Member) commented Nov 2, 2018

> Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

Isn't that already implied by having access to the admin network in the first place?

@trentm (Contributor, Author) commented Nov 8, 2018

> Do we need authentication on prometheus since only grafana will talk to it?

> Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

@askfongjojo Though don't our VPN setups typically make the admin network routable so they can reach the prometheus zone that way? IOW, what @bahamat just said.

> Is there any future plan to support archival/retrieval of the older data?

Yes. @richardkiene is looking at this (MANTA-3881).

@jclulow (Contributor) commented Nov 8, 2018

> Though don't our VPN setups typically make the admin network routable so they can reach the prometheus zone that way?

That's an accident of history, and we shouldn't build any software that assumes it will be available. It won't be that way in engineering staging and lab environments, for instance, to reflect the way the product is intended to be deployed: with the admin network attached only to internal Triton interfaces.

@trentm (Contributor, Author) commented Nov 8, 2018

One option would be to

  1. do @isaacdavis's https auth proxy in the prometheus0 zone for its external NIC, and
  2. make sure that prometheus' HTTP interface isn't directly accessible over the external network (same as is done with grafana, likely)

@bahamat (Member) commented Nov 8, 2018

I think that if we're talking about our engineers accessing prometheus for the purpose of testing queries, browsing data, and building dashboards, we can access it via the VPN or some other method.

If we're talking about customer on-prem deployments or the intended deployment model for Triton, I'm not sure there's necessarily a valid need to access prometheus directly since there will be a fully configured grafana instance with dashboards.

@davepacheco (Contributor) commented:

Thanks for writing all this up! I have a couple of longer-term questions.

I'm interested in the definition (and distribution) of recording rules and dashboards. In our prod deployments, recording rules are essential for most of the dashboards. We also have the related problems of identifying key metrics and documenting them for operators. It would be incredibly useful if we could, say:

  • define recording rules for key metrics with the software that delivers them (e.g., muskie)
  • have those rules propagated to Prometheus instances that scrape those metrics
  • include documentation with the rules (e.g., what the metric means, how to interpret it)
  • define dashboards somewhere with the software (could be global to Manta). (There are some tools out there for doing this.)

With this, engineers, QA, and Ops could stamp out the exact same dashboards in development and staging environments that we use in production. We could also build pretty docs and maybe even incorporate them into the Grafana dashboards.
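
To make the first few bullets concrete, here's a very rough sketch of what a recording rule file shipped alongside a component might look like. The metric and rule names are hypothetical (not muskie's real metrics), and since recording rules have no first-class documentation field, the doc text is shown as comments that tooling could extract:

```yaml
# muskie.rules.yml -- hypothetical rule file delivered with the muskie image
groups:
  - name: muskie
    rules:
      # DOC: 95th-percentile muskie request latency over 5-minute windows.
      # DOC: Useful as a top-line latency indicator on the muskie dashboard.
      - record: service:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{service="muskie"}[5m])) by (le))
```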


There are some notes in the RFD about sharding, but have we thought yet about how it's going to work? I could imagine a couple of approaches. One might be that we say the SAPI service is the unit of sharding and we have configuration somewhere that assigns Prometheus instances to services. For small deployments, maybe we have one Prometheus that monitors everything. To replicate what we've done in some larger deployments, we might assign one for Muskie, one for CMON, and one for everything else. As things grow, we could dynamically reassign stuff, though depending on how that works, we might lose historical data when that happens.

Relatedly, it would be useful if you could set up different sets of Prometheus servers potentially monitoring the same components. To update Prometheus, you might set up a second fleet of them on the new version, having them collect from the same endpoints, and once they've accumulated enough historical data, you switch over to them and undeploy the others. I believe Ops has done (or is doing) this now. You could also do this when reassigning shards to avoid losing data.

I realize a lot of this is probably further down the road, and sorry if I'm missing some context!

@bahamat (Member) commented Nov 27, 2018

Some of the sharding and/or aggregation things will be solved once we have Thanos support.

@kellymclaughlin commented:

> Intuitively, we must generate a unique tag per Prometheus instance and assign these tags to Manta zones such that every zone has a tag.

I might have just missed this when reading, but why is it necessary to assign a tag to each manta zone? My impression was that the Prometheus instances would be scraping or pulling the metrics from a list of zones. If that's the case then it shouldn't matter if each manta zone knows who is scraping it as long as some instance does. Then managing the distribution of this work among the Prometheus instances could be done randomly as mentioned, or even just in round-robin fashion, and the reassignment shouldn't be a very intensive process. And with Thanos able to deduplicate the metrics, the only constraint when applying changed zone assignments to the Prometheus instances should be to apply the additions to all instances first and then apply any removals.

@kellymclaughlin commented:

Maybe this isn't decided yet, but I know that Thanos is intended to store metrics into manta, and I was curious whether it would be storing them as data in the monitored manta system under an operator account or whether the intent was to set up a separate manta system specifically to store the metrics. If it's the former, I was wondering about request volume and any potential impact to user-facing operations. I'm guessing it may be too early to know that, but I just wondered while reading.

@isaacdavis (Contributor) commented:

> I might have just missed this when reading, but why is it necessary to assign a tag to each manta zone? My impression was that the Prometheus instances would be scraping or pulling the metrics from a list of zones. If that's the case then it shouldn't matter if each manta zone knows who is scraping it as long as some instance does.

@kellymclaughlin Each zone needs a tag because this is the mechanism by which Prometheus will filter its list of zones to scrape for sharding - if we specify a given tag in the Prometheus config, it will only scrape zones that have that tag. (See https://smartos.org/bugview/TRITON-755 for more detail)

> Maybe this isn't decided yet, but I know that Thanos is intended to store metrics into manta, and I was curious whether it would be storing them as data in the monitored manta system under an operator account or whether the intent was to set up a separate manta system specifically to store the metrics. If it's the former, I was wondering about request volume and any potential impact to user-facing operations. I'm guessing it may be too early to know that, but I just wondered while reading.

Perhaps @richardkiene could weigh in on this?

@kellymclaughlin commented:

> Each zone needs a tag because this is the mechanism by which Prometheus will filter its list of zones to scrape for sharding - if we specify a given tag in the Prometheus config, it will only scrape zones that have that tag. (See https://smartos.org/bugview/TRITON-755 for more detail)

Ok, I see. Why not just have a list of zones provided to each Prometheus instance that it needs to scrape, rather than having to tag each zone? That way Prometheus is the only thing that must be configured, rather than having to write metadata to every zone. The CLI tool mentioned in the RFD could take a list of zones and a list of Prom instances as input and divide the zone list among the Prom instances by some means. Is the trade-off that the tags enable detection of new zones for scraping without reconfiguration, while making reconfiguration more cumbersome when new Prom instances are added?
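
For example, by "a list of zones" I was picturing each Prometheus instance just getting a plain static target list, something like the sketch below (the hostnames, cert paths, and port are illustrative, and I'm assuming scraping still goes through cmon with a client cert):

```yaml
scrape_configs:
  - job_name: manta-shard-1
    scheme: https
    tls_config:
      cert_file: /path/to/prometheus.cert.pem
      key_file: /path/to/prometheus.key.pem
    static_configs:
      - targets:
          # one cmon endpoint per zone assigned to this Prometheus instance
          - '<zone-uuid-1>.cmon.us-east-1.example.com:9163'
          - '<zone-uuid-2>.cmon.us-east-1.example.com:9163'
```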

@isaacdavis (Contributor) commented:

Exactly - that's the trade-off. This is all done through triton_sd_config, which lets cmon handle zone discovery and provides the extra metrics gathered by the various cmon collectors.
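
Roughly, each Prometheus instance's scrape config would then look something like this; the account, group name, DNS names, and cert paths are placeholders, and the groups field assumes the cmon grouping support from TRITON-755:

```yaml
scrape_configs:
  - job_name: triton-shard-1
    scheme: https
    tls_config:
      cert_file: /path/to/prometheus.cert.pem
      key_file: /path/to/prometheus.key.pem
    triton_sd_configs:
      - account: admin
        dns_suffix: cmon.us-east-1.example.com
        endpoint: cmon.us-east-1.example.com
        version: 1
        tls_config:
          cert_file: /path/to/prometheus.cert.pem
          key_file: /path/to/prometheus.key.pem
        # only discover zones that are members of this (hypothetical) cmon group
        groups:
          - pmgroup-1
```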

For what it's worth, I expect that we won't be adding new Prom instances frequently. We could mitigate some of the expense by making the CLI tool reassign only the number of tags necessary rather than naively iterating over all of the zones: if we have n zones being scraped and are scaling from m to m+1 Prometheus instances, we only need to reassign n/(m+1) zones to achieve an even distribution, so we wouldn't have to write metadata to every single zone, except when we deploy Prometheus for the first time.

Do you think the expense of assigning tags like this is a dealbreaker, or something we could get away with?

@kellymclaughlin commented:

> Do you think the expense of assigning tags like this is a dealbreaker, or something we could get away with?

No, I don't think that is a dealbreaker. If you have a plan to reassign only a portion of the zone tags when a new Prometheus is added, and you have the ability (as mentioned in the RFD) to make manual adjustments where the randomization falls short of an even distribution, then it seems like a good choice given the trade-offs involved.

I might suggest updating this sentence to convey that reassigning only a portion of the zone tags is possible:

> This scheme would require a complete reassignment of tags every time a Prometheus instance is added or removed. This would be identical to assigning tags for the first time, and would thus be expensive.

Thanks for the explanation!

@isaacdavis (Contributor) commented:

Sounds good - I've updated the RFD. Thanks for the feedback!

@siepkes commented Apr 19, 2019

Hi @isaacdavis! I noticed in commit f920a8b you mentioned the use of BIND to work around having multiple CNS resolvers. As far as I can tell you only need a recursive name server. Might I suggest taking a look at unbound instead of BIND? Unbound is just a recursive nameserver (it is often used in combination with nsd when an authoritative nameserver is also needed). I know from experience that BIND gives one quite a big gun to shoot oneself in the foot... unbound is single-purpose (a recursive DNS server), lightweight (yet supports almost all DNS RFCs), very easy to configure, and secure.

@isaacdavis (Contributor) commented:

Thanks, Jasper! I will take a look.

@cburroughs (Contributor) commented:

> There will be one Prometheus image, shared between the Triton and Manta prometheus services.

There will be a shared image, but independent instances, correct?

@isaacdavis (Contributor) commented:

@cburroughs correct!
