
RFD 150 Operationalizing Prometheus discussion #120

Open · trentm opened this issue Oct 30, 2018 · 20 comments

@trentm (Contributor) commented Oct 30, 2018

This is for discussion of RFD 150, which covers operationalizing Prometheus and Grafana in Triton and Manta.

@bahamat (Member) commented Oct 31, 2018

Do we need authentication on prometheus since only grafana will talk to it? If yes, we could just do digest auth with a very long random string that config-agent updates in both grafana and prometheus.
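
If we go that route, a minimal sketch of the grafana side via datasource provisioning (Grafana 5+) could look like the following. Note that grafana's provisioned prometheus datasource speaks basic auth, so it might end up being basic rather than digest; the address, file path, and secret placeholder below are just illustrative of what config-agent would render:

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml (illustrative path)
apiVersion: 1
datasources:
  - name: Triton Prometheus
    type: prometheus
    access: proxy
    url: "http://<prometheus-admin-ip>:9090"
    basicAuth: true
    basicAuthUser: grafana
    secureJsonData:
      # long random string rendered by config-agent into both grafana and prometheus
      basicAuthPassword: "<PROMETHEUS_AUTH_SECRET>"
```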

Prometheus should use @arekinath's cmon-certgen. If the datacenter-local sdc key that is already in SAPI is used for signing the cmon cert, there will be no need to add an additional key to the admin/prometheus accounts, and prometheus will operate with reduced privileges (setting aside the fact that it's already on the admin network).

@askfongjojo (Contributor) commented Nov 2, 2018

> Do we need authentication on prometheus since only grafana will talk to it?

Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

I have one comment/request regarding the retention period of the prometheus data. I believe it is something configurable in the deployment. Is there any future plan to support archival/retrieval of the older data? I understand that we cannot retain all the data indefinitely, but there are many times I wish I could see data from recent months.
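
(For what it's worth, my understanding is that retention is just a startup flag on prometheus itself, so it should be exposable as a deployment-level tunable; the config path and the 90d value below are only examples:)

```sh
# Prometheus 2.x; the flag is renamed to --storage.tsdb.retention.time in later releases.
prometheus \
    --config.file=/path/to/prometheus.yml \
    --storage.tsdb.retention=90d
```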

@bahamat (Member) commented Nov 2, 2018

> Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

Isn't that already implied by having access to the admin network in the first place?

@trentm (Contributor, Author) commented Nov 8, 2018

> Do we need authentication on prometheus since only grafana will talk to it?

> Yes, we need it to be accessible by human users in a secure way. Dashboard admins often access Prometheus directly to see the raw metric data when designing dashboards.

@askfongjojo Though don't our VPN setups typically make the admin network routable so they can reach the prometheus zone that way? IOW, what @bahamat just said.

> Is there any future plan to support archival/retrieval of the older data?

Yes. @richardkiene is looking at this (MANTA-3881).

@jclulow (Contributor) commented Nov 8, 2018

> Though don't our VPN setups typically make the admin network routable so they can reach the prometheus zone that way?

That's an accident of history, and we shouldn't build any software that assumes it will be available. It won't be that way in engineering staging and lab environments, for instance, to reflect the way the product is intended to be deployed: with the admin network attached only to internal Triton interfaces.

@trentm (Contributor, Author) commented Nov 8, 2018

One option would be to

  1. do @isaacdavis's https auth proxy in the prometheus0 zone for its external NIC, and
  2. make sure that prometheus' HTTP interface isn't directly accessible over the external network (same as is done with grafana, likely)

@bahamat (Member) commented Nov 8, 2018

I think that if we're talking about our engineers accessing prometheus for the purpose of testing queries, browsing data, and building dashboards, we can access it via the VPN or some other method.

If we're talking about customer on-prem deployments or the intended deployment model for Triton, I'm not sure there's necessarily a valid need to access prometheus directly since there will be a fully configured grafana instance with dashboards.

@davepacheco (Contributor) commented:

Thanks for writing all this up! I have a couple of longer-term questions.

I'm interested in the definition (and distribution) of recording rules and dashboards. In our prod deployments, recording rules are essential for most of the dashboards. We also have the related problems of identifying key metrics and documenting them for operators. It would be incredibly useful if we could, say:

  • define recording rules for key metrics with the software that delivers them (e.g., muskie)
  • have those rules propagated to Prometheus instances that scrape those metrics
  • include documentation with the rules (e.g., what the metric means, how to interpret it)
  • define dashboards somewhere with the software (could be global to Manta). (There are some tools out there for doing this.)

With this, engineers, QA, and Ops could stamp out the exact same dashboards in development and staging environments that we use in production. We could also build pretty docs and maybe even incorporate them into the Grafana dashboards.
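
To make the first few bullets concrete, here's a very rough sketch of what a recording rule file shipped alongside a component might look like. The metric and rule names are hypothetical (not muskie's real metrics), and since recording rules have no first-class documentation field, the doc text is shown as comments that tooling could extract:

```yaml
# muskie.rules.yml -- hypothetical rule file delivered with the muskie image
groups:
  - name: muskie
    rules:
      # DOC: 95th-percentile muskie request latency over 5-minute windows.
      # DOC: Useful as a top-line latency indicator on the muskie dashboard.
      - record: service:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{service="muskie"}[5m])) by (le))
```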


There are some notes in the RFD about sharding, but have we thought yet about how it's going to work? I could imagine a couple of approaches. One might be that we say the SAPI service is the unit of sharding and we have configuration somewhere that assigns Prometheus instances to services. For small deployments, maybe we have one Prometheus that monitors everything. To replicate what we've done in some larger deployments, we might assign one for Muskie, one for CMON, and one for everything else. As things grow, we could dynamically reassign stuff, though depending on how that works, we might lose historical data when that happens.

Relatedly, it would be useful if you could set up different sets of Prometheus servers potentially monitoring the same components. To update Prometheus, you might set up a second fleet of them on the new version, having them collect from the same endpoints, and once they've accumulated enough historical data, you switch over to them and undeploy the others. I believe Ops has done (or is doing) this now. You could also do this when reassigning shards to avoid losing data.

I realize a lot of this is probably further down the road, and sorry if I'm missing some context!

@bahamat (Member) commented Nov 27, 2018

Some of the sharding and/or aggregation things will be solved once we have Thanos support.

@kellymclaughlin commented:

> Intuitively, we must generate a unique tag per Prometheus instance and assign these tags to Manta zones such that every zone has a tag.

I might have just missed this when reading, but why is it necessary to assign a tag to each manta zone? My impression was that the Prometheus instances would be scraping or pulling the metrics from a list of zones. If that's the case then it shouldn't matter if each manta zone knows who is scraping it as long as some instance does. Then managing the distribution of this work among the Prometheus instances could be done randomly as mentioned, or even just in round-robin fashion, and the reassignment shouldn't be a very intensive process. And with Thanos able to deduplicate the metrics, the only constraint when applying changed zone assignments to the Prometheus instances should be to apply the additions to all instances first and then apply any removals.

@kellymclaughlin commented:

Maybe this isn't decided yet, but I know that Thanos is intended to store metrics into manta, and I was curious whether it would be storing them as data in the monitored manta system under an operator account or whether the intent was to set up a separate manta system specifically to store the metrics. If it's the former, I was wondering about request volume and any potential impact to user-facing operations. I'm guessing it may be too early to know that, but I just wondered while reading.

@isaacdavis (Contributor) commented:

> I might have just missed this when reading, but why is it necessary to assign a tag to each manta zone? My impression was that the Prometheus instances would be scraping or pulling the metrics from a list of zones. If that's the case then it shouldn't matter if each manta zone knows who is scraping it as long as some instance does.

@kellymclaughlin Each zone needs a tag because this is the mechanism by which Prometheus will filter its list of zones to scrape for sharding - if we specify a given tag in the Prometheus config, it will only scrape zones that have that tag. (See https://smartos.org/bugview/TRITON-755 for more detail)

> Maybe this isn't decided yet, but I know that Thanos is intended to store metrics into manta, and I was curious whether it would be storing them as data in the monitored manta system under an operator account or whether the intent was to set up a separate manta system specifically to store the metrics. If it's the former, I was wondering about request volume and any potential impact to user-facing operations. I'm guessing it may be too early to know that, but I just wondered while reading.

Perhaps @richardkiene could weigh in on this?

@kellymclaughlin commented:

> Each zone needs a tag because this is the mechanism by which Prometheus will filter its list of zones to scrape for sharding - if we specify a given tag in the Prometheus config, it will only scrape zones that have that tag. (See https://smartos.org/bugview/TRITON-755 for more detail)

Ok, I see. Why not just have a list of zones provided to each Prometheus instance that it needs to scrape, rather than having to tag each zone? That way Prometheus is the only thing that must be configured, rather than having to write metadata to every zone. The CLI tool mentioned in the RFD could take a list of zones and a list of Prom instances as input and divide the zone list among the Prom instances by some means. Is the trade-off that the tags enable detection of new zones for scraping without reconfiguration, while making reconfiguration more cumbersome when new Prom instances are added?
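
For example, by "a list of zones" I was picturing each Prometheus instance just getting a plain static target list, something like the sketch below (the hostnames, cert paths, and port are illustrative, and I'm assuming scraping still goes through cmon with a client cert):

```yaml
scrape_configs:
  - job_name: manta-shard-1
    scheme: https
    tls_config:
      cert_file: /path/to/prometheus.cert.pem
      key_file: /path/to/prometheus.key.pem
    static_configs:
      - targets:
          # one cmon endpoint per zone assigned to this Prometheus instance
          - '<zone-uuid-1>.cmon.us-east-1.example.com:9163'
          - '<zone-uuid-2>.cmon.us-east-1.example.com:9163'
```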

@isaacdavis (Contributor) commented:

Exactly - that's the trade-off. This is all done through triton_sd_config, which lets cmon handle zone discovery and provides the extra metrics gathered by the various cmon collectors.
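
Roughly, each Prometheus instance's scrape config would then look something like this; the account, group name, DNS names, and cert paths are placeholders, and the groups field assumes the cmon grouping support from TRITON-755:

```yaml
scrape_configs:
  - job_name: triton-shard-1
    scheme: https
    tls_config:
      cert_file: /path/to/prometheus.cert.pem
      key_file: /path/to/prometheus.key.pem
    triton_sd_configs:
      - account: admin
        dns_suffix: cmon.us-east-1.example.com
        endpoint: cmon.us-east-1.example.com
        version: 1
        tls_config:
          cert_file: /path/to/prometheus.cert.pem
          key_file: /path/to/prometheus.key.pem
        # only discover zones that are members of this (hypothetical) cmon group
        groups:
          - pmgroup-1
```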

For what it's worth, I expect that we won't be adding new Prom instances frequently. We could mitigate some of the expense by making the CLI tool reassign only the number of tags necessary rather than naively iterating over all of the zones: if we have n zones being scraped and are scaling from m to m+1 Prometheus instances, we only need to reassign n/(m+1) zones to achieve an even distribution, so we wouldn't have to write metadata to every single zone, except when we deploy Prometheus for the first time.

Do you think the expense of assigning tags like this is a dealbreaker, or something we could get away with?

@kellymclaughlin commented:

> Do you think the expense of assigning tags like this is a dealbreaker, or something we could get away with?

No, I don't think that is a dealbreaker. If you have a plan to reassign only a portion of the zone tags when a new Prometheus is added, and you have the ability (as mentioned in the RFD) to make manual adjustments where the randomization falls short of an even distribution, then it seems like a good choice given the trade-offs involved.

I might suggest updating this sentence to convey that reassigning only a portion of the zone tags is possible:

> This scheme would require a complete reassignment of tags every time a Prometheus instance is added or removed. This would be identical to assigning tags for the first time, and would thus be expensive.

Thanks for the explanation!

@isaacdavis (Contributor) commented:

Sounds good - I've updated the RFD. Thanks for the feedback!

@siepkes commented Apr 19, 2019

Hi @isaacdavis! I noticed in commit f920a8b you mentioned the use of BIND to work around having multiple CNS resolvers. As far as I can tell you only need a recursive name server. Might I suggest taking a look at unbound instead of BIND? Unbound is just a recursive nameserver (it is often used in combination with nsd when an authoritative nameserver is also needed). I know from experience that BIND gives one quite a big gun to shoot oneself in the foot... unbound is single-purpose (a recursive DNS server), lightweight (yet supports almost all DNS RFCs), very easy to configure, and secure.

@isaacdavis (Contributor) commented:

Thanks, Jasper! I will take a look.

@cburroughs (Contributor) commented:

> There will be one Prometheus image, shared between the Triton and Manta prometheus services.

There will be a shared image, but independent instances, correct?

@isaacdavis (Contributor) commented:

@cburroughs correct!
