Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tenant resolver #3486

Merged
merged 7 commits into from
Nov 26, 2020
Merged

Conversation

simonswine
Copy link
Contributor

What this PR does:

Which issue(s) this PR fixes:

This implements the multi tenant resolver as described by the proposal for multi tenant query-federation.

By default it behaves like before, but it's implementation can be swapped out.

This replaces:

  • user.ExtractOrgID by either TenantID or UserID depending on if in this context a multi tenant usage should be allowed
  • user.ExtractOrgIDFromHTTPRequest with ExtractTenantIDFromHTTPRequest to gather exactly one tenant id.

I have also added lint test to check for usage of the upstream methods directly

Checklist

  • Tests updated
  • [not necessary] Documentation added
  • [not necessary] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment on lines 11 to 16
var defaultResolver Resolver = NewSingleResolver()

func DefaultResolver() Resolver {
return defaultResolver
}

func WithDefaultResolver(r Resolver) {
defaultResolver = r
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a global variable here is not something I am particularly happy about.

I have tried to pass a tenantResolver reference around that would itself be instantiated in a Cortex module. But this approach has a massive changeset, as its touching so many areas of the codebase and I decided to go with this approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a global is definitely acceptable in this case. A package could always instantiate it's own resolver down the line if there was a reason to go down that path.

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 11 to 16
var defaultResolver Resolver = NewSingleResolver()

func DefaultResolver() Resolver {
return defaultResolver
}

func WithDefaultResolver(r Resolver) {
defaultResolver = r
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a global is definitely acceptable in this case. A package could always instantiate it's own resolver down the line if there was a reason to go down that path.

// resolver directly from a HTTP request.
func ExtractTenantIDFromHTTPRequest(resolver Resolver, req *http.Request) (string, context.Context, error) {
//lint:ignore faillint wrapper around upstream method
_, ctx, err := user.ExtractOrgIDFromHTTPRequest(req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, so we are going to continue using the weaveworks/common library as is? It just won't be aware of how the header value will be resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes being able intercept headers would require us to do something that I have trailed #3416, which came with a massive change set across the project.

@@ -36,7 +36,7 @@ type UserConfig struct {
func (am *MultitenantAlertmanager) GetUserConfig(w http.ResponseWriter, r *http.Request) {
logger := util.WithContext(r.Context(), am.logger)

userID, err := user.ExtractOrgID(r.Context())
userID, err := tenant.DefaultResolver().TenantID(r.Context())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(if-minor): Instead of retrieving the default resolver and calling these functions, what if you expose the functions of the Resolver interface as top level functions in the tenant package and just assume they will utilize the default in the package. That could make the syntax a bit cleaner.

Suggested change
userID, err := tenant.DefaultResolver().TenantID(r.Context())
userID, err := tenant.TenantID(r.Context())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done that, golint complained about the stuttering issue: tenant.TenantID[s], but I think it's important to have it there to make it clear UserID vs TenantID.

pkg/configs/api/api.go Outdated Show resolved Hide resolved
Copy link
Contributor

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love some more detail around when UserID will be used vs TenantID/TenantIDs. I don't see any info in the proposal around that (or I am blind which is totally possible). Is this meant to be something like a username? Just an extra header that will be passed?

The comment "used to identify a user in log messages" scares me a bit as we could be getting into PII which I certainly do not want to encourage people to log.

@simonswine
Copy link
Contributor Author

I would love some more detail around when UserID will be used vs TenantID/TenantIDs. I don't see any info in the proposal around that (or I am blind which is totally possible). Is this meant to be something like a username? Just an extra header that will be passed?

@csmarchbanks agreed. This is something where the current comments on the interface tenant.Resolver are not enough and we potentially need to update the proposal to define that properly.

What I think we should be trying to solve is how to move away from the overloaded meaning that is currently derived from user.ExtractOrgID. It is the tenant ID, but it's also used for identifying limits and the origin of certain requests in the logs and metrics. What I tried to do in the scope of this PR is to ensure we call the correct method depending on which of those things the intent is at the point where currently user.ExtractOrgID is called.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID() (and everything else would break compatibility expectations), it would be a good opportunity to evolve the tenant.Resolver by the experiences we gain from adding more experimental features like this one.

As you are aware we are using Cortex as a basis of Grafana Metric Enterprise and we have some plans that involve distinguishing more details between different actors using the API within the same tenant. I see that as a first step in getting a good understanding what is involved there.

I hope that makes sense and wonder what if you think this could go forward, with some of that baked into the comments of the tenant.Resolver?

@pstibrany
Copy link
Contributor

What I think we should be trying to solve is how to move away from the overloaded meaning that is currently derived from user.ExtractOrgID. It is the tenant ID, but it's also used for identifying limits and the origin of certain requests in the logs and metrics.

I don't see this as overloaded. Introducing new userID concept is confusing, since in Cortex we don't deal with individual users in any way.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID()

In the proposal, multiple tenants were sorted. I am bit confused by this statement (esp. by "TenantIDs[0]" part, since that will always be tenant with lowest ID??)

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using userID without tenantID is misleading. Different tenants may have same set of users. For log messages, metrics or fairness, both need to be used.

For metrics especially, using user can lead to additional cardinality explosion, and is explicitly called out as bad practice.

From the code it seems like what you're calling "user ID" is basically a sorted set of tenant IDs. This mix of terminology is confusing.

This code replaces user.ExtractOrgID calls, but not user.InjectOrgID calls. Furthermore, there are components from weaveworks library, that we use, that still use original user.ExtractOrgID (instrumentation, tenant propagation for grpc calls). I think this needs to be fixed, preferably in the weaveworks common library. (Otherwise we need to reimplement these helper components on top of new "tenant" package).

@simonswine
Copy link
Contributor Author

Using userID without tenantID is misleading. Different tenants may have same set of users. For log messages, metrics or fairness, both need to be used.

Totally agreed, yes eventually we will need information in the context.Context that will tell us both the user and the tenant, so we can consider both for those decisions.

For metrics especially, using user can lead to additional cardinality explosion, and is explicitly called out as bad practice.

From the code it seems like what you're calling "user ID" is basically a sorted set of tenant IDs. This mix of terminology is confusing.

Both of those are correct and I know this is far from ideal, but that was called out as part of the proposal to be a step-gap solution, that really only affects you if you enabled multi-tenant support and ideally log both and so on

This code replaces user.ExtractOrgID calls, but not user.InjectOrgID calls.

I don't think this is a problem as we ourselves in the tenant package call ExtractOrgID and derive our values from it.

While I think using the default settings cortex will always behave TentantID() == TenantIDs[0] == UserID()

In the proposal, multiple tenants were sorted. I am bit confused by this statement (esp. by "TenantIDs[0]" part, since that will always be tenant with lowest ID??)

Sorry I have really phrased that in aquite confusing way, what I meant to say. Using the default resolver, which is defaultResolver = NewSingleResolver(), all three methods which replace ExtractOrgID basically return the same. The string returned in UserID/TenantID is the same and it also matches with the first and only returned element in the []string in TenantIDs

Furthermore, there are components from weaveworks library, that we use, that still use original user.ExtractOrgID (instrumentation, tenant propagation for grpc calls). I think this needs to be fixed, preferably in the weaveworks common library. (Otherwise we need to reimplement these helper components on top of new "tenant" package).

I think that's how my vision for the future work would look like, either making weaveworks/common more adaptable so that we can reuse our own context extraction/injection from HTTP/GRPC or we have to move them into the project. Especially code which is quite specific to how Cortex works is probably better to live within here.

I am right now not yet in a position to predict how we would achieve that and how interfaces and more specific would look like, but by gradually carving parts out (like this PR) I think we can make this transition work.

@simonswine
Copy link
Contributor Author

I am wondering if this PR would have it easier to get consensus if I would remove the UserID() string method from the Resolver interface:

  • TenantID() (which will only ever return a single tenant), replaces all usages of ExtractOrgID.
  • TenantIDs(), would then only be used on the read path where it's necessary for multi-tenant querying.

wdyt @pstibrany?

@pstibrany pstibrany dismissed their stale review November 23, 2020 14:37

Withdrawing my request in case someone else wants to approve. (I plan to take another look but not sure when)

@pstibrany
Copy link
Contributor

I am wondering if this PR would have it easier to get consensus if I would remove the UserID() string method from the Resolver interface:

  • TenantID() (which will only ever return a single tenant), replaces all usages of ExtractOrgID.
  • TenantIDs(), would then only be used on the read path where it's necessary for multi-tenant querying.

These suggestions would make it less controversial

@simonswine simonswine force-pushed the add-tenant-resolver branch 2 times, most recently from 23c4d2b to af2045e Compare November 24, 2020 10:16
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good job, LGTM!

sort.Strings(ids)

// de-duplicate orgIDs
var result []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance wise a couple of things:

  1. The input ids is just fine if there are no duplicates (which I would expect to be the common case)
  2. result make be pre-allocated with the length of ids

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the feedback, LGTM. Currently it's a noop in Cortex, and I'm looking forward to see more support for multitenant.

This implements the multi tenant resolver as described by the [proposal]
for multi tenant query-federation.

By default it behaves like before, but it's implementation can be
swapped out.

[proposal]: cortexproject#3364

Signed-off-by: Christian Simon <simon@swine.de>
Use TenantID or UserID depending on which of the methods are meant to be
used.

Signed-off-by: Christian Simon <simon@swine.de>
This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure
that exactly one tenant ID is set.

Signed-off-by: Christian Simon <simon@swine.de>
Signed-off-by: Christian Simon <simon@swine.de>
We need a better definition for what we are trying to achieve with
UserID before we can add it to the interface

Signed-off-by: Christian Simon <simon@swine.de>
Signed-off-by: Christian Simon <simon@swine.de>
- reduce allocations by reusing the input slice during de-duplication

Signed-off-by: Christian Simon <simon@swine.de>
@pracucci pracucci merged commit b7a062f into cortexproject:master Nov 26, 2020
simonswine added a commit to simonswine/dskit that referenced this pull request Feb 21, 2022
* Add tenant resolver package

This implements the multi tenant resolver as described by the [proposal]
for multi tenant query-federation.

By default it behaves like before, but it's implementation can be
swapped out.

[proposal]: cortexproject/cortex#3364

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgID`

Use TenantID or UserID depending on which of the methods are meant to be
used.

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgIDFromHTTPRequest`

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure
that exactly one tenant ID is set.

Signed-off-by: Christian Simon <simon@swine.de>

* Add methods to `tenant` package to use resolver directly

Signed-off-by: Christian Simon <simon@swine.de>

* Remove UserID method from Resolver interface

We need a better definition for what we are trying to achieve with
UserID before we can add it to the interface

Signed-off-by: Christian Simon <simon@swine.de>

* Update comment on the TenantID/TenantIDs

Signed-off-by: Christian Simon <simon@swine.de>

* Improve performance of NormalizeTenantIDs

- reduce allocations by reusing the input slice during de-duplication

Signed-off-by: Christian Simon <simon@swine.de>
simonswine added a commit to grafana/dskit that referenced this pull request Mar 22, 2022
* Add tenant resolver package

This implements the multi tenant resolver as described by the [proposal]
for multi tenant query-federation.

By default it behaves like before, but it's implementation can be
swapped out.

[proposal]: cortexproject/cortex#3364

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgID`

Use TenantID or UserID depending on which of the methods are meant to be
used.

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgIDFromHTTPRequest`

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure
that exactly one tenant ID is set.

Signed-off-by: Christian Simon <simon@swine.de>

* Add methods to `tenant` package to use resolver directly

Signed-off-by: Christian Simon <simon@swine.de>

* Remove UserID method from Resolver interface

We need a better definition for what we are trying to achieve with
UserID before we can add it to the interface

Signed-off-by: Christian Simon <simon@swine.de>

* Update comment on the TenantID/TenantIDs

Signed-off-by: Christian Simon <simon@swine.de>

* Improve performance of NormalizeTenantIDs

- reduce allocations by reusing the input slice during de-duplication

Signed-off-by: Christian Simon <simon@swine.de>
simonswine added a commit to grafana/dskit that referenced this pull request Mar 29, 2022
* Add tenant resolver (cortexproject/cortex#3486)

* Add tenant resolver package

This implements the multi tenant resolver as described by the [proposal]
for multi tenant query-federation.

By default it behaves like before, but it's implementation can be
swapped out.

[proposal]: cortexproject/cortex#3364

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgID`

Use TenantID or UserID depending on which of the methods are meant to be
used.

Signed-off-by: Christian Simon <simon@swine.de>

* Replace usages of `ExtractOrgIDFromHTTPRequest`

This is replaced by ExtractTenantIDFromHTTPRequest, which makes sure
that exactly one tenant ID is set.

Signed-off-by: Christian Simon <simon@swine.de>

* Add methods to `tenant` package to use resolver directly

Signed-off-by: Christian Simon <simon@swine.de>

* Remove UserID method from Resolver interface

We need a better definition for what we are trying to achieve with
UserID before we can add it to the interface

Signed-off-by: Christian Simon <simon@swine.de>

* Update comment on the TenantID/TenantIDs

Signed-off-by: Christian Simon <simon@swine.de>

* Improve performance of NormalizeTenantIDs

- reduce allocations by reusing the input slice during de-duplication

Signed-off-by: Christian Simon <simon@swine.de>

* Add multi tenant query federation (cortexproject/cortex#3250)

* Add tenant query federation

This experimental feature allows queries to cover data from more than a
single Cortex tenant and can be activated by supplying
`-tenant-federation.enabled` to all cortex instances.

To query multiple tenants a `|` separated tenant list can be specified
in the `X-Scope-OrgID` header. The source tenant of a metric will be
exposed in the label `__tenant_id__`.

Signed-off-by: Christian Simon <simon@swine.de>

* Aggregate the limit of maxQueriers correctly

This ensures the limit is aggregated correctly of the setting of each
individual tenant. It also implements the logic for the v2 query
frontend.

Signed-off-by: Christian Simon <simon@swine.de>

* Fix tenant labels and make LabelNames more efficient

Signed-off-by: Christian Simon <simon@swine.de>

* Use tsdb_errors for error handling

Signed-off-by: Christian Simon <simon@swine.de>

* Fix uninitialized label matcher

Regexp matcher need to be initialized, this adds a test for regexp
matcher and fixes the underlying issue.

Signed-off-by: Christian Simon <simon@swine.de>

* Use map for filterValuesByMatchers to reduce allocations

Signed-off-by: Christian Simon <simon@swine.de>

* Address review suggestions

Signed-off-by: Christian Simon <simon@swine.de>

* Add validation.SmallestPositiveNonZeroIntPerTenant to avoid code duplication

Signed-off-by: Christian Simon <simon@swine.de>

* Add limitations and experimental status to docs

Signed-off-by: Christian Simon <simon@swine.de>

* Remove unnecessary cast of defaultTenantLabel

Signed-off-by: Christian Simon <simon@swine.de>

* Handle query-range limits for multi tenant queries

Signed-off-by: Christian Simon <simon@swine.de>

* Skip results cache, if multi tenants query

Signed-off-by: Christian Simon <simon@swine.de>

* Add failint to ensure query path supports multiple tenants

To avoid any future regressions in the multi tenant support within the
query path, a faillint command tests if TenantID is used and if it
finds one, it suggestest using TenantIDs instead>

Signed-off-by: Christian Simon <simon@swine.de>

* Align CHANGELOG line with the flag description

Signed-off-by: Christian Simon <simon@swine.de>

* Add a limitation about missing results cache

Signed-off-by: Christian Simon <simon@swine.de>

* Restrict path segments in TenantIDs (cortexproject/cortex#4375) (cortexproject/cortex#4376)

This prevents paths generated from TenantIDs to become vulnerable to
path traversal attacks. CVE-2021-36157

Signed-off-by: Christian Simon <simon@swine.de>

* Update nolint statement

Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants