Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST: honor OAuth config sent by the server #10256

Closed
wants to merge 1 commit into from

Conversation

adutra
Copy link
Contributor

@adutra adutra commented Apr 30, 2024

As mandated by the spec, in case the config endpoint overrides any auth-specific property, the overridden values should be taken into account when creating the catalog auth session instance.

As mandated by the spec, in case the config endpoint
overrides any auth-specific property, the overridden
values should be taken into account when creating the
catalog auth session instance.
Comment on lines +219 to +222
credential = mergedProps.get(OAuth2Properties.CREDENTIAL);
scope = mergedProps.getOrDefault(OAuth2Properties.SCOPE, OAuth2Properties.CATALOG_SCOPE);
oauth2ServerUri =
mergedProps.getOrDefault(OAuth2Properties.OAUTH2_SERVER_URI, ResourcePaths.tokens());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good idea to get certain configs from server. Thanks for doing this. Should we add these into config endpoint spec if we are doing this? Without something in the spec, we are relying on the convention. There is a minor concern that the config response from the server side may happen to have the same name with different meaning. For example, a property named scope in the config response may mean different things. cc @danielcweeks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @flyrain thank you so much for reviewing!

Regarding the risk of config name clashes, most properties have a prefix such as s3., adls., http-client., ecs., etc. This prefix makes name clashes rather unlikely, while also clearly indicating the "scope" the property belongs to.

Unfortunately, none of the properties in OAuth2Properties have such a prefix, which makes them rather prone to misinterpretation indeed.

This is imho a bit out of the scope if this PR, but I agree with you that we could probably do something about it, e.g. introduce a prefix auth. that the client would then automatically "map" to properties in OAuth2Properties. But we'd still need to support "unprefixed" properties for a long time I'm afraid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a blocker for me. The current config response is a flat key-value pairs. I think it works well since they are only for client side catalog configs. We can bring more structures into to it in the future, as you suggested, e.g., a prefix auth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think allowing override of the the auth server uri is a good idea. The default should either be the spec'ed auth server path or something that is controlled by the client. This would allow the client to be redirected to a server that they would not expect or be in control of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default should (...) be the spec'ed auth server path

I strongly disagree with this, the /v1/oauth/tokens should, as I mentioned above, have really never become part of the Iceberg REST spec:

  • Each client passes the just base64 encoded credentials to that endpoint - that proxies requests to something you don't know - and gets some token back. This means, the resource knows your credentials.
  • It is the opposite of what OAuth addresses: OAuth addresses these issues by introducing an authorization layer and separating the role of the client from that of the resource owner. - see RFC 6749.

@@ -215,6 +215,12 @@ public void initialize(String name, Map<String, String> unresolved) {
this.paths = ResourcePaths.forCatalogProperties(mergedProps);

String token = mergedProps.get(OAuth2Properties.TOKEN);
// re-resolve these variables in case they were overridden by the config endpoint
credential = mergedProps.get(OAuth2Properties.CREDENTIAL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird to me that the client gets the credential from server's config endpoint, which doesn't need authentication. Does that mean any client can visit the REST catalog? I think we still need a client to provide its own credential.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that the client can already get a token from the server's config endpoint, so why not a credential? :-)

Taking a step back, I'd like this PR to focus on what the client is supposed to do with configuration received from the server, not what the server is supposed to send. For that, there is an ongoing discussion here where @jbonofre and others suggest many changes, e.g. the introduction of a "handshake" or "pre-flight" endpoint to facilitate client auto-configuration, including auth properties. But server-side changes will take a lot more time.

And back to the client side of things: I think it makes sense for the client to apply all configuration received from the server. It's then up to the server to decide which options make sense to send, and which don't.

One important property I'd like to see properly handled is oauth2-server-uri. I'd also like to see the map optionalOAuthParams correctly handled, since it may contain more important config properties in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think line 217 is to get the token from client config other than from the server config response. mergedProps mixes properties from both client config and server config response.

Also I believe a client should only get tokens either from users' config or from token server URI response. Config endpoint shouldn't be used for passing tokens. Otherwise, every client may use the same token/credentials, which doesn't make sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would second @flyrain's original comment that the client should not be exchanging credentials via this path. The client credentials flow is for auth "previously arranged with the authorization server". The token should be used if returned by the server, but credentials should client controlled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /v1/config endpoint - titled as "List all catalog configuration settings" - seems to require authentication. Catalog configuration settings can already contain sensitive information, or information that you do not want everybody to know (warehouse location for example) - so it makes a lot of sense to require authentication for that endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the config endpoint seems to already return sensitive information - in line 217 above (String token = mergedProps.get(OAuth2Properties.TOKEN);)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to require authentication.

There is no requirement that use of the REST protocol requires auth. For people operating their own internal REST implementations, they can choose to support auth or not.

the config endpoint seems to already return sensitive information

The config endpoint may return sensitive information or not, but I won't recommend returning sensitive information if the communication channel is not secure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource (here: Iceberg REST endpoint) is (but should have really never been) the de-facto authentication server, because of the existence of the existence of the /v1/oauth/tokens endpoint. So the argument that the config-resource should not return credentials feels odd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's actually correct to send back a credential to the client via the config endpoint. The client credential flow follows the OAuth2 Spec as defined here (there's also a shorter example for this here) and so sending back a client-controlled credential to the client doesn't feel right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yet, that is exactly what is being done here, but from the tables endpoint.

Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I don't think credentials or oauthServerUri should be allowed to be overridden by the server. These should be client controlled configurations, which only leaves scope. The current reference implementation only includes the catalog scope, but I'm open to that being configurable (though I'm not convinced it's really necessary as a server side control).

@jbonofre
Copy link
Member

Maybe the confusion is due to the config endpoint is mixing different semantic properties, also related to the scope.
For clarity, it would be better to split non authenticated config (handshake) from authenticated one (config), that's part of the "new" REST Catalog proposal.

I understand @snazy points, but related to the current implementation (as @danielcweeks said), it's confusing.

If I may, I would propose to:

  1. the current impl should be improved, I think @adutra change is OK if we add scope configurable to deal with server side push
  2. this change (dual config endpoints, auth and non auth) will be part of the "new" REST Catalog proposal

Thoughts ?

@snazy
Copy link
Member

snazy commented May 14, 2024

If I may, I would propose to:

1. the current impl should be improved, I think @adutra change is OK if we add scope configurable to deal with server side push

2. this change (dual config endpoints, auth and non auth) will be part of the "new" REST Catalog proposal

+1

@adutra
Copy link
Contributor Author

adutra commented May 14, 2024

Not a fan of leaving oauthServerUri out of the scope of this PR, for the reasons @snazy explained.

Also what about optionalOAuthParams? For now it contains only resource and audience, but we plan to add more config properties in the future, and these would naturally fit inside optionalOAuthParams.

If optionalOAuthParams is not configurable by the server, this means that virtually nothing related to auth is configurable through the server, which imho would make configuring client auth quite a complex task as more options could be added in the future.

@rdblue
Copy link
Contributor

rdblue commented May 14, 2024

I'm surprised by this PR because I don't think that the auth properties should be overridden by a REST service. I'm not sure about it, but it sounds like @snazy seems to agree when he says that the OAuth2 endpoint shouldn't be part of the REST spec, when @flyrain points out that you should authenticate before hitting the config endpoint, and @danielcweeks points out that credentials are specifically called out in OAuth2 as client controlled.

@adutra, what are you trying to accomplish here? Is this needed for some use case that you could, perhaps, explain?

Right now, I see the assertion that the REST spec requires this, which I don't agree with (but maybe?) and also @adutra saying things like "I think it makes sense for the client to apply all configuration received" -- is that the basis for this?

I think I need to understand the motivation. Otherwise, I think the argument that auth (including the auth endpoint) should be client-side configuration only. This could introduce a way to get the client's credential -- that is, secret -- and that's something we should be very careful about.

Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should send back the credential to the client via the config endpoint by default (as already discussed, we should have a dedicated endpoint like handshake).
Correct me if I'm wrong, but it's already possible to get client credentials via v1/oauth/token endpoint (and token).
So basically client should provide only credentials (from client to server), and server returns token. Then client should only use token.

I understand the concern about the server returning the credentials, but it should not be the default.

IMHO, the main issue is that REST Catalog itself should not do authentication by itself, it should support auth token, but the authentication should be dedicated to another layer (okta, specific endpoint, ...).
We have to be very careful in exchange credentials token introduced in #4771 as it introduced a way to get the client's credentials.

So, I would propose:

  1. Can we evaluate the effort to "decoupling" authentication from token use ? Like introducing a configuration with location of the authentication service (it could be the oauth/token endpoint or third party like okta)
  2. Can we remove getting client's credentials from token from oauth/token endpoint ? It can introduce bad usage/oauth flow imho.
  3. If 1 and 2 are do-able, we don't need this PR as it can be tackled. If not do-able (please provide reasons), then we can introduce a scope configuration in this PR to returns credentials (even if I'm not a big fan 😄 ).

Thoughts ?

@adutra
Copy link
Contributor Author

adutra commented May 15, 2024

@rdblue I have no other intention than the one of bringing the spec in line with the current implementation.

There are two areas where spec and implementation differ with respect to authentication:

  1. Authentication properties from the config endpoint. The OpenAPI Spec for the config endpoint currently states:

Catalog configuration is constructed by setting the defaults, then client-provided configuration, and finally
overrides. The final property set is then used to configure the catalog.

As we've seen, this is currently not true for many authentication properties, including credential and
oauth2-server-uri, to name a few. But it holds true for token.

  1. Authentication properties returned in a LoadTableResult structure. The OpenAPI Spec for the structure currently
    states:

The config map returns table-specific configuration for the table's resources [..] The following configurations
should be respected by clients: [..] token.

Note that credential, for instance, is not listed there, and yet it is currently honored by the client. IOW, if the
server sends a credential in the config map, the client will use it to authenticate. There is even a test for that: TestRESTCatalog.testTableCredential.

All of this tends to suggest that authentication is not handled consistently across the board. The token property is
generally respected, but credential is not. The oauth2-server-uri property is not respected at all.

Such inconsistencies are confusing and make it hard to reason about the system. My intention was to bring the spec in
line with the current implementation by honoring credential and oauth2-server-uri, among others, from the config
endpoint.

Let me finish by addressing the security concerns you raised.

While I would generally agree with you that authentication properties should not be shared between client and resource
server, I think we are past that point. The client already happily accepts whatever token the resource server gives
it, and in some cases, it accepts even the resource server's credential too, and uses those to authenticate. This is
in complete contradiction with OAuth 2.0 principles
. Moreover, this behavior is validated by tests, so it's not
something that we can change now.

This brings me to the tacit conclusion that client and server must form a sort of trusted pair, functioning in a trusted
environment. In that situation, I don't see why the client could not obtain all of its authentication configuration from
the server, rather than just some portions of it.

@danielcweeks
Copy link
Contributor

My intention was to bring the spec in
line with the current implementation by honoring credential and oauth2-server-uri, among others, from the config
endpoint.

I don't believe this is a safe and raises a huge security concern for me. A client should never be in a situation where it is sending credentials or valid tokens to an server that the client did not explicitly designate. This change allows redirecting the auth server which should expose sensitive information to the wrong party.

@snazy
Copy link
Member

snazy commented May 15, 2024

This change allows redirecting the auth server which should expose sensitive information to the wrong party.

Why should a (malicious) Iceberg REST endpoint do the more complex redirect-dance, if it can get the nearly clear-text credentials due to the /v1/oauth/tokens route introduced by #4771? This change tries to mitigate that security issue (clear text credentials) by telling the client to use the correct oauth endpoint - nothing else.

@danielcweeks
Copy link
Contributor

Why should a (malicious) Iceberg REST endpoint do the more complex redirect-dance, if it can get the nearly clear-text credentials due to the /v1/oauth/tokens route introduced by #4771? This change tries to mitigate that security issue (clear text credentials) by telling the client to use the correct oauth endpoint - nothing else.

@snazy Users can override the oauth server, so for example if they set the auth server to be an Okta endpoint and they have credentials to authenticate with Okta, the resulting token from the client credential flow would go to the REST server (this is ok). However, if the REST server then redirects the client somewhere else, any subsequent operations (included additional credential flows) would send those credentials to a second party (this is not ok).

The rest server should not be redirecting a client configured auth server. That's not safe. The client should be fully in control of which auth server it uses.

@rdblue
Copy link
Contributor

rdblue commented May 15, 2024

@adutra, thanks for noticing and pointing out the inconsistencies. I do think it's important to address those. In particular, I did not realize that the table session that was introduced recently used the newSession codepath that uses credential.

(I also understand your reading of the config endpoint, but I don't think that implies that configs like the OAuth2 URI should be handled that way.)

We need to address these cases, but I think that this PR goes the wrong direction: instead of allowing auth properties from the config and table routes, we actually need to be more strict. A table session should never use a credential from the catalog service.

I think it would help if we had a common understanding of the difference between a token and a credential. A credential is a pre-established ID and secret pair that is used for authentication. A credential is controlled and owned by the client and is typically long-lived, like a username/password. A token is similar because it can be used for authentication (i.e. bearer auth) but tokens are short-lived and can contain a lot more information added by the service that is validated by signing the token.

In the REST protocol with OAuth2, credential is used for the first authentication step, which is to use the client credentials flow to exchange it for a token with the OAuth2 endpoint (now configurable). Both the endpoint and the credential are user-controlled because the credential is a long-lived secret and the endpoint must be trusted to receive that secret. The token that is returned to the client has a much shorter lifespan, which is why it is okay to share the token with the catalog routes and not only the OAuth2 endpoint.

Another reason why tokens are handled differently is that tokens can contain additional information that is signed by the service. That additional information can be used to check something once and use the result for the lifetime of the token. When designing the REST protocol, we thought that it was likely that tokens would be used for this purpose, so we built a couple of ways to exchange an auth token (from OAuth2) for a more specific or narrow token.

For example, the service could verify that the client is not coming from a blocked IP range. Checking IP ranges is expensive, so you'd want to do it once and produce a token that shows the incoming IP has been validated. As long as the token's IP claim is the same as the incoming IP, you can trust that the validation was done and not repeat the expensive check.

As you can see, there's a good reason to handle token and credential differently. Tokens must be sent to the catalog service for bearer auth and implementations are free to issue more narrow tokens in reasonable places. However, credentials should be more protected and only sent to the correct OAuth2 endpoint.

For this PR, the question is how to handle credential, scope, and oauth2-server-uri. For those, I think the current handling is correct.

  • credential -- This is a pre-established shared secret. It should never be supplied by the catalog service, especially when token is suitable for any purpose that credential might be used for.
  • scope -- I don't think this one makes sense. Why would a service tell the client to request more capabilities?
  • oauth2-server-uri -- If the service can send a different URI, then it can tell the client to send its credential to a different endpoint. I don't think there's any value to that, given that auth needs to happen before config . This would just increase the surface area for attacks, which we don't want to do however unlikely they may be.

In the end, allowing these to be overridden by catalog endpoints would weaken security and introduce extra requests (keep requirements simple!). I don't think it is a good idea to move forward with this PR. That said, good job catching that credential from the table route would be used in a session. Can you open a PR to disallow that?

@adutra
Copy link
Contributor Author

adutra commented May 16, 2024

Hi @rdblue, thank you for your detailed answer.

I am really sorry that this PR, that I thought would be a fairly consensual one, eventually cracked open a can of worms that I did not intend to open.

I'm pleased to realize that, in spite of some apparent divergences, we all seem to agree that the current handling of authentication suffers from a few loopholes – such as the credential passing from table routes – and that we should strive to make things more strict, not less. That is the mindset that I think we all should strive to adopt, when it comes to security.

That being said, I agree to close this PR and open another one to fix the table routes instead.

There remains, however, the broader question, mentioned a few times above, of a better separation of concerns between the conceptual OAuth 2.0 roles of "authorization" and "resource" servers. But this PR has never had the ambition of tackling that, so I'd like to request that we resume that discussion later in a more appropriate place, like our mailing list.

@adutra adutra closed this May 16, 2024
adutra added a commit to adutra/iceberg that referenced this pull request May 17, 2024
See apache#10256 for context.

This change disallows overriding the "credential" property
in table sessions, by introducing an allow-list of
auth-related properties that can be overridden in such
situations.

Only the "token" property and properties used to exchange
one token for another ("urn:ietf:params:oauth:token-type:*")
are now allowed.
adutra added a commit to adutra/iceberg that referenced this pull request Jun 5, 2024
See apache#10256 for context.

This change disallows overriding the "credential" property
in table sessions, by introducing an allow-list of
auth-related properties that can be overridden in such
situations.

Only the "token" property and properties used to exchange
one token for another ("urn:ietf:params:oauth:token-type:*")
are now allowed.
nastra pushed a commit that referenced this pull request Jul 4, 2024
See #10256 for context.

This change disallows overriding the "credential" property
in table sessions, by introducing an allow-list of
auth-related properties that can be overridden in such
situations.

Only the "token" property and properties used to exchange
one token for another ("urn:ietf:params:oauth:token-type:*")
are now allowed.
jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024
See apache#10256 for context.

This change disallows overriding the "credential" property
in table sessions, by introducing an allow-list of
auth-related properties that can be overridden in such
situations.

Only the "token" property and properties used to exchange
one token for another ("urn:ietf:params:oauth:token-type:*")
are now allowed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants