Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST: refactor OAuth logic into AuthManager Interface #10621

Closed
wants to merge 1 commit into from

Conversation

jackye1995
Copy link
Contributor

A part of issue #10537, next step of #10603

This is a very rough draft to see how it could look like when the OAuth2 logic is abstracted away from the RESTSessionCatalog. There could be more discussions for how we define the exact boundary.

For SigV4, the plan is to also have one AmazonSigV4AuthManager in iceberg-aws module, and can be loaded dynamically.

@snazy @nastra @danielcweeks @rdblue

@jackye1995 jackye1995 requested a review from nastra July 2, 2024 19:46
@github-actions github-actions bot added the core label Jul 2, 2024
@adutra
Copy link
Contributor

adutra commented Jul 3, 2024

@jackye1995 Thank you so much for initiating this. I have been working on something similar as well, so it's great to see that there is a solid community engagement around this!

A few very high-level questions/comments:

  1. How are you going to migrate RESTSigV4Signer?

There is imho an impedance mismatch between the nothing of "sessions" in the REST catalog and the fact that authentication generally operates at request level.

AuthSession is the key component here. Indeed it aims to provide a bridge between the two, by keeping track of the session context, but also providing auth headers to the underlying REST client.

However I wonder how you will be able to migrate RESTSigV4Signer to the current AuthManager design. RESTSigV4Signer is currently implemented in a much lower lever, as an HttpRequestInterceptor, mainly because it needs to access the entire outgoing request: path, query parameters, headers, body etc.

I therefore think that, to migrate RESTSigV4Signer, we'd need to enhance AuthSession with a method that gives access to the entire request, and allows the session to inject auth headers in it.

We'd also need pass it down to the HTTP layer, instead of passing a Map<String, String> of auth headers only.

  1. Couldn't we reuse CredentialsProvider?

Really an open question here. I wonder if, in fact, the HTTP layer isn't better suited for this new AuthManager. The Apache HTTP client already has a lot of infrastructure to deal with authentication: not only it has HttpRequestInterceptor (the swiss knife of request customization) but also CredentialsProvider and its many implementations.

Besides, CredentialsProvider is already leaking through the HTTPClient API, see org.apache.iceberg.rest.HTTPClient.Builder#withProxyCredentialsProvider.

In fact I wonder: would this AuthManager work if we provide a proxy and a proxy credentials provider? My gut feeling right now is that it won't. We may want to keep this in mind as we work on the AuthManager.

Anyways, thank you for this proposal! I will also leave a few comments inline.


Map<String, String> headers();

void stopRefreshing();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopRefreshing sounds very "oauthy". I'm not sure the notion of token refreshing makes sense generally for all auth schemes.

I suggest that you make AuthSession extend Closeable and rename this method close. Looking at the places where this method is called, it's always used to dispose the session when it gets evicted from the session cache, so I think that calling this close makes sense and indicates to implementors that they should release all resources.


public interface AuthSession {

Map<String, String> headers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in a previous comment, this is where I think we need something more powerful than just returning some headers to include in the request. This is because some auth schemes will need to inspect the whole request and make decisions based on its content/state.

Here is what I have currently:

  void applyAuth(
      URI requestUri,
      String method,
      Map<String, String> queryParams,
      Object requestBody,
      Map<String, List<String>> currentHeaders,
      BiConsumer<String, String> headerConsumer);

Most impls would simply inject additional headers:

    headerConsumer.accept("Authorization", "Bearer ****");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for this to work with SigV4 you'll need all these details. Broadly, seems like there's 2 approaches, either define a higher level abstraction which will have these details (basically pulling out the relevant parts from the request like the header, query params, body, etc), like the current approach or lower the abstraction and use the CredentialsProvider API that's already exposed on the HTTP client (that was originally intended for proxy connections but it's worth considering if we should add a generic API for specifying credentials provider).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amogh-jahagirdar you are spot on. I was asking myself the same question.

There are pros and cons for each approach:

  1. Higher level
    1. pros: integrates well with the catalog sessions and allows for context-level or table-level credentials (as it exists today with OAuth).
    2. cons: cannot leverage the existing machinery around CredentialsProvider; could potentially interfere with proxy credentials which are implemented in lower level.
  2. Lower level
    1. pros: very easy to implement all auth schemes using either CredentialProvider or HttpRequestInterceptor. Interference with proxy credentials could be easily remediated.
    2. cons: cannot integrate with catalog sessions; context-level or table-level sessions cannot use specialized credentials.

My personal take is that the cons of a lower level approach are a blocker: we cannot give up of what's already built around table-level credentials. So I'd suggest to pursue the higher level approach that Jack initiated here.

private String scope;
private Map<String, String> optionalOAuthParams;
private String oauth2ServerUri;
private boolean keepTokenRefreshed = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something fishy with this field, it's never updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you need to migrate this class to use AuthConfig.


public interface AuthManager extends AutoCloseable {

Map<String, String> mergeAuthHeadersForGetConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be improved. Here we are constructing an initial auth session so a better signature would be:

  AuthSession initialAuth(RESTClient initialAuthClient, Map<String, String> initialHeaders);

It's very easy to adapt the only call site of that method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adutra Isn't that the purpose of the newSession API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @jackye1995 I'm not sure there really needs to be a separate mergeAuthHeadersForGetConfig API. Feels like the newSession API should do all that in the implementation?

Copy link
Contributor

@adutra adutra Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's actually the opposite: we need more methods.

In my own exploration I came up with the following API which works well:

https://github.com/adutra/iceberg/blob/auth-manager-adutra/core/src/main/java/org/apache/iceberg/rest/auth/AuthManager.java

We need a different factory method for each "level" (or "scope") where an auth session could be introduced (and cached).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I agree that the first method newInitialSession is really only meant to produce a short-lived, temporary auth session in order to call the config endpoint. It's discarded immediately after. But I think we need it and we can't use newCatalogSession in this case.

@jackye1995
Copy link
Contributor Author

How are you going to migrate RESTSigV4Signer?

Yes that's the thing I am planning to prototype today. My position is that, if it fits then it fits, if it does not let's not try to fit it. We can have a no-op auth here and then enable sigv4 at lower layer.

For that, I would like to invite @szehon-ho and @flyrain to take a look. I remember Apple has a use case for kerberos, I would like to see if this interface would also work for that.

@flyrain
Copy link
Contributor

flyrain commented Jul 3, 2024

Thanks @jackye1995 for pinging me. Will take a look today.

@adutra
Copy link
Contributor

adutra commented Jul 3, 2024

Yes that's the thing I am planning to prototype today. My position is that, if it fits then it fits, if it does not let's not try to fit it.

I explored a bit deeper today, including sigv4. My conclusion is that it is possible, provided that we modify AuthSession to something more flexible.

It would be good to unify all authenticators including sigv4, rather than having sigv4 in a lower level and other auth schemes in a higher level.

Pair<String, Supplier<AuthSession>> newSessionSupplier(
Map<String, String> credentials, Map<String, String> properties, AuthSession parent);

void initialize(String managerName, Map<String, String> properties);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of having a managerName?


public interface AuthManager extends AutoCloseable {

Map<String, String> mergeAuthHeadersForGetConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adutra Isn't that the purpose of the newSession API?


public interface AuthManager extends AutoCloseable {

Map<String, String> mergeAuthHeadersForGetConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @jackye1995 I'm not sure there really needs to be a separate mergeAuthHeadersForGetConfig API. Feels like the newSession API should do all that in the implementation?


public interface AuthSession {

Map<String, String> headers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for this to work with SigV4 you'll need all these details. Broadly, seems like there's 2 approaches, either define a higher level abstraction which will have these details (basically pulling out the relevant parts from the request like the header, query params, body, etc), like the current approach or lower the abstraction and use the CredentialsProvider API that's already exposed on the HTTP client (that was originally intended for proxy connections but it's worth considering if we should add a generic API for specifying credentials provider).

@flyrain
Copy link
Contributor

flyrain commented Jul 17, 2024

Thanks @jackye1995. I like the approach a lot overall. The abstraction is reasonable. The only concern is about how much we can abstract from different authentication methods. Things to consider like what if the authentication doesn't need refresh or it doesn't just deal with headers as @adutra and @amogh-jahagirdar already mentioned. Unfortunately, I'm not familiar with other authentication methods, like kerberos via HTTP. I hope people with related expertise can chime in.

@jackye1995
Copy link
Contributor Author

Sorry I will go ahead and close this PR, since I don't have bandwidth to continue working on this, it mainly tries to serve as a quick proof of concept. Please feel free to use the code in any shape or form and keep moving forward with it!

@jackye1995 jackye1995 closed this Jul 17, 2024
@adutra
Copy link
Contributor

adutra commented Jul 19, 2024

@jackye1995 and @flyrain I also like what Jack initiated here, and I would like this to move forward. I will open another PR shortly, with both OAuth2 and SigV4 retrofitted to implement AuthManager.

I cannot speak for Kerberos either, but given that SigV4 is a very "demanding" auth scheme, requiring full access to the outgoing request, I think we should be fine, even if Kerberos or other schemes also require access to the full request.

I can however speak for OAuth2, having worked on a similar design for Nessie. I think we can build a much better OAuth support for the REST catalog using this PR as a foundation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants