-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REST: refactor OAuth logic into AuthManager Interface #10621
Conversation
@jackye1995 Thank you so much for initiating this. I have been working on something similar as well, so it's great to see that there is a solid community engagement around this! A few very high-level questions/comments:
There is imho an impedance mismatch between the nothing of "sessions" in the REST catalog and the fact that authentication generally operates at request level.
However I wonder how you will be able to migrate I therefore think that, to migrate We'd also need pass it down to the HTTP layer, instead of passing a
Really an open question here. I wonder if, in fact, the HTTP layer isn't better suited for this new Besides, In fact I wonder: would this Anyways, thank you for this proposal! I will also leave a few comments inline. |
|
||
Map<String, String> headers(); | ||
|
||
void stopRefreshing(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stopRefreshing
sounds very "oauthy". I'm not sure the notion of token refreshing makes sense generally for all auth schemes.
I suggest that you make AuthSession
extend Closeable
and rename this method close
. Looking at the places where this method is called, it's always used to dispose the session when it gets evicted from the session cache, so I think that calling this close
makes sense and indicates to implementors that they should release all resources.
|
||
public interface AuthSession { | ||
|
||
Map<String, String> headers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I stated in a previous comment, this is where I think we need something more powerful than just returning some headers to include in the request. This is because some auth schemes will need to inspect the whole request and make decisions based on its content/state.
Here is what I have currently:
void applyAuth(
URI requestUri,
String method,
Map<String, String> queryParams,
Object requestBody,
Map<String, List<String>> currentHeaders,
BiConsumer<String, String> headerConsumer);
Most impls would simply inject additional headers:
headerConsumer.accept("Authorization", "Bearer ****");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for this to work with SigV4 you'll need all these details. Broadly, seems like there's 2 approaches, either define a higher level abstraction which will have these details (basically pulling out the relevant parts from the request like the header, query params, body, etc), like the current approach or lower the abstraction and use the CredentialsProvider
API that's already exposed on the HTTP client (that was originally intended for proxy connections but it's worth considering if we should add a generic API for specifying credentials provider).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar you are spot on. I was asking myself the same question.
There are pros and cons for each approach:
- Higher level
- pros: integrates well with the catalog sessions and allows for context-level or table-level credentials (as it exists today with OAuth).
- cons: cannot leverage the existing machinery around
CredentialsProvider
; could potentially interfere with proxy credentials which are implemented in lower level.
- Lower level
- pros: very easy to implement all auth schemes using either
CredentialProvider
orHttpRequestInterceptor
. Interference with proxy credentials could be easily remediated. - cons: cannot integrate with catalog sessions; context-level or table-level sessions cannot use specialized credentials.
- pros: very easy to implement all auth schemes using either
My personal take is that the cons of a lower level approach are a blocker: we cannot give up of what's already built around table-level credentials. So I'd suggest to pursue the higher level approach that Jack initiated here.
private String scope; | ||
private Map<String, String> optionalOAuthParams; | ||
private String oauth2ServerUri; | ||
private boolean keepTokenRefreshed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is something fishy with this field, it's never updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, you need to migrate this class to use AuthConfig
.
|
||
public interface AuthManager extends AutoCloseable { | ||
|
||
Map<String, String> mergeAuthHeadersForGetConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be improved. Here we are constructing an initial auth session so a better signature would be:
AuthSession initialAuth(RESTClient initialAuthClient, Map<String, String> initialHeaders);
It's very easy to adapt the only call site of that method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adutra Isn't that the purpose of the newSession
API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah @jackye1995 I'm not sure there really needs to be a separate mergeAuthHeadersForGetConfig API. Feels like the newSession
API should do all that in the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's actually the opposite: we need more methods.
In my own exploration I came up with the following API which works well:
We need a different factory method for each "level" (or "scope") where an auth session could be introduced (and cached).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However I agree that the first method newInitialSession
is really only meant to produce a short-lived, temporary auth session in order to call the config endpoint. It's discarded immediately after. But I think we need it and we can't use newCatalogSession
in this case.
Yes that's the thing I am planning to prototype today. My position is that, if it fits then it fits, if it does not let's not try to fit it. We can have a no-op auth here and then enable sigv4 at lower layer. For that, I would like to invite @szehon-ho and @flyrain to take a look. I remember Apple has a use case for kerberos, I would like to see if this interface would also work for that. |
Thanks @jackye1995 for pinging me. Will take a look today. |
I explored a bit deeper today, including sigv4. My conclusion is that it is possible, provided that we modify It would be good to unify all authenticators including sigv4, rather than having sigv4 in a lower level and other auth schemes in a higher level. |
Pair<String, Supplier<AuthSession>> newSessionSupplier( | ||
Map<String, String> credentials, Map<String, String> properties, AuthSession parent); | ||
|
||
void initialize(String managerName, Map<String, String> properties); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of having a managerName
?
|
||
public interface AuthManager extends AutoCloseable { | ||
|
||
Map<String, String> mergeAuthHeadersForGetConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adutra Isn't that the purpose of the newSession
API?
|
||
public interface AuthManager extends AutoCloseable { | ||
|
||
Map<String, String> mergeAuthHeadersForGetConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah @jackye1995 I'm not sure there really needs to be a separate mergeAuthHeadersForGetConfig API. Feels like the newSession
API should do all that in the implementation?
|
||
public interface AuthSession { | ||
|
||
Map<String, String> headers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for this to work with SigV4 you'll need all these details. Broadly, seems like there's 2 approaches, either define a higher level abstraction which will have these details (basically pulling out the relevant parts from the request like the header, query params, body, etc), like the current approach or lower the abstraction and use the CredentialsProvider
API that's already exposed on the HTTP client (that was originally intended for proxy connections but it's worth considering if we should add a generic API for specifying credentials provider).
Thanks @jackye1995. I like the approach a lot overall. The abstraction is reasonable. The only concern is about how much we can abstract from different authentication methods. Things to consider like what if the authentication doesn't need refresh or it doesn't just deal with headers as @adutra and @amogh-jahagirdar already mentioned. Unfortunately, I'm not familiar with other authentication methods, like kerberos via HTTP. I hope people with related expertise can chime in. |
Sorry I will go ahead and close this PR, since I don't have bandwidth to continue working on this, it mainly tries to serve as a quick proof of concept. Please feel free to use the code in any shape or form and keep moving forward with it! |
@jackye1995 and @flyrain I also like what Jack initiated here, and I would like this to move forward. I will open another PR shortly, with both OAuth2 and SigV4 retrofitted to implement I cannot speak for Kerberos either, but given that SigV4 is a very "demanding" auth scheme, requiring full access to the outgoing request, I think we should be fine, even if Kerberos or other schemes also require access to the full request. I can however speak for OAuth2, having worked on a similar design for Nessie. I think we can build a much better OAuth support for the REST catalog using this PR as a foundation. |
A part of issue #10537, next step of #10603
This is a very rough draft to see how it could look like when the OAuth2 logic is abstracted away from the RESTSessionCatalog. There could be more discussions for how we define the exact boundary.
For SigV4, the plan is to also have one
AmazonSigV4AuthManager
iniceberg-aws
module, and can be loaded dynamically.@snazy @nastra @danielcweeks @rdblue