-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve sensitive data handling by fsspec connectors #2194
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
rbiseck3
force-pushed
the
roman/improve-sensitive-data-handling-ingest
branch
from
November 30, 2023 22:37
3e41f88
to
fac10cd
Compare
…test fixtures update (#2206) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
rbiseck3
force-pushed
the
roman/improve-sensitive-data-handling-ingest
branch
from
December 5, 2023 20:19
9e91552
to
3acc016
Compare
potter-potter
approved these changes
Dec 5, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone through everything. A lot to go through. But works well and I didn't see anything suspicious. Looks good. Really liking the access_configs instead of access_kwargs.
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 11, 2023
### Description All other connectors that were not included in #2194 are now updated to follow the new pattern and mark any variables as sensitive where it makes sense. Core changes: * All connectors now support an `AccessConfig` to mark data that's needed for auth (i.e. username, password) and those that are sensitive are designated appropriately using the new enhanced field. * All cli configs on the cli definition now inherit from the base config in the connector file to reuse the variables set on that dataclass * The base writer class was updated to better generalize the new approach given better use of dataclasses * The base cli classes were refactored to also take into account the need for a connector and write config when creating the respective runner/writer classes. * Any mismatch between the cli field name and the dataclass field name were updated on the dataclass side to not impact the user but maintain consistency * Add custom redaction logic for mongodb URIs since the password is expected to be a part of it. Now this: `"mongodb+srv://ingest-test-user:r4hK3BD07b@ingest-test.hgaig.mongodb.net/"` -> `"mongodb+srv://ingest-test-user:***REDACTED***@ingest-test.hgaig.mongodb.net/"` in the logs * Bundle all fsspec based files into their own packages. * Refactor custom `_decode_dataclass` used for enhanced json mixin by using a monkey-patch approach. The original approach was breaking on optional nested dataclasses when serializing since the other methods in `dataclasses_json_core` weren't using the new method. By monkey-patching the original method with a new one, all other methods in that library would use the new one. --------- Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Building off of PR #2179, updating fsspec based connectors to use better authentication field handling. This PR adds in the following changes:
Slightly unrelated changes:
TODOs
#TODO
in the code but the way session handler is implemented right now, it breaks serialization since it adds a generic variable based on the library being used for a connector (i.e.googleapiclient.discovery.Resource
) which is not serializable. This will need to be updated to omit that from serialization but still support the current workflow.