[data][api] implement HudiDataSource
#46273
base: master
Force-pushed from 7bc3894 to 97f9de1.
ds = ray.data.read_hudi_table(target_table_path)

assert ds.count() == 5
assert ds.schema() is not None
gonna add more assertions on the dataset content
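One way the content assertions mentioned above could look is sketched below. This is only an illustration, not the test that landed in the PR: the helper name `assert_same_rows` and the column names in the usage note are hypothetical, and it assumes the rows are flat dicts with hashable values (nested Hudi fields would need extra handling). It compares rows as a multiset so the check does not depend on read order.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def _freeze(row: Dict[str, Any]) -> tuple:
    # Turn a row dict into a hashable key that ignores column order.
    # Assumes every value in the row is hashable (flat rows only).
    return tuple(sorted(row.items()))


def assert_same_rows(
    actual: Iterable[Dict[str, Any]],
    expected: Iterable[Dict[str, Any]],
) -> None:
    """Assert that two row collections match as multisets (order ignored)."""
    actual_counts = Counter(_freeze(r) for r in actual)
    expected_counts = Counter(_freeze(r) for r in expected)
    missing = expected_counts - actual_counts
    unexpected = actual_counts - expected_counts
    assert not missing and not unexpected, (
        f"missing={dict(missing)}, unexpected={dict(unexpected)}"
    )
```

In the test, the rows could come from `ds.take_all()`, which returns one dict per row, and be compared against a hand-written list of expected rows, for example `assert_same_rows(ds.take_all(), expected_rows)`.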
--extra-index-url https://test.pypi.org/simple
hudi==0.1.0
This is temporary so that this PR can pass. The --extra-index-url will be removed once the official release is out.
    self,
    table_uri: str,
    storage_options: Optional[Dict[str, str]] = None,
):
I would prefer to support column projection and row filtering through additional arguments as a follow-up, since the feature is not yet supported.
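To make the follow-up concrete, here is a rough sketch of what projection and filtering arguments could mean semantically. Nothing here is part of this PR or of the Hudi reader: the function `project_and_filter`, the `columns` and `filters` parameter names, and the `(column, operator, value)` filter shape are all assumptions for illustration. A real implementation would presumably push these down into the reader instead of post-processing rows in Python.

```python
import operator
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence, Tuple

# Hypothetical filter shape: (column, operator, value), all combined with AND.
Filter = Tuple[str, str, Any]

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def project_and_filter(
    rows: Iterable[Dict[str, Any]],
    columns: Optional[Sequence[str]] = None,
    filters: Optional[Sequence[Filter]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield rows that satisfy every filter, keeping only `columns` if given."""
    for row in rows:
        # Filters are evaluated on the full row, before projection,
        # so a filter may reference a column that is not projected.
        if all(_OPS[op](row[col], value) for col, op, value in (filters or ())):
            yield dict(row) if columns is None else {c: row[c] for c in columns}
```

For example, `list(project_and_filter(rows, columns=["id"], filters=[("price", ">", 10)]))` would keep only the `id` column of rows whose `price` exceeds 10; both column names are made up for the example.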
@MicroCheck //python:ray/data/tests/test_hudi
Signed-off-by: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Force-pushed from 97f9de1 to d4e8af6.
Why are these changes needed?
Support reading from a Hudi table into a Ray dataset.
Related issue number
Closes #46272
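Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.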