[FEA] Implement full support for nested types #11844
Labels
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
Spark
Functionality that helps Spark RAPIDS
GregoryKimball
added
feature request
New feature or request
Needs Triage
Need team to review and classify
labels
Oct 3, 2022
GregoryKimball
changed the title
[WIP] Story - Full support for nested types
[FEA] Story - Full support for nested types
Oct 3, 2022
GregoryKimball
added
libcudf
Affects libcudf (C++/CUDA) code.
Performance
Performance related issue
Spark
Functionality that helps Spark RAPIDS
helps: Python
and removed
Needs Triage
Need team to review and classify
labels
Oct 3, 2022
Is this issue just a more complete version of the last table in #10186 describing algorithms that should be using the comparators?
rapids-bot bot
pushed a commit
that referenced
this issue
Feb 8, 2023
Converts the `rank` function to use experimental row comparators, which support list and struct types. Part of #11844. [Throughput benchmarks](#12481 (comment)) are available below. When `size_bytes` is constrained, the generator appears to produce fewer rows for `list` types as depth increases. That is why `depth=4` shows a higher throughput than `depth=1`: the number of leaf elements generated is the same, but they are spread over far fewer rows. Authors: - Divye Gala (https://github.com/divyegala) - Jordan Jacobelli (https://github.com/Ethyling) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) - AJ Schmidt (https://github.com/ajschmidt8) URL: #12481
This was referenced Feb 9, 2023
rapids-bot bot
pushed a commit
that referenced
this issue
Feb 16, 2023
This PR is a part of #11844. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #12752
rapids-bot bot
pushed a commit
that referenced
this issue
Feb 17, 2023
This PR is a part of #11844. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #12761
rapids-bot bot
pushed a commit
that referenced
this issue
Mar 7, 2023
…or (#12776) This PR is a part of #11844 Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #12776
rapids-bot bot
pushed a commit
that referenced
this issue
Mar 10, 2023
Contributes to #11844 This PR migrates parquet encoding to use the experimental `nan_equal` equality check instead of the legacy `equality_compare`. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) - Divye Gala (https://github.com/divyegala) URL: #12918
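For context, the difference between a NaN-aware equality check and a plain `==` comparison can be sketched in standalone C++. This is only an illustration of the idea for floating-point elements; the names `plain_equal` and `nan_aware_equal` are hypothetical and are not libcudf APIs.

```cpp
#include <cmath>

// Hypothetical helper: ordinary IEEE 754 comparison, where NaN never
// compares equal to anything, including another NaN.
inline bool plain_equal(double lhs, double rhs) { return lhs == rhs; }

// Hypothetical helper: treats two NaN values as equal to each other and
// otherwise falls back to the ordinary comparison.
inline bool nan_aware_equal(double lhs, double rhs)
{
  if (std::isnan(lhs) && std::isnan(rhs)) { return true; }
  return lhs == rhs;
}
```

With a check like this, two NaN elements land in the same equivalence class, whereas a plain `==` treats every NaN as distinct.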
rapids-bot bot
pushed a commit
that referenced
this issue
Mar 22, 2023
…omparator (#12777) This PR is a part of #11844 Authors: - Divye Gala (https://github.com/divyegala) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #12777
rapids-bot bot
pushed a commit
that referenced
this issue
Apr 6, 2023
Part of #11844. I will create a separate PR for `mixed_join`. Compilation times: `main` 94bbc82 : `16m47.513s` This PR 5d75db8 : `16m47.520s` Benchmarks: #12787 (comment) Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #12787
rapids-bot bot
pushed a commit
that referenced
this issue
Apr 21, 2023
…3028) Part of #11844 `mixed_join` cannot support nested types as the conditional part relies on AST. This PR adds no new tests or benchmarks for this reason. [Benchmarks](#13028 (comment)) Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #13028
rapids-bot bot
pushed a commit
that referenced
this issue
Apr 25, 2023
…rator (#13119) This is a part of #11844 Benchmarks: #13119 (comment) Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #13119
This should also address #13116.
rapids-bot bot
pushed a commit
that referenced
this issue
Jun 29, 2023
Part of #11844 Authors: - Divye Gala (https://github.com/divyegala) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #13069
rapids-bot bot
pushed a commit
that referenced
this issue
Aug 3, 2023
Part of #11844 Authors: - Divye Gala (https://github.com/divyegala) - Nghia Truong (https://github.com/ttnghia) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #13810
rapids-bot bot
pushed a commit
that referenced
this issue
Oct 28, 2023
…14250) Part of #11844 This PR also uses new experimental comparators for non-nested types by introducing a new device constructor for `cudf::experimental::row::lexicographic::device_row_comparator`. In the case of non-nested types, preprocessing can be skipped so comparators can be created on the fly. This solution helps us avoid creating 3 comparator types because `thrust::merge` can call the operator with indices from either side of the table. Furthermore, the PR reworks `cudf/detail/merge.cuh` by removing any CUDA headers/components to expose a true detail API of the form `cudf/detail/merge.hpp`. [Benchmark comparison for non-nested types](#14250 (comment)) Compilation time increases from ~6 mins to ~7 mins. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Bradley Dice (https://github.com/bdice) - MithunR (https://github.com/mythrocks) URL: #14250
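The point about a merge calling the comparator with indices from either side can be sketched in plain C++ with `std::merge` standing in for `thrust::merge`. This is a minimal sketch under stated assumptions, not the PR's implementation: `side`, `tagged_index`, `tagged_less`, and `merge_sorted` are hypothetical names, and each "table" is reduced to a single `int` column.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical tag recording which input table an index refers to.
enum class side { left, right };

struct tagged_index {
  side table;
  std::size_t row;
};

// Hypothetical comparator: resolves each tagged index against its own table,
// so a single type covers left/right, right/left, and same-side comparisons.
struct tagged_less {
  std::vector<int> const* left;
  std::vector<int> const* right;

  int value(tagged_index idx) const
  {
    return idx.table == side::left ? (*left)[idx.row] : (*right)[idx.row];
  }

  bool operator()(tagged_index a, tagged_index b) const { return value(a) < value(b); }
};

// Merges two sorted single-column "tables" and returns the merged values.
inline std::vector<int> merge_sorted(std::vector<int> const& left, std::vector<int> const& right)
{
  std::vector<tagged_index> lhs, rhs, order;
  for (std::size_t i = 0; i < left.size(); ++i) { lhs.push_back({side::left, i}); }
  for (std::size_t i = 0; i < right.size(); ++i) { rhs.push_back({side::right, i}); }

  tagged_less cmp{&left, &right};
  std::merge(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), std::back_inserter(order), cmp);

  // Gather the values in merged order.
  std::vector<int> merged;
  for (auto idx : order) { merged.push_back(cmp.value(idx)); }
  return merged;
}
```

Because every index carries its own tag, the comparator does not need to know in advance which argument comes from which table.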
Closing for now. We believe this to be complete. Congratulations team!
GregoryKimball
changed the title
[FEA] Story - Full support for nested types
[FEA] Implement full support for nested types
Mar 9, 2024
After the introduction of new row operators for nested types (#10186), it's time to build on that success with a new issue to focus on the outstanding items. The row operators for equality `=` (#10289) and hashing `#` (#10641) included support for arbitrarily-nested `List` and `Struct` types. List inequality `<` (#11129) added support for arbitrarily-nested `List<List>` and `Struct<List>`. Please note that the Spark-RAPIDS plugin has a companion issue (NVIDIA/spark-rapids#8550) to this story issue.

Part 1: Transition algorithms to new row operators
- `=`
- `#`
- `#`, `=`
- `#`, `=`
- `#`, `=`
- `<`: `relational_compare` in `simple_comparator` as part of two-path solution in `sorted_order`
- `=`: `equality_compare` as part of two-path solution in `search_list_non_nested_types_fn`, see also #11330 for the two-path solution for equality
- `=`, `<`
- `#`
- `<`: `row_lexicographic_tagged_comparator` that uses legacy `element_relational_comparator` as part of internal operations. Also see discussion in #13514 and #8050
- `=`: `element_equality_comparator` in `is_unique_iterator_fn`
- `=`: `row_equality_comparator` in `permuted_comparator`
- `=`: `row_equality_comparator` in `permuted_row_equality_comparator`, linked issue #8039
- `=`: `equality_compare` in find/insert. #8476 convert to cuco and #10635 optimize
- `#`: `row_hasher` in `hash_partition_table`
- `<`: `row_lexicographic_comparator` in `row_arg_minmax_fn` #8974, see also #8964. See draft work in #10811 and #13069
- `<`: `row_lexicographic_comparator` in `is_sorted`
- `=`: `row_equality_comparator` in `unique_comparator`
- `=`: `row_equality_comparator` in `unique`
- `#`: `row_hasher` renamed as `row_hash`
- `=`: `element_equality_comparator` in `one_hot_encode_functor`
- `#`, `=`: `row_hasher` and `row_equality_comparator`
- `#`, `=`
- `=`: `row_equality_comparator` in `corresponding_rows_unequal`
- `#`, `=`
- `=`: `cudf::equality_compare` also see #13672
- `<`: `list_minmax_util.cuh` `struct_minmax_util.cuh` to handle top-level lists in aggregation values

Part 2: Improving performance and adding functionality
- `<`: `sorted_order` based on templating experimental `<`, with one instance for simple types and one for nested.
- `=`: `lists::contains`, with one path for simple types using legacy `=` and one path for nested using experimental `=`.
- `<` operator on `Struct<List>` types
- `<` operator for `List<Struct>` types
- `<`
- `=`
- `#`
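The two-path idea in the Part 2 items (one instance for simple types, one for nested) can be illustrated with a compile-time dispatch in standalone C++17. This is a rough sketch only: `is_nested` and `element_less` are hypothetical names, `std::vector` stands in for a list column, nulls are ignored, and ordering a shorter list before a longer one with the same prefix is an assumption of the sketch rather than a statement about libcudf.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical trait: a type counts as "nested" here if it is a std::vector.
template <typename T>
struct is_nested : std::false_type {};
template <typename T>
struct is_nested<std::vector<T>> : std::true_type {};

// Lexicographic "less than" with two paths selected at compile time.
template <typename T>
bool element_less(T const& lhs, T const& rhs)
{
  if constexpr (is_nested<T>::value) {
    // Nested path: compare children pairwise, recursing into deeper levels.
    std::size_t const n = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
    for (std::size_t i = 0; i < n; ++i) {
      if (element_less(lhs[i], rhs[i])) { return true; }
      if (element_less(rhs[i], lhs[i])) { return false; }
    }
    // All shared elements are equal: the shorter list orders first (assumption).
    return lhs.size() < rhs.size();
  } else {
    // Simple path: a plain comparison with no recursion.
    return lhs < rhs;
  }
}
```

Simple types never pay for the recursive machinery, since the branch is resolved when the template is instantiated.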
Part 3: Expand support for nested types in cuDF-python
Update: Part 3 will be made into a separate story issue once this issue is closed
- `drop_duplicates()` on nested types
- `groupby.apply` on nested types