-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Improve organization of AST code #8145
Comments
@vyasr how did you create an issue without choosing the type? Usually the system auto-populates some labels... |
You can create an issue from a comment on another PR or Issue by clicking on the three dots on the right and selecting "Reference in a new issue" (see the picture below). Issues created this way get the "Originally posted by..." note at the end of the issue. I'm surprised that it bypasses the auto-labeling though... |
Now that conditional joins have been merged, I'll use this issue as a checklist for the various improvements that we'd like to make to the code base: the checklist is now in the issue description. |
This is an issue. :) But in general checklists should go into the opening comment (the issue description). This way the checklist counter shows up in github's other interfaces (e.g. project boards). |
All good points :) I'll move the checklist. |
Resolves #8918 by providing a new API for getting the output size for conditional joins (except full joins). This PR removes the unnecessary `conditional_join.cuh` header and inlines the logic into the `conditional_join.cu` file where it is used, and adds the new logic into that file as well. The public APIs are now exposed in `conditional_join.hpp`. Adding the join size calculation also revealed a couple of bugs in the conditional join tests that were hiding a real bug in the conditional join implementation. The main test bug was the use of `std::equal` with the actual result as the first iterator, so if the actual result was empty it was never compared against the expected result (even if it was nonempty). This bug masked a couple of minor errors in the expected outputs encoded in the test. These are now fixed. This bug was also hiding a deeper issue where the AST device code was always using the left row index to pull data for the left hand operand to binary operations, even when the lhs was actually a column from the right table. That bug is now fixed as well. Contributes to #8145. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jason Lowe (https://github.com/jlowe) - MithunR (https://github.com/mythrocks) URL: #8928
Contributes to #8980 and #8145. This PR makes all `operator_functors` templated on the potential nullability of the inputs, allowing per-operator customization of null handling with minimal performance penalties. The default specialization in the nullable case is a null-propagating fall-through to the non-nullable case, which matches standard libcudf semantics, but any specific operator's null handling behavior can be customized by explicit specialization of the `operator_functor` template corresponding to that operator. This PR implements `NULL_EQUAL`, `NULL_LOGICAL_AND`, and `NULL_LOGICAL_OR` operators that behave like their non-nullable counterparts when no nulls are present but can return non-null results from null inputs. The `NULL_EQUAL` operator replaces the `compare_nulls` parameter for conditional joins. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jason Lowe (https://github.com/jlowe) - Robert (Bobby) Evans (https://github.com/revans2) - Bradley Dice (https://github.com/bdice) - https://github.com/nvdbaranec URL: #9096
Now that conditional joins have been merged, I'll use this PR as a checklist for the various improvements that we'd like to make to the code base:
compute_column
API to a user-facing namespace outside of ast: Move compute_column API out of ast namespace #8957Upgrade arity dispatching: either infer arity directly from the operator functions, or encode this data at a higher level and remove all the extra functorsThe fact thatoperator()
is itself templated for the variousoperator_functor
specializations makes this impossible without some very ugly workarounds that hurt readability and are in any case not robust to all possible cases.ImproveI've experimented with changingdevice_data_reference
/expression_result
structure.device_data_reference
from a POD struct into something with methods to encapsulate reading and writing logic and unify theexpression_result
class, but the amount of divergence between different reference types and output types means that the extra logic involved in dispatching appropriately obviates any real gains, at least with the current architecture.This issue was originally created in from #7418 (comment), but has since been heavily edited to reflect the updated task list.
The text was updated successfully, but these errors were encountered: