-
Notifications
You must be signed in to change notification settings - Fork 985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drill 7955: Extend XOR bitwise functions to match AND and OR #2264
Conversation
I was unsure about how to select a reviewer, apologies if I've spammed you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and questions but in general looks good.
@@ -138,7 +138,7 @@ unaryMathFunctions : [ | |||
{input: "UInt8", outputType: "UInt8", castType: "long"} | |||
] | |||
}, | |||
{className: "LeftShift", funcName: "lshift", javaFunc: " << ", types: [ | |||
{className: "LeftShift", funcName: "lshift", aliasName: "", javaFunc: " << ", types: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want this for all INT
types like SHORT
, TINYINT
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment for div
@@ -125,7 +125,7 @@ unaryMathFunctions : [ | |||
{input: "UInt8", outputType: "UInt8", castType: "long"} | |||
] | |||
}, | |||
{className: "Mod", funcName: "mod", javaFunc : " % ", types: [ | |||
{className: "Mod", funcName: "mod", aliasName: "", javaFunc : " % ", types: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as elsewhere... would we want this for other INT
like data types or does Drill handle that elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it does handle all widths already. If you look a full listing of MathFunc.tdd, rather than what this diff reveals, you'll see integer types from 8 bits to 64 bits.
@@ -71,22 +70,4 @@ public void eval() { | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why is this being deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of xor to MathFunc.tdd made it a redundant special case: 32-bit ints are already provided for by the new FreeMarker version. We get a small bit of clean-up for free here because the deleted IntXor function was out of place in BitFunctions.java, which otherwise only contains functions that operate on the boolean type BIT.
@dzamo Nice work. Could you please update the docs for this extension? Math and Trig |
Already done, Sir. Will deploy in the next batch of doc updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
Thanks for the contribution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this got merged before I could do a requested review. Here is the review anyway.
I would recommend coming up with function names that can be typed as regular functions: XOR(A, B)
in addition to the hard-to-type \
|`(A, B)`. Check what other DBs do.
Then, I would recommend SQL-based tests that exercise all types. See detailed comment.
@@ -0,0 +1,34 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: this form of test was used very early on in Drill before the full SQL stack worked. Consider writing a SQL-based test.
Since we are trying to support all types, I wonder if we should have a test which generates the needed SQL. Something like:
SELECT CAST(A AS SMALLINT) XOR CAST(B AS SMALLINT)
FROM (VALUES ...)
Then generate the SQL for each supported data type and verify the results.
The CAST
is suggested because there is no good way in Drill to create a data file that has data of every types (since Drill has no way to specify a schema with data types.) I suppose you could create a Parquet file with the required types, but that seems overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @paul-rogers, I'll certain allow more time for reviews to arrive, in future. The naming of these functions put me into a bit of a spin, to the point where I couldn't formulate very good questions about naming when I asked on the Slack channel.
- I started out trying for names of
AND
,OR
andXOR
but the first two seem to be taken elsewhere Drill and would not work, while the last is fine and indeed was already synonymous with^
before this patch (but only working for 32-bit arguments). - Investigating
AND
andOR
, I foundbooleanAnd
andbooleanOr
in FunctionNames.java and then also logic in other code using these names to decide if something is a boolean expression, which bitwise logical operators are not. I decided at that point that I should not overload function names that were intended to be boolean, and abandonedAND
andOR
. - I considered
BIT_AND
,BIT_OR
andBIT_XOR
but already these belong to the aggregate bitwise functions and do not allow overloading for the scalar case (I got runtime errors when I tried). - I considered
INT_AND
,INT_OR
andINT_XOR
and indeed these did work in testing. - Somehow I uncovered that
&
,|
,^
are already working scalar bitwise functions in Drill, in spite of my being unable to find any implementation of them in the code. Looking at physical query plans did not give me any clues and I told myself that some "magic" is happening upstream, possibly code gen by Calcite? - Since Drill already has the names in (5) for bitwise functions, even if they're made of special characters and nasty, I stuck with them because it's consistent with what's already there.
I'd be grateful for any suggestion of where to take this in a follow-up commit...
…2264) * DRILL-7955: Add BIT_XOR aggregate function for bitwise XOR. * DRILL-7955: Support all integer types for the scalar XOR (^) function. * DRILL-7955: Add tests for BIT_XOR aggregate function. * DRILL-7955: Replace tabs with spaces.
DRILL-7955: Extend XOR bitwise functions to match AND and OR
Description
The bitwise XOR function (^) is only implemented for Int while AND (&) and OR (|) are implemented for all widths of integer. Also there is no BIT_XOR aggregate while there are both BIT_AND and BIT_OR and all three operations are associative. This improvement proposes to bring XOR in line with AND and OR in both of these respects. An example showing the state of the scalar XOR function without this patch follows.
Documentation
Descriptions of &, |, ^ and BIT_XOR will be added to the user docs.
Testing
Tests for LSHIFT, RSHIFT (two related and testless functions), XOR, BIT_XOR added to
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewAggregateFunctions.java