Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drill 7955: Extend XOR bitwise functions to match AND and OR #2264

Merged
merged 4 commits into from
Jun 25, 2021

Conversation

jnturton
Copy link
Contributor

@jnturton jnturton commented Jun 24, 2021

DRILL-7955: Extend XOR bitwise functions to match AND and OR

Description

The bitwise XOR function (^) is only implemented for Int while AND (&) and OR (|) are implemented for all widths of integer. Also there is no BIT_XOR aggregate while there are both BIT_AND and BIT_OR and all three operations are associative. This improvement proposes to bring XOR in line with AND and OR in both of these respects. An example showing the state of the scalar XOR function without this patch follows.

select `|`(cast(2 as bigint), cast(4 as bigint)) -- bitwise OR, works

EXPR$0|
------|
     6|

select `^`(cast(2 as bigint), cast(4 as bigint)) -- bitwise XOR, breaks

SQL Error: FUNCTION ERROR: ^ does not support operand types (BIGINT,BIGINT)

Documentation

Descriptions of &, |, ^ and BIT_XOR will be added to the user docs.

Testing

Tests for LSHIFT, RSHIFT (two related and testless functions), XOR, BIT_XOR added to

exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewMathFunctions.java
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestNewAggregateFunctions.java

@jnturton
Copy link
Contributor Author

I was unsure about how to select a reviewer, apologies if I've spammed you.

Copy link
Contributor

@cgivre cgivre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments and questions but in general looks good.

@@ -138,7 +138,7 @@ unaryMathFunctions : [
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "LeftShift", funcName: "lshift", javaFunc: " << ", types: [
{className: "LeftShift", funcName: "lshift", aliasName: "", javaFunc: " << ", types: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want this for all INT types like SHORT, TINYINT etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment for div

@@ -125,7 +125,7 @@ unaryMathFunctions : [
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "Mod", funcName: "mod", javaFunc : " % ", types: [
{className: "Mod", funcName: "mod", aliasName: "", javaFunc : " % ", types: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as elsewhere... would we want this for other INT like data types or does Drill handle that elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it does handle all widths already. If you look a full listing of MathFunc.tdd, rather than what this diff reveals, you'll see integer types from 8 bits to 64 bits.

@@ -71,22 +70,4 @@ public void eval() {
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is this being deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of xor to MathFunc.tdd made it a redundant special case: 32-bit ints are already provided for by the new FreeMarker version. We get a small bit of clean-up for free here because the deleted IntXor function was out of place in BitFunctions.java, which otherwise only contains functions that operate on the boolean type BIT.

@luocooong
Copy link
Member

@dzamo Nice work. Could you please update the docs for this extension? Math and Trig

@jnturton
Copy link
Contributor Author

@dzamo Nice work. Could you please update the docs for this extension? Math and Trig

Already done, Sir. Will deploy in the next batch of doc updates.

Copy link
Contributor

@cgivre cgivre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
Thanks for the contribution!

@jnturton jnturton merged commit d088f47 into apache:master Jun 25, 2021
Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this got merged before I could do a requested review. Here is the review anyway.

I would recommend coming up with function names that can be typed as regular functions: XOR(A, B) in addition to the hard-to-type \|`(A, B)`. Check what other DBs do.

Then, I would recommend SQL-based tests that exercise all types. See detailed comment.

@@ -0,0 +1,34 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: this form of test was used very early on in Drill before the full SQL stack worked. Consider writing a SQL-based test.

Since we are trying to support all types, I wonder if we should have a test which generates the needed SQL. Something like:

SELECT CAST(A AS SMALLINT) XOR CAST(B AS SMALLINT)
FROM (VALUES ...)

Then generate the SQL for each supported data type and verify the results.

The CAST is suggested because there is no good way in Drill to create a data file that has data of every types (since Drill has no way to specify a schema with data types.) I suppose you could create a Parquet file with the required types, but that seems overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @paul-rogers, I'll certain allow more time for reviews to arrive, in future. The naming of these functions put me into a bit of a spin, to the point where I couldn't formulate very good questions about naming when I asked on the Slack channel.

  1. I started out trying for names of AND, OR and XOR but the first two seem to be taken elsewhere Drill and would not work, while the last is fine and indeed was already synonymous with ^ before this patch (but only working for 32-bit arguments).
  2. Investigating AND and OR, I found booleanAnd and booleanOr in FunctionNames.java and then also logic in other code using these names to decide if something is a boolean expression, which bitwise logical operators are not. I decided at that point that I should not overload function names that were intended to be boolean, and abandoned AND and OR.
  3. I considered BIT_AND, BIT_OR and BIT_XOR but already these belong to the aggregate bitwise functions and do not allow overloading for the scalar case (I got runtime errors when I tried).
  4. I considered INT_AND, INT_OR and INT_XOR and indeed these did work in testing.
  5. Somehow I uncovered that &, |, ^ are already working scalar bitwise functions in Drill, in spite of my being unable to find any implementation of them in the code. Looking at physical query plans did not give me any clues and I told myself that some "magic" is happening upstream, possibly code gen by Calcite?
  6. Since Drill already has the names in (5) for bitwise functions, even if they're made of special characters and nasty, I stuck with them because it's consistent with what's already there.

I'd be grateful for any suggestion of where to take this in a follow-up commit...

Leon-WTF pushed a commit to Leon-WTF/drill that referenced this pull request Jul 6, 2021
…2264)

* DRILL-7955: Add BIT_XOR aggregate function for bitwise XOR.

* DRILL-7955: Support all integer types for the scalar XOR (^) function.

* DRILL-7955: Add tests for BIT_XOR aggregate function.

* DRILL-7955: Replace tabs with spaces.
@jnturton jnturton deleted the DRILL-7955 branch February 4, 2022 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants