Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drill 7955: Extend XOR bitwise functions to match AND and OR #2264

Merged
merged 4 commits into from
Jun 25, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions exec/java-exec/src/main/codegen/data/AggrBitwiseLogicalTypes.tdd
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,25 @@
{inputType: "UInt8", outputType: "NullableUInt8", maxval: "Long"},
{inputType: "NullableUInt8", outputType: "NullableUInt8", maxval: "Long"}
]
},
{className: "BitwiseXor", funcName: "bit_xor", aliasName: "", types: [
{inputType: "BigInt", outputType: "NullableBigInt", maxval: "Integer"},
{inputType: "NullableBigInt", outputType: "NullableBigInt", maxval: "Integer"},
{inputType: "Int", outputType: "NullableInt", maxval: "Integer"},
{inputType: "NullableInt", outputType: "NullableInt", maxval: "Integer"},
{inputType: "SmallInt", outputType: "NullableSmallInt", maxval: "Byte", extraCast: "short"},
{inputType: "NullableSmallInt", outputType: "NullableSmallInt", maxval: "Byte", extraCast: "short"},
{inputType: "TinyInt", outputType: "NullableTinyInt", maxval: "Byte", extraCast: "byte"},
{inputType: "NullableTinyInt", outputType: "NullableTinyInt", maxval: "Byte", extraCast: "byte"},
{inputType: "UInt1", outputType: "NullableInt", maxval: "Byte"},
{inputType: "NullableUInt1", outputType: "NullableInt", maxval: "Byte"},
{inputType: "UInt2", outputType: "NullableInt", maxval: "Character"},
{inputType: "NullableUInt2", outputType: "NullableInt", maxval: "Character"},
{inputType: "UInt4", outputType: "NullableInt", maxval: "Integer"},
{inputType: "NullableUInt4", outputType: "NullableInt", maxval: "Integer"},
{inputType: "UInt8", outputType: "NullableUInt8", maxval: "Long"},
{inputType: "NullableUInt8", outputType: "NullableUInt8", maxval: "Long"}
]
}
]
}
21 changes: 16 additions & 5 deletions exec/java-exec/src/main/codegen/data/MathFunc.tdd
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ unaryMathFunctions : [
}
],
binaryMathFunctions : [
{className: "Div", funcName: "div", javaFunc : " / ", types: [
{className: "Div", funcName: "div", aliasName: "", javaFunc : " / ", types: [
{input: "Int", outputType: "Int", castType: "int"},
{input: "BigInt", outputType: "BigInt", castType: "long"},
{input: "Float4", outputType: "Float4", castType: "float", roundingRequired: "true"},
Expand All @@ -125,7 +125,7 @@ unaryMathFunctions : [
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "Mod", funcName: "mod", javaFunc : " % ", types: [
{className: "Mod", funcName: "mod", aliasName: "", javaFunc : " % ", types: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as elsewhere... would we want this for other INT like data types or does Drill handle that elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it does handle all widths already. If you look a full listing of MathFunc.tdd, rather than what this diff reveals, you'll see integer types from 8 bits to 64 bits.

{input: "Int", outputType: "Int", castType: "int"},
{input: "BigInt", outputType: "BigInt", castType: "long"},
{input: "Float4", outputType: "Float4", castType: "float"},
Expand All @@ -138,7 +138,7 @@ unaryMathFunctions : [
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "LeftShift", funcName: "lshift", javaFunc: " << ", types: [
{className: "LeftShift", funcName: "lshift", aliasName: "", javaFunc: " << ", types: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want this for all INT types like SHORT, TINYINT etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment for div

{input: "Int", outputType: "Int", castType: "int"},
{input: "BigInt", outputType: "BigInt", castType: "long"},
{input: "SmallInt", outputType: "SmallInt", castType: "short"},
Expand All @@ -149,7 +149,18 @@ unaryMathFunctions : [
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "RightShift", funcName: "rshift", javaFunc: " >> ", types: [
{className: "RightShift", funcName: "rshift", aliasName: "", javaFunc: " >> ", types: [
{input: "Int", outputType: "Int", castType: "int"},
{input: "BigInt", outputType: "BigInt", castType: "long"},
{input: "SmallInt", outputType: "SmallInt", castType: "short"},
{input: "TinyInt", outputType: "TinyInt", castType: "byte"},
{input: "UInt1", outputType: "UInt1", castType: "byte"},
{input: "UInt2", outputType: "UInt2", castType: "char"},
{input: "UInt4", outputType: "UInt4", castType: "int"},
{input: "UInt8", outputType: "UInt8", castType: "long"}
]
},
{className: "Xor", funcName: "xor", aliasName: "^", javaFunc: " ^ ", types: [
{input: "Int", outputType: "Int", castType: "int"},
{input: "BigInt", outputType: "BigInt", castType: "long"},
{input: "SmallInt", outputType: "SmallInt", castType: "short"},
Expand Down Expand Up @@ -423,4 +434,4 @@ trigoMathFunctions : [
]
}
]
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,8 @@ public void setup() {
inter.value = ${type.maxval}.MAX_VALUE;
<#elseif aggrtype.funcName == "bit_or">
inter.value = 0;
<#elseif aggrtype.funcName == "bit_xor">
inter.value = 0;
</#if>
}

Expand All @@ -102,6 +104,8 @@ public void add() {
inter.value = <#if type.extraCast ??>(${type.extraCast})</#if>(inter.value & in.value);
<#elseif aggrtype.funcName == "bit_or">
inter.value = <#if type.extraCast ??>(${type.extraCast})</#if>(inter.value | in.value);
<#elseif aggrtype.funcName == "bit_xor">
inter.value = <#if type.extraCast ??>(${type.extraCast})</#if>(inter.value ^ in.value);
</#if>

<#if type.inputType?starts_with("Nullable")>
Expand All @@ -117,6 +121,8 @@ public void output() {
out.value = inter.value;
<#elseif aggrtype.funcName == "bit_or">
out.value = inter.value;
<#elseif aggrtype.funcName == "bit_xor">
out.value = inter.value;
</#if>
} else {
out.isSet = 0;
Expand All @@ -130,6 +136,8 @@ public void reset() {
inter.value = ${type.maxval}.MAX_VALUE;
<#elseif aggrtype.funcName == "bit_or">
inter.value = 0;
<#elseif aggrtype.funcName == "bit_xor">
inter.value = 0;
</#if>
}
}
Expand Down
6 changes: 5 additions & 1 deletion exec/java-exec/src/main/codegen/templates/MathFunctions.java
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,11 @@ public void eval() {
<#list mathFunc.binaryMathFunctions as func>
<#list func.types as type>

<#if func.aliasName == "">
@FunctionTemplate(name = "${func.funcName}", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
<#else>
@FunctionTemplate(names = {"${func.funcName}", "${func.aliasName}"}, scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
</#if>
public static class ${func.className}${type.input} implements DrillSimpleFunc {

@Param ${type.input}Holder input1;
Expand Down Expand Up @@ -321,4 +325,4 @@ public void eval() {
}
</#list>
</#list>
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.BitHolder;
import org.apache.drill.exec.expr.holders.IntHolder;

/**
* Function templates for Bit/BOOLEAN functions other than comparison
Expand Down Expand Up @@ -71,22 +70,4 @@ public void eval() {
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is this being deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of xor to MathFunc.tdd made it a redundant special case: 32-bit ints are already provided for by the new FreeMarker version. We get a small bit of clean-up for free here because the deleted IntXor function was out of place in BitFunctions.java, which otherwise only contains functions that operate on the boolean type BIT.


@FunctionTemplate(names = {FunctionNames.XOR, "^"},
scope = FunctionScope.SIMPLE,
nulls = NullHandling.NULL_IF_NULL)
public static class IntXor implements DrillSimpleFunc {

@Param IntHolder left;
@Param IntHolder right;
@Output IntHolder out;

@Override
public void setup() {}

@Override
public void eval() {
out.value = left.value ^ right.value;
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ public void runTest(String physicalPlan, String inputDataFile,
public void testBitwiseAggrFuncs() throws Exception {
String physicalPlan = "/functions/test_logical_aggr.json";
String inputDataFile = "/logical_aggr_input.json";
Object[] expected = {0L, 4L, 4L, 7L, -2L, 1L, true, false};
Object[] expected = {0L, 4L, 4L, 7L, -2L, 1L, 3L, 4L, 0L, true, false};

runTest(physicalPlan, inputDataFile, expected);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -152,11 +152,18 @@ public void testTruncDivMod() throws Throwable{
runTest(expected, "functions/testDivModTruncFunctions.json");
}

@Test
public void testIsNumeric() throws Throwable{
final Object [] expected = new Object[] {1, 1, 1, 0};
runTest(expected, "functions/testIsNumericFunction.json");
}
@Test
public void testBitTwiddlers() throws Throwable {
final Object [] expected = new Object[] { 3072, 3, 14 };
// Note bitwise AND and OR do not have Drill function implementations
runTest(expected, "functions/testBitTwiddlers.json");
}

@Test
public void testIsNumeric() throws Throwable{
final Object [] expected = new Object[] {1, 1, 1, 0};
runTest(expected, "functions/testIsNumericFunction.json");
}

@Test
public void testLog10WithDouble() throws Throwable {
Expand Down
34 changes: 34 additions & 0 deletions exec/java-exec/src/test/resources/functions/testBitTwiddlers.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: this form of test was used very early on in Drill before the full SQL stack worked. Consider writing a SQL-based test.

Since we are trying to support all types, I wonder if we should have a test which generates the needed SQL. Something like:

SELECT CAST(A AS SMALLINT) XOR CAST(B AS SMALLINT)
FROM (VALUES ...)

Then generate the SQL for each supported data type and verify the results.

The CAST is suggested because there is no good way in Drill to create a data file that has data of every types (since Drill has no way to specify a schema with data types.) I suppose you could create a Parquet file with the required types, but that seems overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @paul-rogers, I'll certain allow more time for reviews to arrive, in future. The naming of these functions put me into a bit of a spin, to the point where I couldn't formulate very good questions about naming when I asked on the Slack channel.

  1. I started out trying for names of AND, OR and XOR but the first two seem to be taken elsewhere Drill and would not work, while the last is fine and indeed was already synonymous with ^ before this patch (but only working for 32-bit arguments).
  2. Investigating AND and OR, I found booleanAnd and booleanOr in FunctionNames.java and then also logic in other code using these names to decide if something is a boolean expression, which bitwise logical operators are not. I decided at that point that I should not overload function names that were intended to be boolean, and abandoned AND and OR.
  3. I considered BIT_AND, BIT_OR and BIT_XOR but already these belong to the aggregate bitwise functions and do not allow overloading for the scalar case (I got runtime errors when I tried).
  4. I considered INT_AND, INT_OR and INT_XOR and indeed these did work in testing.
  5. Somehow I uncovered that &, |, ^ are already working scalar bitwise functions in Drill, in spite of my being unable to find any implementation of them in the code. Looking at physical query plans did not give me any clues and I told myself that some "magic" is happening upstream, possibly code gen by Calcite?
  6. Since Drill already has the names in (5) for bitwise functions, even if they're made of special characters and nasty, I stuck with them because it's consistent with what's already there.

I'd be grateful for any suggestion of where to take this in a follow-up commit...

head : {
version : 1,
generator : {
type : "optiq",
info : "na"
},
type : "APACHE_DRILL_PHYSICAL"
},
graph:[
{
@id:1,
pop:"mock-sub-scan",
url: "http://apache.org",
entries:[
{records: 1, types: [
{name: "blue", type: "BIGINT", mode: "REQUIRED"}
]}
]
}, {
pop : "project",
@id : 2,
exprs : [
{ ref : "ref1", expr : " lshift(3,10) "},
{ ref : "ref2", expr : " rshift(3072, 10) "},
{ ref : "ref3", expr : " xor(7, 9) " }
],
child : 1
}, {
pop : "screen",
@id : 3,
child : 2
} ]
}
Original file line number Diff line number Diff line change
Expand Up @@ -62,9 +62,18 @@
"expr" : "bit_or(`D`) "
}, {
"ref" : "`EXPR$6`",
"expr" : "bool_or(`C`) "
"expr" : "bit_xor(`A`) "
}, {
"ref" : "`EXPR$7`",
"expr" : "bit_xor(`B`) "
}, {
"ref" : "`EXPR$8`",
"expr" : "bit_xor(`D`) "
}, {
"ref" : "`EXPR$9`",
"expr" : "bool_or(`C`) "
}, {
"ref" : "`EXPR$10`",
"expr" : "bool_and(`C`) "
} ]
}, {
Expand Down
2 changes: 1 addition & 1 deletion exec/java-exec/src/test/resources/logical_aggr_input.json
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{"A" : 7, "B" : 2, "C" : true, "D" : 1}
{"A" : 4, "B" : -2, "C" : false, "D" : 0}
{"A" : 4, "B" : 4, "C" : false, "D" : 0}
{"A" : 4, "B" : -4, "C" : true, "D" : 1}
{"A" : 4, "B" : -4, "C" : true, "D" : 1}