Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add any_match and none_match Presto functions #4327

Closed
wants to merge 6 commits into from

Conversation

duanmeng
Copy link
Collaborator

@duanmeng duanmeng commented Mar 17, 2023

any_match(array(T), function(T, boolean)) → boolean#

Returns whether any elements of an array match the given predicate. Returns true if one or more elements match the predicate; false if none of the elements matches (a special case is when the array is empty); NULL if the predicate function returns NULL for one or more elements and false for all other elements.

none_match(array(T), function(T, boolean)) → boolean#

Returns whether no elements of an array match the given predicate. Returns true if none of the elements matches the predicate (a special case is when the array is empty); false if one or more elements match; NULL if the predicate function returns NULL for one or more elements and false for all other elements.

@netlify
Copy link

netlify bot commented Mar 17, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 650b047
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/641b32fcaef6740008fabc10

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 17, 2023
@duanmeng duanmeng marked this pull request as draft March 17, 2023 12:35
@duanmeng duanmeng removed the request for review from mbasmanova March 17, 2023 12:42
@duanmeng duanmeng self-assigned this Mar 17, 2023
@duanmeng duanmeng marked this pull request as ready for review March 21, 2023 03:52
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks great % a couple of small comments.

velox/docs/functions/presto/array.rst Outdated Show resolved Hide resolved
@@ -11,6 +11,15 @@ Array Functions
Returns NULL if the predicate function returns NULL for one or more elements and true for all other elements.
Throws an exception if the predicate fails for one or more elements and returns true or NULL for the rest.

.. function:: any_match(array(T), function(T, boolean)) → boolean

Returns whether any elements of an array match the given predicate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "any elements" -> "at least one element"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


protected:
static FOLLY_ALWAYS_INLINE bool hasError(
const ErrorVectorPtr& elementErrors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: elementErrors -> errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

const vector_size_t* sizes,
const ErrorVectorPtr& elementErrors,
const exec::LocalDecodedVector& bitsDecoder) const override {
auto size = sizes[row];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this code repeats between the two functions. Also, virtual function call per row tends to be expensive. I wonder if we can reduce copy-paste by templating the implementation on a boolean. See https://github.com/facebookincubator/velox/compare/main...mbasmanova:velox-1:add_any_match?expand=1

@@ -11,6 +11,15 @@ Array Functions
Returns NULL if the predicate function returns NULL for one or more elements and true for all other elements.
Throws an exception if the predicate fails for one or more elements and returns true or NULL for the rest.

.. function:: any_match(array(T), function(T, boolean)) → boolean

Returns whether at least one element of an array match the given predicate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: match -> matches

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks good to me.

void apply(
const SelectivityVector& rows,
std::vector<VectorPtr>& args,
const TypePtr& /*outputType*/,
const TypePtr& outputType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If outputType is not used, it should be commented out. Otherwise, linter will complain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

.argumentType("array(T)")
.argumentType("function(T, boolean)")
.build()};
protected:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be private.

@@ -24,12 +24,13 @@
namespace facebook::velox::functions {
namespace {

class AllMatchFunction : public exec::VectorFunction {
public:
template <bool match>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment to explain the 'match' template parameter. I wonder if there is a more descriptive name for this parameter.

There is another function: none_match. Can we reuse this class to implement it? What would need to change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use enum class as the template parameter?

enum class MatchMethod {
  ALL = 0,
  ANY = 1,
  NONE = 2
};

template <MatchMethod mathMethod>
class MatchFunction : public exec::VectorFunction {
    void applyInternal(...) const {
    bool match = true;
    if (mathMethod == MatchMethod::NONE) {
      // TODO: impl none_match
    } else {
      auto match = mathMethod == MatchMethod::ALL;
      // implementation of applyInternal in this pr
    } 

class AllMatchFunction : public MatchFunction<MatchMethod::ALL> {};
class AnyMatchFunction : public MatchFunction<MatchMethod::ANY> {};
class AllMatchFunction : public MatchFunction<MatchMethod::NONE> {}; // next pr


// Errors for individual array elements should be suppressed only if the
// outcome can be decided by some other array element, e.g. if there is
// another element that returns 'true' for the predicate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't match the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

{std::nullopt, std::nullopt}});
auto expectedResult =
makeNullableFlatVector<bool>({true, true, true, false, std::nullopt});
testExpr(expectedResult, "any_match(c0, x -> (x > 1))", input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we shorten these tests by changing testExpr to take lambda expression and construct any_match(c0, x -> (<lambdaExpr>)):

testExpr(expectedResult, "x > 1", input);

Maybe shorten these tests further by passing std::vector<std::optional> as expected result:

  expectedResult = {true, false, false, false, true};
  testExpr(expectedResult, "x is null", input);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

makeNullableFlatVector<bool>({true, false, true, std::nullopt});
testExpr(expectedResult, "any_match(c0, x -> x)", input);

auto emptyInput = makeArrayVector<int32_t>({{}});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is already tested: L48

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

arrayOfArrays);
}

TEST_F(ArrayAnyMatchTest, bigints) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is already covered by 'basic' test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -24,8 +24,11 @@
namespace facebook::velox::functions {
namespace {

class AllMatchFunction : public exec::VectorFunction {
public:
enum class MatchMethod { ALL = 0, ANY = 1, NONE = 2 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng This is a reasonable idea, but how would we implement NONE case?

I'm seeing the the 3 cases need the following loops:

---- kAll ----


bool allMatch = true

loop:
   if not match:
    allMatch = false;
    break;


if (!allMatch) -> false
else if hasError -> error
else if hasNull -> null
else -> true


---- kAny ----

bool anyMatch = false

loop:
   if match:
    anyMatch = true;
    break;


if (anyMatch) -> true
else if hasError -> error
else if hasNull -> null
else -> false

---- kNone ----

bool noneMatch = true;

loop:
   if match:
    noneMatch = false;
    break;

if (!noneMatch) -> false
else if hasError -> error
else if hasNull -> null
else -> true

These loops can be implemented as a single loop with 2 parameters: bool initialValue, bool earlyReturn.

See main...mbasmanova:velox-1:add_any_match

I suggest we include both any_match and none_match functions in this PR to make sure the code is generic enough to handle both.

Copy link
Collaborator Author

@duanmeng duanmeng Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova IMHO AllMatch and NoneMatch may have the same initialValue and earlyReturn and it is not descriptive to use them as template parameters. Should we use the enum and handle them in initial, flip, and result finalization?
All, none, and any match have different and similar logic intertwined in terms of the initial value, flip condition, and the result finalization:

Initial value:

  • All is true
  • Any is false
  • None is true

Flip logic:

  • All flips once encounter an unmatched element
  • Any flips once encounter a matched element
  • None flips once encounter a matched element

Result finalization:

  • All: ignore the error and null if one or more elements are unmatched and return false
  • Any: ignore the error and null if one or more elements matched and return true
  • None: ignore the error and null if one or more elements matched and return false
auto match = (matchMethod != MatchMethod::kAny);
    auto hasNull = false;
    std::exception_ptr errorPtr{nullptr};
    for (auto i = 0; i < size; ++i) {
      auto idx = offset + i;
      if (hasError(elementErrors, idx)) {
        errorPtr = *std::static_pointer_cast<std::exception_ptr>(
            elementErrors->valueAt(idx));
        continue;
      }

      if (bitsDecoder->isNullAt(idx)) {
        hasNull = true;
      } else if (matchMethod == MatchMethod::kAll) {
        if (!bitsDecoder->valueAt<bool>(idx)) {
          match = !match;
          break;
        }
      } else {
        if (bitsDecoder->valueAt<bool>(idx)) {
          match = !match;
          break;
        }
      }
    }

    if ((matchMethod == MatchMethod::kAny) == match) {
      flatResult->set(row, match);
    } else if (errorPtr) {
      context.setError(row, errorPtr);
    } else if (hasNull) {
      flatResult->setNull(row, true);
    } else {
      flatResult->set(row, match);
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Does the code I proposed not work? Would you add a test for none_match function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova It works, I misunderstood the earlyReturn, which is the early return condition, not value. It's also more descriptive of those three match functions. I've changed it to your proposed codes. Thanks:)

};

TEST_F(ArrayAnyMatchTest, basic) {
auto input = makeNullableArrayVector<int64_t>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for refactoring the tests. Much more readable. Should we go a bit further still?

TEST_F(ArrayAnyMatchTest, basic) {
  auto input = makeNullableArrayVector<int64_t>({
      {std::nullopt, 2, 0},
      {-1, 3},
      {-2, -3},
      {},
      {0, std::nullopt},
  });

  testAnyMatch("x > 1", {true, true, false, false, std::nullopt}, input);

  testAnyMatch("x is null", {true, false, false, false, true}, input);

  input = makeNullableArrayVector<bool>(
      {{false, true},
       {false, false},
       {std::nullopt, true},
       {std::nullopt, false}});
  testAnyMatch("x", {true, false, true, std::nullopt}, input);
}

@duanmeng duanmeng force-pushed the add_any_match branch 3 times, most recently from 864642b to 6a3eac1 Compare March 22, 2023 08:25
@@ -11,6 +11,24 @@ Array Functions
Returns NULL if the predicate function returns NULL for one or more elements and true for all other elements.
Throws an exception if the predicate fails for one or more elements and returns true or NULL for the rest.

.. function:: any_match(array(T), function(T, boolean)) → boolean

Returns whether at least one element of an array matchs the given predicate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: matchs -> matches

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

class AnyMatchFunction : public MatchFunction<MatchMethod::kAny> {};
class NoneMatchFunction : public MatchFunction<MatchMethod::kNone> {};

// TODO: add class NoneMatchFunction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the NoneMatchFunction and its test.

@@ -24,8 +24,11 @@
namespace facebook::velox::functions {
namespace {

class AllMatchFunction : public exec::VectorFunction {
public:
enum class MatchMethod { ALL = 0, ANY = 1, NONE = 2 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Does the code I proposed not work? Would you add a test for none_match function?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks great % a few small comments. Let's update PR title and description to include none_match function.

static FOLLY_ALWAYS_INLINE bool hasError(
const ErrorVectorPtr& errors,
int idx) {
return errors && idx < errors->size() && !errors->isNullAt(idx);
}

private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "private:" is not needed as there is already one above it.

}

if (result != initialValue) {
flatResult->set(row, !initialValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, it would be more readable to do

flatResult->set(row, result);

} else if (hasNull) {
flatResult->setNull(row, true);
} else {
flatResult->set(row, initialValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and

flatResult->set(row, result);

* limitations under the License.
*/

#include <folly/Format.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this and use fmt::format instead.

class ArrayAnyMatchTest : public functions::test::FunctionBaseTest {
protected:
// Evaluate an expression.
void testExpr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: testAnyMatch to distinguish from testExpr

const std::vector<std::optional<bool>>& expected,
const std::string& lambdaExpr,
const std::vector<std::vector<std::optional<T>>>& input) {
auto expression = folly::sformat("any_match(c0, x -> ({}))", lambdaExpr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, testAnyMatch(expected, lambdaExpr, makeNullableArrayVector(input)) to reduce code duplication

};

TEST_F(ArrayAnyMatchTest, basic) {
std::vector<std::vector<std::optional<int32_t>>> ints{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comma after last value in the vector for readability. This way each array value will appear on a separate line making it easier to understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

auto expression = "(10 / x) > 2";
std::vector<std::vector<std::optional<int8_t>>> input{
{0, 2, 0, 5, 0}, {2, 5, std::nullopt, 0}};
std::vector<std::optional<bool>> expectedResult = {true, true};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps,

testAnyMatch({true, true}, "(10 / x) > 2", input);

* limitations under the License.
*/

#include <folly/Format.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as for ArrayAnyMatchTest apply here

Also, let's refactor ArrayAllMatchTest in the same way to ensure all these are consistent.

auto size = sizes[row];
auto offset = offsets[row];

// All, none, and any match have different and similar logic intertwined in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this comment is easy to understand.

Would it be helpful to show pseudo-code for different loops along with a generic loop?

all_match, none_match and any_match need to loop over predicate results for element arrays and check for results, nulls and errors. These loops can be generalized using two booleans.

Here is what the individual loops look like.

---- kAll ----
bool allMatch = true

loop:
   if not match:
    allMatch = false;
    break;

if (!allMatch) -> false
else if hasError -> error
else if hasNull -> null
else -> true

---- kAny ----

bool anyMatch = false

loop:
   if match:
    anyMatch = true;
    break;

if (anyMatch) -> true
else if hasError -> error
else if hasNull -> null
else -> false

---- kNone ----

bool noneMatch = true;

loop:
   if match:
    noneMatch = false;
    break;

if (!noneMatch) -> false
else if hasError -> error
else if hasNull -> null
else -> true

To generalize these loops, we use initialValue and earlyReturn booleans like so:

--- generic loop ---

bool result = initialValue

loop:
   if match == earlyReturn:
    result = false;
    break;

if (result != initialValue) -> result
else if hasError -> error
else if hasNull -> null
else -> result

@duanmeng
Copy link
Collaborator Author

@mbasmanova I've resolved all the comments and refactored AllMatchFunctionTest, could you help to take a look?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks great. Let's update PR title and description to add none_match function.

class ArrayAllMatchTest : public functions::test::FunctionBaseTest {};
class ArrayAllMatchTest : public functions::test::FunctionBaseTest {
protected:
// Evaluate an expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment is redundant. Let's remove.


class ArrayAnyMatchTest : public functions::test::FunctionBaseTest {
protected:
// Evaluate an expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


class ArrayNoneMatchTest : public functions::test::FunctionBaseTest {
protected:
// Evaluate an expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@duanmeng duanmeng changed the title Add any_match Presto function Add any_match, none_match Presto function Mar 22, 2023
@duanmeng duanmeng changed the title Add any_match, none_match Presto function Add any_match, none_match Presto functions Mar 22, 2023
@mbasmanova mbasmanova changed the title Add any_match, none_match Presto functions Add any_match and none_match Presto functions Mar 22, 2023
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in e7d6a1b.

ashokku2022 pushed a commit to ashokku2022/velox that referenced this pull request Mar 24, 2023
Summary:
any_match(array(T), function(T, boolean)) → boolean[#](https://prestodb.io/docs/current/functions/array.html#any_match)
>Returns whether any elements of an array match the given predicate. Returns true if one or more elements match the predicate; false if none of the elements matches (a special case is when the array is empty); NULL if the predicate function returns NULL for one or more elements and false for all other elements.

none_match(array(T), function(T, boolean)) → boolean[#](https://prestodb.io/docs/current/functions/array.html#none_match)
>Returns whether no elements of an array match the given predicate. Returns true if none of the elements matches the predicate (a special case is when the array is empty); false if one or more elements match; NULL if the predicate function returns NULL for one or more elements and false for all other elements.

Pull Request resolved: facebookincubator#4327

Reviewed By: Yuhta

Differential Revision: D44255945

Pulled By: mbasmanova

fbshipit-source-id: f3797a83723badc33a644de7b76baae3e54df3cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants