Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add all_match Presto function #3356

Closed
wants to merge 15 commits into from

Conversation

duanmeng
Copy link
Collaborator

@duanmeng duanmeng commented Nov 24, 2022

all_match(array(T), function(T, boolean)) → boolean

Returns whether all elements of an array match the given predicate. Returns true if all the elements match the predicate (a special case is when the array is empty); false if one or more elements don’t match; NULL if the predicate function returns NULL for one or more elements and true for all other elements.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 24, 2022
@netlify
Copy link

netlify bot commented Nov 24, 2022

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 16a4a1c
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6414854e6f18f800089f697d

@isadikov
Copy link
Contributor

Hi @duanmeng. Are you still working on this function?

@duanmeng
Copy link
Collaborator Author

duanmeng commented Jan 27, 2023

Hi @duanmeng. Are you still working on this function?

@isadikov Yes, sorry for the delay. I recovered from the covid-19, and am on the lunar new year holidays this month. I will resolve the conflicts and comments if someone helps to review:), in 3/4 days.

Copy link
Contributor

@isadikov isadikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening the PR and working on the function! I left a few comments, I would appreciate it if you could take a look.

velox/functions/prestosql/ArrayAllMatch.cpp Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/tests/ArrayAllMatchTest.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
@duanmeng duanmeng force-pushed the add_all_match branch 3 times, most recently from 336e301 to 587f8a1 Compare February 6, 2023 13:02
@duanmeng
Copy link
Collaborator Author

duanmeng commented Feb 7, 2023

@isadikov I've resolved all the comments, could you help to have a look?

Copy link
Contributor

@isadikov isadikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for working on this. I left a couple of nit comments.

@duanmeng
Copy link
Collaborator Author

@isadikov I've resolved the rest of nit comments, could you help to have a look?

@isadikov
Copy link
Contributor

Looks good, thanks for working on this!

@mbasmanova Could you take a look and review this PR that adds all_match function? Thank you!

@duanmeng
Copy link
Collaborator Author

@mbasmanova Could you please help to take a look, I've resolved all the comments, thanks.

@mbasmanova mbasmanova changed the title Add all_match Presto Functions Add all_match Presto function Feb 28, 2023
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks good overall. Some comments below.

velox/docs/functions/presto/array.rst Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
velox/functions/prestosql/ArrayAllMatch.cpp Outdated Show resolved Hide resolved
@duanmeng duanmeng force-pushed the add_all_match branch 5 times, most recently from edae9dd to 731f937 Compare March 4, 2023 17:11
@duanmeng
Copy link
Collaborator Author

duanmeng commented Mar 6, 2023

@mbasmanova I've resolved all the comments and added the error handling logic and related UT, could you help to take a look?

@duanmeng duanmeng force-pushed the add_all_match branch 2 times, most recently from 212f8fb to c394eb4 Compare March 8, 2023 04:53
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Thank you for iterating on this PR. Looks good to me % a couple comments.

Returns true if all the elements match the predicate (a special case is when the array is empty);
Returns false if one or more elements don’t match;
Returns NULL if the predicate function returns NULL for one or more elements and true for all other elements.
Throw exceptions if the predicate function has errors on some elements and returns true or NULL on others.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos: Throws an exception if the predicate fails for one or more elements and returns true or NULL for the rest.

auto allMatch = true;
auto hasNull = false;
auto hasError =
errors && row < errors->size() && !errors->isNullAt(row);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, add a helper method to EvalCtx for readability:

auto hasError = context.hasError(row);

Also, this variable is not used until late. Let's move it next to first use.

if (hasError) {
auto err = std::static_pointer_cast<std::exception_ptr>(
context.errors()->valueAt(row));
std::rethrow_exception(*err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure we throw VeloxException. See throwError in EvalCtx.cpp.

auto throwError(const std::exception_ptr& exceptionPtr) {
  std::rethrow_exception(toVeloxException(exceptionPtr));
}

Consider moving this logic into a helper method in EvalCtx to avoid copy-paste.

context.throwIfHasError(row);

exec::LocalDecodedVector bitsDecoder(context);
auto it = args[1]->asUnchecked<FunctionVector>()->iterator(&rows);

// Turns off the ThrowOnError flag and temporarily resets errors in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use active verbs: Turn off ... and reset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

auto result = evaluate<SimpleVector<bool>>(
"all_match(c0, x -> ((10 / x) > 2))", makeRowVector({input}));

auto expectedResult = makeNullableFlatVector<bool>({false, false});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeNullableFlatVector -> makeFlatVector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

TEST_F(ArrayAllMatchTest, errorReThrow) {
static constexpr std::string_view errorMessage{"division by zero"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: kErrorMessage and kErrorCode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

static constexpr std::string_view errorMessage{"division by zero"};
static constexpr std::string_view errorCode{"ARITHMETIC_ERROR"};

try {
Copy link
Contributor

@mbasmanova mbasmanova Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use VELOX_ASSERT_THROW

Let's also verify that these expressions work properly with TRY.

  • TRY(all_match(c0, x -> ))
  • all_match(c0, x -> TRY()))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

const BufferPtr& wrapCapture,
EvalCtx* context,
const std::vector<VectorPtr>& args,
VectorPtr& errors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorPtr -> ErrorVectorPtr (sorry, I got it wrong in my earlier suggestion)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks great % minor comments.

}

/// Extract the actual exception of the row
std::exception_ptr getError(vector_size_t row) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new methods seem unused. Let's remove.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

rows.end(),
std::move(allVectors));
EvalCtx lambdaCtx(context->execCtx(), context->exprSet(), row.get());
auto row = createRowVector(context, wrapCapture, args, rows.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor.

transformErrorVector(lambdaCtx, context, rows, elementToTopLevelRows);
}

void applyNoThrowError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applyNoThrowError -> applyNoThrow for consistency (see EvalCtx::applyToSelectedNoThrow)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto allMatch = true;
auto hasNull = false;
std::exception_ptr errorPtr = nullptr;
auto hasError = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need hasError variable. We can just check if errorPtr is null or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto hasError = false;
for (auto i = 0; i < size; ++i) {
auto idx = offset + i;
if (elementErrors && idx < elementErrors->size() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider extracting a helper function for readability

if (hasError(elementErrors, idx))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assertEqualVectors(expectedResult, result);
}

TEST_F(ArrayAllMatchTest, errorSuppress) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errorSuppress -> errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assertEqualVectors(expectedResult, result);
}

TEST_F(ArrayAllMatchTest, errorReThrow) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe combine this test case with the previous one

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

kErrorMessage);
}

TEST_F(ArrayAllMatchTest, withTrys) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

makeRowVector({makeArrayVector<int8_t>({{1, 0}, {1}, {6}})}));
expectedResult = makeNullableFlatVector<bool>({std::nullopt, true, false});
assertEqualVectors(expectedResult, result);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test case with multiple lambdas. See TransformTest.cpp:

// Test different lambdas applied to different rows
TEST_F(TransformTest, conditional) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@duanmeng
Copy link
Collaborator Author

@mbasmanova I've resolved all comments, combined the error test cases, and added conditional multiple lambdas test cases, could you please help to have a look?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Looks great. Thank you for your patience.

velox/vector/FunctionVector.h Outdated Show resolved Hide resolved
kErrorMessage);
VELOX_ASSERT_THROW(
evaluate<SimpleVector<bool>>(
"all_match(c0, x -> ((10 / x) > 2))", makeRowVector({errorInput})),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider changing errorInput to include makeRowVector, so it can be used directly and drop template parameter for readability:

VELOX_ASSERT_THROW(
      evaluate("all_match(c0, x -> ((10 / x) > 2))", errorInput),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

auto input = makeRowVector({makeNullableArrayVector<int8_t>(
{{0, 2, 0, 5, 0}, {5, std::nullopt, 0}})});
auto result =
evaluate<SimpleVector<bool>>("all_match(c0, x -> ((10 / x) > 2))", input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can drop template parameters here and in other similar places as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -15,14 +15,11 @@
*/
#pragma once

#include "velox/expression/EvalCtx.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this introduces circular dependency: vectors depend expressions and expressions depend on vectors. We need to figure out how to break that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move ErrorVectorPtr to FlatVector.h.

  using ErrorVector = FlatVector<std::shared_ptr<void>>;
  using ErrorVectorPtr = std::shared_ptr<ErrorVector>;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. ErrorVectorPtr should be declared in the vectors scope.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in d8c747f.

@mbasmanova
Copy link
Contributor

@duanmeng Now that the all_match Presto function is merged into Velox, it would be nice to follow up with an e2e test case in Prestissimo to ensure this function works correctly end-to-end: presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestHiveArrayFunctionQueries.java

ashokku2022 pushed a commit to ashokku2022/velox that referenced this pull request Mar 24, 2023
Summary:
all_match(array(T), function(T, boolean)) → boolean

>Returns whether all elements of an array match the given predicate. Returns true if all the elements match the predicate (a special case is when the array is empty); false if one or more elements don’t match; NULL if the predicate function returns NULL for one or more elements and true for all other elements.

Pull Request resolved: facebookincubator#3356

Reviewed By: Yuhta

Differential Revision: D44165931

Pulled By: mbasmanova

fbshipit-source-id: a23bc2619c3d2befcaf2aab9a373415ecc8f378a
marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants