Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed array vector result from zip() in simplified path #3542

Closed
kagamiori opened this issue Dec 17, 2022 · 3 comments
Closed

Malformed array vector result from zip() in simplified path #3542

kagamiori opened this issue Dec 17, 2022 · 3 comments
Assignees
Labels
bug Something isn't working fuzzer-found

Comments

@kagamiori
Copy link
Contributor

Description

This is an error found by fuzzer when comparing the results from the common and simplified path. Results are array vectors and the result from the simplified path has rawOffsets_[i] exceeding the length of element vector.

Error Reproduction

Run the following command with the attached files: velox/expression/tests/velox_expression_runner_test --fuzzer_repro_path /path/to/velox_expressionVerifier_8R09KJ --mode verify

velox_expressionVerifier_8R09KJ.zip

Relevant logs

I1216 17:29:26.831337 4005815 ExpressionFuzzer.cpp:793] ==============================> Started iteration 0 (seed: 4187054583)
I1216 17:29:26.834861 4005815 ExpressionVerifier.cpp:80] Executing expression: zip("c0","c1")
I1216 17:29:26.835045 4005815 ExpressionVerifier.cpp:32] 2 vectors as input:
I1216 17:29:26.835119 4005815 ExpressionVerifier.cpp:34] 	[DICTIONARY ARRAY<BOOLEAN>: 100 elements, 8 nulls], [DICTIONARY ARRAY<BOOLEAN>: 100 elements, 13 nulls], [DICTIONARY ARRAY<BOOLEAN>: 100 elements, 7 nulls], [ARRAY ARRAY<BOOLEAN>: 100 elements, no nulls]
I1216 17:29:26.835273 4005815 ExpressionVerifier.cpp:34] 	[DICTIONARY ARRAY<DATE>: 100 elements, 14 nulls], [DICTIONARY ARRAY<DATE>: 100 elements, 11 nulls], [DICTIONARY ARRAY<DATE>: 100 elements, 9 nulls], [ARRAY ARRAY<DATE>: 100 elements, no nulls]
E1216 17:29:26.841002 4005815 Exceptions.h:68] Line: buck-out/dev/gen/aab7ed39/velox/vector/velox_vector#header-mode-symlink-tree-with-header-map,headers/velox/vector/BaseVector.h:139, Function:isNullAt, Expression: isIndexInRange(idx) , Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Retriable: False
Expression: isIndexInRange(idx)
Function: isNullAt
File: buck-out/dev/gen/aab7ed39/velox/vector/velox_vector#header-mode-symlink-tree-with-header-map,headers/velox/vector/BaseVector.h
Line: 139
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::$_0>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::$_0)
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, facebook::velox::detail::CompileTimeEmptyString>(facebook::velox::detail::VeloxCheckFailArgs const&, facebook::velox::detail::CompileTimeEmptyString)
# 4  facebook::velox::BaseVector::isNullAt(int) const
# 5  facebook::velox::RowVector::compare(facebook::velox::BaseVector const*, int, int, facebook::velox::CompareFlags) const
# 6  facebook::velox::(anonymous namespace)::compareArrays(facebook::velox::BaseVector const&, facebook::velox::BaseVector const&, facebook::velox::(anonymous namespace)::IndexRange, facebook::velox::(anonymous namespace)::IndexRange, facebook::velox::CompareFlags)
# 7  facebook::velox::ArrayVector::compare(facebook::velox::BaseVector const*, int, int, facebook::velox::CompareFlags) const
# 8  facebook::velox::BaseVector::equalValueAt(facebook::velox::BaseVector const*, int, int) const
# 9  facebook::velox::test::(anonymous namespace)::compareVectors(std::shared_ptr<facebook::velox::BaseVector> const&, std::shared_ptr<facebook::velox::BaseVector> const&)
# 10 facebook::velox::test::ExpressionVerifier::verify(std::shared_ptr<facebook::velox::core::ITypedExpr const> const&, std::shared_ptr<facebook::velox::RowVector> const&, std::shared_ptr<facebook::velox::BaseVector>&&, bool, std::vector<unsigned int, std::allocator<unsigned int> >)
# 11 facebook::velox::test::ExpressionFuzzer::go()
# 12 facebook::velox::test::expressionFuzzer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<facebook::velox::exec::FunctionSignature const*, std::allocator<facebook::velox::exec::FunctionSignature const*> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<facebook::velox::exec::FunctionSignature const*, std::allocator<facebook::velox::exec::FunctionSignature const*> > > > >, unsigned long)
# 13 FuzzerRunner::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 14 main
# 15 __libc_start_call_main
# 16 __libc_start_main_alias_2
# 17 _start

*** Aborted at 1671240566 (Unix time, try 'date -d @1671240566') ***
*** Signal 6 (SIGABRT) (0x37b3c003d1fb7) received by PID 4005815 (pthread TID 0x7f19d7f99500) (linux TID 4005815) (maybe from PID 4005815, UID 228156) (code: -6), stack trace: ***
    @ 0000000000010fae folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 000000000000f6d1 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 000000000004459f (unknown)
    @ 000000000009c9d3 __GI___pthread_kill
    @ 00000000000444ec __GI_raise
    @ 000000000002c432 __GI_abort
    @ 00000000000a3fd4 __gnu_cxx::__verbose_terminate_handler()
    @ 00000000000a1b39 __cxxabiv1::__terminate(void (*)())
    @ 00000000000a1ba4 std::terminate()
    @ 00000000000a1ec1 __cxa_rethrow
    @ 0000000000022854 facebook::velox::test::ExpressionVerifier::verify(std::shared_ptr<facebook::velox::core::ITypedExpr const> const&, std::shared_ptr<facebook::velox::RowVector> const&, std::shared_ptr<facebook::velox::BaseVector>&&, bool, std::vector<unsigned int, std::allocator<unsigned int> >)
                       ./velox/expression/tests/ExpressionVerifier.cpp:190
    @ 0000000000086d67 facebook::velox::test::ExpressionFuzzer::go()
                       ./velox/expression/tests/ExpressionFuzzer.cpp:836
    @ 00000000000a55bd facebook::velox::test::expressionFuzzer(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<facebook::velox::exec::FunctionSignature const*, std::allocator<facebook::velox::exec::FunctionSignature const*> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<facebook::velox::exec::FunctionSignature const*, std::allocator<facebook::velox::exec::FunctionSignature const*> > > > >, unsigned long)
                       ./velox/expression/tests/ExpressionFuzzer.cpp:865
    @ 00000000002e436f FuzzerRunner::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       ./velox/expression/tests/FuzzerRunner.h:140
                       -> ./velox/expression/tests/ExpressionFuzzerTest.cpp
    @ 00000000002e3b47 main
                       ./velox/expression/tests/ExpressionFuzzerTest.cpp:67
    @ 000000000002c656 __libc_start_call_main
    @ 000000000002c717 __libc_start_main_alias_2
    @ 00000000002c3320 _start
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116
Aborted (core dumped)
@kagamiori kagamiori added bug Something isn't working fuzzer Issues related the to Velox fuzzer test components. fuzzer-found labels Dec 17, 2022
@kagamiori kagamiori self-assigned this Dec 20, 2022
@mbasmanova mbasmanova removed the fuzzer Issues related the to Velox fuzzer test components. label Dec 20, 2022
@kagamiori
Copy link
Contributor Author

The bug is in the fast path of the zip(array(U), array(V)) -> array(row(U, V)) function.

if (allSameSize) {
// This is true if all input vectors have the "flat" Array encoding.
bool allFlat = true;
for (const auto& arg : args) {
allFlat &= arg->encoding() == VectorEncoding::Simple::ARRAY;
}
if (allFlat) {
// Fast path if all input Vectors are flat and for all rows, all arrays
// within a row are the same size. In this case we don't have to add
// nulls, or decode the arrays, we can just pass in the element Vectors
// as is to be the fields of the output Rows.
std::vector<VectorPtr> elements;
elements.reserve(args.size());
for (const auto& arg : args) {
elements.push_back(arg->as<ArrayVector>()->elements());
}
auto rowType = outputType->childAt(0);
auto rowVector = std::make_shared<RowVector>(
pool,
rowType,
BufferPtr(nullptr),
resultElementsSize,
std::move(elements));
// Now convert these to an Array
auto arrayVector = std::make_shared<ArrayVector>(
pool,
outputType,
BufferPtr(nullptr),
rows.end(),
baseVectors[0]->offsets(),
resultArraySizesBuffer,
std::move(rowVector));
context.moveOrCopyResult(arrayVector, rows, result);
return;
}
}

This fast path creates a row vector of the size equals to the sum of input array sizes, and then create a result array vector on top of this row vector (as the elements vector) using the original offsets() buffer of the first input array vector. There are at least two problems: 1. the sum of input array sizes doesn't necessarily equal to the element vector sizes of the input arrays, and 2. the first input array's offsets() buffer doesn't necessarily align with the second input array's elements.

@kagamiori kagamiori assigned kevinwilfong and unassigned kagamiori Dec 21, 2022
kevinwilfong pushed a commit to kevinwilfong/velox that referenced this issue Dec 21, 2022
Summary:
The fuzzer discovered an issue with the Zip UDF
facebookincubator#3542

I hadn't accounted for the fact that just because all of the arrays are flat and the same size, doesn't mean they all have the same offsets into their elements Vectors.

E.g. with 3 arrays of size 3, the offsets could be 0, 3, 6 or 6, 3, 0 or 0, 4, 7 (if there are unused elements)

When this is the case the fast path added to the Zip UDF produces incorrect results.

The fix is just to validate the offsets are the same in addition to the sizes.

Differential Revision: D42183237

fbshipit-source-id: 14b312ac3d8541ae531f440b123f13eb81d92f08
kevinwilfong pushed a commit to kevinwilfong/velox that referenced this issue Dec 21, 2022
…cubator#3564)

Summary:
Pull Request resolved: facebookincubator#3564

The fuzzer discovered an issue with the Zip UDF
facebookincubator#3542

I hadn't accounted for the fact that just because all of the arrays are flat and the same size, doesn't mean they all have the same offsets into their elements Vectors.

E.g. with 3 arrays of size 3, the offsets could be 0, 3, 6 or 6, 3, 0 or 0, 4, 7 (if there are unused elements)

When this is the case the fast path added to the Zip UDF produces incorrect results.

The fix is just to validate the offsets are the same in addition to the sizes.

Reviewed By: kagamiori

Differential Revision: D42183237

fbshipit-source-id: 3093ab55d8c848e411772fd3bfdca32b38ce6671
facebook-github-bot pushed a commit that referenced this issue Dec 21, 2022
Summary:
Pull Request resolved: #3564

The fuzzer discovered an issue with the Zip UDF
#3542

I hadn't accounted for the fact that just because all of the arrays are flat and the same size, doesn't mean they all have the same offsets into their elements Vectors.

E.g. with 3 arrays of size 3, the offsets could be 0, 3, 6 or 6, 3, 0 or 0, 4, 7 (if there are unused elements)

When this is the case the fast path added to the Zip UDF produces incorrect results.

The fix is just to validate the offsets are the same in addition to the sizes.

Reviewed By: kagamiori

Differential Revision: D42183237

fbshipit-source-id: 5aa720d9fb10eaa777f186dcc08d056e639a8ee6
@mbasmanova
Copy link
Contributor

@kagamiori Is this fixed now?

@kagamiori
Copy link
Contributor Author

@kagamiori Is this fixed now?

Yes. Closing this issue since it's fixed in #3564.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fuzzer-found
Projects
None yet
Development

No branches or pull requests

3 participants