Add map_from_entries Presto function (#3417)
Summary:
* Adding the `map_from_entries` Presto function to Velox

`map_from_entries(array(row(K, V))) -> map(K, V)`

> Returns a map created from the given array of entries.

For example:
```
SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); -- {1 -> 'x', 2 -> 'y'}
SELECT map_from_entries(ARRAY[(1, 'x'), (2, null)]); -- {1 -> 'x', 2 -> null}
SELECT map_from_entries(ARRAY[(1, 'x'), (1, 'y')]); -- duplicate key error
SELECT map_from_entries(ARRAY[(null, 'x'), (1, 'y')]); -- map key cannot be null error
SELECT map_from_entries(ARRAY[cast(null as ROW(int, varchar)), (1, 'y')]); -- map entry cannot be null error
```
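
The semantics in the examples above can be modeled outside Velox with a short sketch. This is not the Velox implementation: `Key`, `Value`, `Entry`, `entry`, and `mapFromEntries` are hypothetical names, `std::optional` stands in for SQL NULL, and the error strings only approximate the messages listed above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for SQL types; none of these names come from Velox.
using Key = std::optional<int>;            // nullable map key
using Value = std::optional<std::string>;  // nullable map value
using Entry = std::optional<std::pair<Key, Value>>;  // nullable row(K, V)

// Convenience constructor for a non-null entry.
Entry entry(Key key, Value value) {
  return std::make_pair(std::move(key), std::move(value));
}

// Builds a map from the entries. Throws on a null entry, a null key, or a
// duplicate key; null values are allowed and kept as std::nullopt.
std::map<int, Value> mapFromEntries(const std::vector<Entry>& entries) {
  std::map<int, Value> result;
  for (const auto& e : entries) {
    if (!e.has_value()) {
      throw std::invalid_argument("map entry cannot be null");
    }
    if (!e->first.has_value()) {
      throw std::invalid_argument("map key cannot be null");
    }
    const bool inserted = result.emplace(*e->first, e->second).second;
    if (!inserted) {
      throw std::invalid_argument("duplicate map key");
    }
  }
  return result;
}
```

Calling `mapFromEntries({entry(1, "x"), entry(2, std::nullopt)})` yields a two-entry map whose second value is null, while `mapFromEntries({entry(1, "x"), entry(1, "y")})` throws.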

Pull Request resolved: #3417

Reviewed By: pranjalssh

Differential Revision: D46169043

Pulled By: darrenfu

fbshipit-source-id: ec7d929b94c709a22cbd6eae983864616252c833
darrenfu authored and facebook-github-bot committed Jun 13, 2023
1 parent 0782dd9 commit 0e6eb8f
Showing 7 changed files with 400 additions and 16 deletions.
32 changes: 19 additions & 13 deletions velox/docs/functions/presto/map.rst
@@ -27,17 +27,17 @@ Map Functions

See also :func:`map_agg` for creating a map as an aggregation.

.. function:: map_concat(map1(K,V), map2(K,V), ..., mapN(K,V)) -> map(K,V)

Returns the union of all the given maps. If a key is found in multiple given maps,
that key's value in the resulting map comes from the last one of those maps.

.. function:: map_entries(map(K,V)) -> array(row(K,V))

Returns an array of all entries in the given map. ::

SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), ROW(2, 'y')]

.. function:: map_concat(map1(K,V), map2(K,V), ..., mapN(K,V)) -> map(K,V)

Returns the union of all the given maps. If a key is found in multiple given maps,
that key's value in the resulting map comes from the last one of those maps.

.. function:: map_filter(map(K,V), function(K,V,boolean)) -> map(K,V)

Constructs a map from those entries of ``map`` for which ``function`` returns true::
@@ -46,6 +46,12 @@ Map Functions
SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY['a', NULL, 'c']), (k, v) -> v IS NOT NULL); -- {10 -> a, 30 -> c}
SELECT map_filter(MAP(ARRAY['k1', 'k2', 'k3'], ARRAY[20, 3, 15]), (k, v) -> v > 10); -- {k1 -> 20, k3 -> 15}

.. function:: map_from_entries(array(row(K, V))) -> map(K, V)

Returns a map created from the given array of entries. ::

SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); -- {1 -> 'x', 2 -> 'y'}

.. function:: map_keys(x(K,V)) -> array(K)

Returns all the keys in the map ``x``.
@@ -54,14 +60,6 @@ Map Functions

Returns all the values in the map ``x``.

.. function:: subscript(map(K, V), key) -> V
:noindex:

Returns value for given ``key``. Throws if the key is not contained in the map.
Corresponds to SQL subscript operator [].

SELECT name_to_age_map['Bob'] AS bob_age;

.. function:: map_zip_with(map(K,V1), map(K,V2), function(K,V1,V2,V3)) -> map(K,V3)

Merges the two given maps into a single map by applying ``function`` to the pair of values with the same key.
@@ -77,6 +75,14 @@ Map Functions
MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 2, 3]),
(k, v1, v2) -> k || CAST(v1/v2 AS VARCHAR));

.. function:: subscript(map(K, V), key) -> V
:noindex:

Returns value for given ``key``. Throws if the key is not contained in the map.
Corresponds to SQL subscript operator [].

SELECT name_to_age_map['Bob'] AS bob_age;

.. function:: transform_keys(map(K1,V), function(K1,V,K2)) -> map(K2,V)

Returns a map that applies ``function`` to each entry of ``map`` and transforms the keys::
1 change: 1 addition & 0 deletions velox/functions/prestosql/CMakeLists.txt
@@ -38,6 +38,7 @@ add_library(
JsonFunctions.cpp
Map.cpp
MapEntries.cpp
MapFromEntries.cpp
MapKeysAndValues.cpp
MapZipWith.cpp
Not.cpp
136 changes: 136 additions & 0 deletions velox/functions/prestosql/MapFromEntries.cpp
@@ -0,0 +1,136 @@
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "velox/expression/EvalCtx.h"
#include "velox/expression/Expr.h"
#include "velox/expression/VectorFunction.h"
#include "velox/functions/lib/CheckDuplicateKeys.h"
#include "velox/functions/lib/RowsTranslationUtil.h"

namespace facebook::velox::functions {
namespace {
// See documentation at https://prestodb.io/docs/current/functions/map.html
class MapFromEntriesFunction : public exec::VectorFunction {
public:
void apply(
const SelectivityVector& rows,
std::vector<VectorPtr>& args,
const TypePtr& outputType,
exec::EvalCtx& context,
VectorPtr& result) const override {
VELOX_CHECK_EQ(args.size(), 1);
auto& arg = args[0];
VectorPtr localResult;

// Input can be constant or flat.
if (arg->isConstantEncoding()) {
auto* constantArray = arg->as<ConstantVector<ComplexType>>();
const auto& flatArray = constantArray->valueVector();
const auto flatIndex = constantArray->index();

exec::LocalSelectivityVector singleRow(context, flatIndex + 1);
singleRow->clearAll();
singleRow->setValid(flatIndex, true);
singleRow->updateBounds();

localResult = applyFlat(
*singleRow.get(), flatArray->as<ArrayVector>(), outputType, context);
localResult =
BaseVector::wrapInConstant(rows.size(), flatIndex, localResult);
} else {
localResult =
applyFlat(rows, arg->as<ArrayVector>(), outputType, context);
}

context.moveOrCopyResult(localResult, rows, result);
}

static std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() {
return {// array(unknown) -> map(unknown, unknown)
exec::FunctionSignatureBuilder()
.returnType("map(unknown, unknown)")
.argumentType("array(unknown)")
.build(),
// array(row(K,V)) -> map(K,V)
exec::FunctionSignatureBuilder()
.knownTypeVariable("K")
.typeVariable("V")
.returnType("map(K,V)")
.argumentType("array(row(K,V))")
.build()};
}

private:
VectorPtr applyFlat(
const SelectivityVector& rows,
const ArrayVector* inputArray,
const TypePtr& outputType,
exec::EvalCtx& context) const {
auto& inputRowVector = inputArray->elements();
exec::LocalDecodedVector decodedRow(context);
decodedRow.get()->decode(*inputRowVector);
auto rowVector = decodedRow->base()->as<RowVector>();
auto rowKeyVector = rowVector->childAt(0);

// Validate all map entries and map keys are not null.
if (decodedRow->mayHaveNulls() || rowKeyVector->mayHaveNulls()) {
context.applyToSelectedNoThrow(rows, [&](vector_size_t row) {
auto size = inputArray->sizeAt(row);
auto offset = inputArray->offsetAt(row);
for (auto i = 0; i < size; ++i) {
bool isMapEntryNull = decodedRow->isNullAt(offset + i);
VELOX_USER_CHECK(!isMapEntryNull, "map entry cannot be null");
bool isMapKeyNull =
rowKeyVector->isNullAt(decodedRow->index(offset + i));
VELOX_USER_CHECK(!isMapKeyNull, "map key cannot be null");
}
});
}

VectorPtr wrappedKeys;
VectorPtr wrappedValues;
if (decodedRow->isIdentityMapping()) {
wrappedKeys = rowVector->childAt(0);
wrappedValues = rowVector->childAt(1);
} else {
wrappedKeys = decodedRow->wrap(
rowVector->childAt(0), *inputRowVector, inputRowVector->size());
wrappedValues = decodedRow->wrap(
rowVector->childAt(1), *inputRowVector, inputRowVector->size());
}

// To avoid creating new buffers, we try to reuse the input's buffers
// as much as possible.
auto mapVector = std::make_shared<MapVector>(
context.pool(),
outputType,
inputArray->nulls(),
rows.end(),
inputArray->offsets(),
inputArray->sizes(),
wrappedKeys,
wrappedValues);

checkDuplicateKeys(mapVector, rows, context);
return mapVector;
}
};
} // namespace

VELOX_DECLARE_VECTOR_FUNCTION(
udf_map_from_entries,
MapFromEntriesFunction::signatures(),
std::make_unique<MapFromEntriesFunction>());
} // namespace facebook::velox::functions
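
The buffer reuse in `applyFlat` works because an array of rows and a map have the same columnar shape: per-row offsets and sizes plus flat key and value children. The following is a minimal sketch of that idea under simplified assumptions. `RowColumn`, `ArrayOfRowsColumn`, `MapColumn`, `toMapColumn`, and `entriesAt` are hypothetical types and helpers, not Velox classes, and the sketch ignores nulls, encodings, and the validation steps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified columnar layouts; these are not Velox classes.
struct RowColumn {
  std::shared_ptr<std::vector<int64_t>> keys;        // row child 0
  std::shared_ptr<std::vector<std::string>> values;  // row child 1
};

struct ArrayOfRowsColumn {
  std::shared_ptr<std::vector<int32_t>> offsets;  // start of each array
  std::shared_ptr<std::vector<int32_t>> sizes;    // element count per array
  RowColumn elements;                             // flattened row elements
};

struct MapColumn {
  std::shared_ptr<std::vector<int32_t>> offsets;
  std::shared_ptr<std::vector<int32_t>> sizes;
  std::shared_ptr<std::vector<int64_t>> keys;
  std::shared_ptr<std::vector<std::string>> values;
};

// Reinterprets an array-of-rows column as a map column by sharing buffers.
// No element is copied; only shared_ptr reference counts change.
MapColumn toMapColumn(const ArrayOfRowsColumn& input) {
  return MapColumn{
      input.offsets, input.sizes, input.elements.keys, input.elements.values};
}

// Reads the (key, value) pairs that belong to one top-level row of the map.
std::vector<std::pair<int64_t, std::string>> entriesAt(
    const MapColumn& map,
    std::size_t row) {
  std::vector<std::pair<int64_t, std::string>> out;
  const int32_t begin = (*map.offsets)[row];
  const int32_t end = begin + (*map.sizes)[row];
  for (int32_t i = begin; i < end; ++i) {
    out.emplace_back((*map.keys)[i], (*map.values)[i]);
  }
  return out;
}
```

In this model the conversion cost is independent of the number of entries, which is the motivation the source comment gives for reusing the input's buffers.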
@@ -26,6 +26,8 @@ void registerMapFunctions(const std::string& prefix) {
udf_transform_values, prefix + "transform_values");
VELOX_REGISTER_VECTOR_FUNCTION(udf_map, prefix + "map");
VELOX_REGISTER_VECTOR_FUNCTION(udf_map_entries, prefix + "map_entries");
VELOX_REGISTER_VECTOR_FUNCTION(
udf_map_from_entries, prefix + "map_from_entries");
VELOX_REGISTER_VECTOR_FUNCTION(udf_map_keys, prefix + "map_keys");
VELOX_REGISTER_VECTOR_FUNCTION(udf_map_values, prefix + "map_values");
VELOX_REGISTER_VECTOR_FUNCTION(udf_map_zip_with, prefix + "map_zip_with");
1 change: 1 addition & 0 deletions velox/functions/prestosql/tests/CMakeLists.txt
@@ -56,6 +56,7 @@ add_executable(
JsonFunctionsTest.cpp
MapEntriesTest.cpp
MapFilterTest.cpp
MapFromEntriesTest.cpp
MapKeysAndValuesTest.cpp
MapTest.cpp
MapZipWithTest.cpp