Add device create_sequence_table for benchmarks #10300

Merged
Commits (27 total; this view shows changes from 20 commits)
916ce00
rename generate_input.cpp to generate_input.cu
karthikeyann Feb 14, 2022
d7f0f29
add create_sequence_table, create_random_null_mask
karthikeyann Feb 14, 2022
bb74cc7
fix includes, seed
karthikeyann Feb 15, 2022
0ea4f60
use cuda::std to include int128
karthikeyann Feb 15, 2022
a25241e
use -std=gnu++17 for generate_input.cu for int128 support
karthikeyann Feb 15, 2022
dfd33f2
go back to using BENCHMARK_TEMPLATE_DEFINE_F
karthikeyann Feb 15, 2022
f9f3eec
use create_sequence_table in ast bench
karthikeyann Feb 15, 2022
81ac53a
use create_sequence_table in binops bench
karthikeyann Feb 15, 2022
6c659d4
use create_sequence_table, thrust::shuffle in scatter bench
karthikeyann Feb 15, 2022
9f5c5ba
use cudf::sequence, create_random_null_mask in search bench
karthikeyann Feb 15, 2022
6758095
update copyright year
karthikeyann Feb 15, 2022
718e269
style fix clang format
karthikeyann Feb 15, 2022
704bb72
Merge branch 'branch-22.04' of github.com:rapidsai/cudf into fea-benc…
karthikeyann Feb 15, 2022
0ad778e
address review comments
karthikeyann Feb 17, 2022
9dd9244
Merge branch 'branch-22.04' of github.com:rapidsai/cudf into fea-benc…
karthikeyann Feb 17, 2022
bda1f6c
const auto to auto const
karthikeyann Feb 19, 2022
d568d09
address review comments
karthikeyann Feb 19, 2022
993c85d
reduce code duplication
karthikeyann Feb 21, 2022
bdbdf49
Merge branch 'branch-22.04' into fea-benchmark_speedup_2.6
karthikeyann Feb 22, 2022
02ef0d2
Revert "rename generate_input.cpp to generate_input.cu"
karthikeyann Feb 22, 2022
820b417
rename generator functor
karthikeyann Feb 24, 2022
9028a80
simplify create null mask
karthikeyann Feb 24, 2022
4f1f3e8
rename repeat_dtypes to cycle_dtypes
karthikeyann Feb 24, 2022
b31de3a
move cycle_dtypes out for create_sequence_table
karthikeyann Feb 24, 2022
1d4d57a
move cycle_dtypes out of create_random_table
karthikeyann Feb 24, 2022
581e4b8
fix null mask null_count
karthikeyann Feb 24, 2022
fbd5708
address review comments
karthikeyann Feb 24, 2022
2 changes: 1 addition & 1 deletion cpp/benchmarks/CMakeLists.txt
@@ -14,7 +14,7 @@

find_package(Threads REQUIRED)

add_library(cudf_datagen STATIC common/generate_input.cpp)
add_library(cudf_datagen STATIC common/generate_input.cpp common/generate_nullmask.cu)
target_compile_features(cudf_datagen PUBLIC cxx_std_17 cuda_std_17)

target_compile_options(
71 changes: 22 additions & 49 deletions cpp/benchmarks/ast/transform.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,72 +14,40 @@
* limitations under the License.
*/

#include <cudf/column/column_factories.hpp>
#include <cudf/table/table.hpp>
#include <cudf/table/table_view.hpp>
#include <benchmarks/common/generate_input.hpp>
#include <benchmarks/fixture/benchmark_fixture.hpp>
#include <benchmarks/synchronization/synchronization.hpp>

#include <cudf/transform.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/error.hpp>

#include <cudf_test/column_wrapper.hpp>

#include <benchmark/benchmark.h>
#include <fixture/benchmark_fixture.hpp>
#include <fixture/templated_benchmark_fixture.hpp>
#include <synchronization/synchronization.hpp>

#include <thrust/iterator/counting_iterator.h>

#include <algorithm>
#include <list>
#include <numeric>
#include <random>
#include <memory>
#include <vector>

enum class TreeType {
IMBALANCED_LEFT // All operator expressions have a left child operator expression and a right
// child column reference
};

template <typename key_type, TreeType tree_type, bool reuse_columns, bool Nullable>
class AST : public cudf::benchmark {
};

template <typename key_type, TreeType tree_type, bool reuse_columns, bool Nullable>
static void BM_ast_transform(benchmark::State& state)
{
const cudf::size_type table_size{(cudf::size_type)state.range(0)};
const cudf::size_type tree_levels = (cudf::size_type)state.range(1);
auto const table_size{static_cast<cudf::size_type>(state.range(0))};
auto const tree_levels{static_cast<cudf::size_type>(state.range(1))};

// Create table data
auto n_cols = reuse_columns ? 1 : tree_levels + 1;
auto column_wrappers = std::vector<cudf::test::fixed_width_column_wrapper<key_type>>(n_cols);
auto columns = std::vector<cudf::column_view>(n_cols);

auto data_iterator = thrust::make_counting_iterator(0);

if constexpr (Nullable) {
auto validities = std::vector<bool>(table_size);
std::random_device rd;
std::mt19937 gen(rd());

std::generate(
validities.begin(), validities.end(), [&]() { return gen() > (0.5 * gen.max()); });
std::generate_n(column_wrappers.begin(), n_cols, [=]() {
return cudf::test::fixed_width_column_wrapper<key_type>(
data_iterator, data_iterator + table_size, validities.begin());
});
} else {
std::generate_n(column_wrappers.begin(), n_cols, [=]() {
return cudf::test::fixed_width_column_wrapper<key_type>(data_iterator,
data_iterator + table_size);
});
}
std::transform(
column_wrappers.begin(), column_wrappers.end(), columns.begin(), [](auto const& col) {
return static_cast<cudf::column_view>(col);
});

cudf::table_view table{columns};
auto const n_cols = reuse_columns ? 1 : tree_levels + 1;
auto const source_table = create_sequence_table(
{cudf::type_to_id<key_type>()}, n_cols, row_count{table_size}, Nullable ? 0.5 : -1.0);
auto table = source_table->view();

// Create column references
auto column_refs = std::vector<cudf::ast::column_reference>();
@@ -138,10 +106,15 @@ static void CustomRanges(benchmark::internal::Benchmark* b)
}
}

#define AST_TRANSFORM_BENCHMARK_DEFINE(name, key_type, tree_type, reuse_columns, nullable) \
TEMPLATED_BENCHMARK_F(AST, BM_ast_transform, key_type, tree_type, reuse_columns, nullable) \
->Apply(CustomRanges) \
->Unit(benchmark::kMillisecond) \
#define AST_TRANSFORM_BENCHMARK_DEFINE(name, key_type, tree_type, reuse_columns, nullable) \
BENCHMARK_TEMPLATE_DEFINE_F(AST, name, key_type, tree_type, reuse_columns, nullable) \
(::benchmark::State & st) \
{ \
BM_ast_transform<key_type, tree_type, reuse_columns, nullable>(st); \
} \
BENCHMARK_REGISTER_F(AST, name) \
->Apply(CustomRanges) \
->Unit(benchmark::kMillisecond) \
->UseManualTime();

AST_TRANSFORM_BENCHMARK_DEFINE(
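For reference, a minimal sketch (not part of this diff) of the Google Benchmark define/register pattern that the updated AST_TRANSFORM_BENCHMARK_DEFINE macro expands to; the fixture and case names below are illustrative and assume only <benchmark/benchmark.h>:

#include <benchmark/benchmark.h>

#include <cstdint>

// Templated fixture, analogous to the AST fixture above.
template <typename key_type>
class ExampleFixture : public ::benchmark::Fixture {
};

template <typename key_type>
static void BM_example(benchmark::State& state)
{
  for (auto _ : state) {
    key_type x{};
    benchmark::DoNotOptimize(x += key_type{1});  // placeholder work
  }
}

// BENCHMARK_TEMPLATE_DEFINE_F names a concrete instantiation and supplies its body;
// BENCHMARK_REGISTER_F registers it so options such as Unit() can be chained,
// replacing the single-step TEMPLATED_BENCHMARK_F helper used previously.
BENCHMARK_TEMPLATE_DEFINE_F(ExampleFixture, int32_case, int32_t)
(::benchmark::State& st) { BM_example<int32_t>(st); }
BENCHMARK_REGISTER_F(ExampleFixture, int32_case)->Unit(benchmark::kMillisecond);

BENCHMARK_MAIN();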
53 changes: 17 additions & 36 deletions cpp/benchmarks/binaryop/binaryop.cpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,23 +14,15 @@
* limitations under the License.
*/

#include <benchmarks/common/generate_input.hpp>
#include <benchmarks/fixture/benchmark_fixture.hpp>
#include <benchmarks/synchronization/synchronization.hpp>

#include <cudf/binaryop.hpp>
#include <cudf/column/column_factories.hpp>
#include <cudf/table/table.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/error.hpp>

#include <cudf_test/column_wrapper.hpp>

#include <benchmark/benchmark.h>
#include <fixture/benchmark_fixture.hpp>
#include <synchronization/synchronization.hpp>

#include <thrust/iterator/counting_iterator.h>

#include <algorithm>
#include <numeric>
#include <vector>

// This set of benchmarks is designed to be a comparison for the AST benchmarks
@@ -47,40 +39,29 @@ class BINARYOP : public cudf::benchmark {
template <typename key_type, TreeType tree_type, bool reuse_columns>
static void BM_binaryop_transform(benchmark::State& state)
{
const cudf::size_type table_size{(cudf::size_type)state.range(0)};
const cudf::size_type tree_levels = (cudf::size_type)state.range(1);
auto const table_size{static_cast<cudf::size_type>(state.range(0))};
auto const tree_levels{static_cast<cudf::size_type>(state.range(1))};

// Create table data
auto n_cols = reuse_columns ? 1 : tree_levels + 1;
auto column_wrappers = std::vector<cudf::test::fixed_width_column_wrapper<key_type>>();
auto columns = std::vector<cudf::column_view>(n_cols);

auto data_iterator = thrust::make_counting_iterator(0);
std::generate_n(std::back_inserter(column_wrappers), n_cols, [=]() {
return cudf::test::fixed_width_column_wrapper<key_type>(data_iterator,
data_iterator + table_size);
});
std::transform(
column_wrappers.begin(), column_wrappers.end(), columns.begin(), [](auto const& col) {
return static_cast<cudf::column_view>(col);
});

cudf::table_view table{columns};
auto const n_cols = reuse_columns ? 1 : tree_levels + 1;
auto const source_table =
create_sequence_table({cudf::type_to_id<key_type>()}, n_cols, row_count{table_size});
cudf::table_view table{*source_table};

// Execute benchmark
for (auto _ : state) {
cuda_event_timer raii(state, true); // flush_l2_cache = true, stream = 0
// Execute tree that chains additions like (((a + b) + c) + d)
auto const op = cudf::binary_operator::ADD;
auto result_data_type = cudf::data_type(cudf::type_to_id<key_type>());
auto const op = cudf::binary_operator::ADD;
auto const result_data_type = cudf::data_type(cudf::type_to_id<key_type>());
if (reuse_columns) {
auto result = cudf::binary_operation(columns.at(0), columns.at(0), op, result_data_type);
auto result = cudf::binary_operation(table.column(0), table.column(0), op, result_data_type);
for (cudf::size_type i = 0; i < tree_levels - 1; i++) {
result = cudf::binary_operation(result->view(), columns.at(0), op, result_data_type);
result = cudf::binary_operation(result->view(), table.column(0), op, result_data_type);
}
} else {
auto result = cudf::binary_operation(columns.at(0), columns.at(1), op, result_data_type);
std::for_each(std::next(columns.cbegin(), 2), columns.cend(), [&](auto const& col) {
auto result = cudf::binary_operation(table.column(0), table.column(1), op, result_data_type);
std::for_each(std::next(table.begin(), 2), table.end(), [&](auto const& col) {
result = cudf::binary_operation(result->view(), col, op, result_data_type);
});
}
21 changes: 8 additions & 13 deletions cpp/benchmarks/binaryop/compiled_binaryop.cpp
@@ -14,30 +14,25 @@
* limitations under the License.
*/

#include <fixture/benchmark_fixture.hpp>
#include <fixture/templated_benchmark_fixture.hpp>
#include <synchronization/synchronization.hpp>

#include <cudf_test/column_wrapper.hpp>
#include <benchmarks/common/generate_input.hpp>
#include <benchmarks/fixture/benchmark_fixture.hpp>
#include <benchmarks/synchronization/synchronization.hpp>

#include <cudf/binaryop.hpp>

#include <thrust/iterator/counting_iterator.h>

class COMPILED_BINARYOP : public cudf::benchmark {
};

template <typename TypeLhs, typename TypeRhs, typename TypeOut>
void BM_compiled_binaryop(benchmark::State& state, cudf::binary_operator binop)
{
const cudf::size_type column_size{(cudf::size_type)state.range(0)};
auto const column_size{static_cast<cudf::size_type>(state.range(0))};

auto data_it = thrust::make_counting_iterator(0);
cudf::test::fixed_width_column_wrapper<TypeLhs> input1(data_it, data_it + column_size);
cudf::test::fixed_width_column_wrapper<TypeRhs> input2(data_it, data_it + column_size);
auto const source_table = create_sequence_table(
{cudf::type_to_id<TypeLhs>(), cudf::type_to_id<TypeRhs>()}, 2, row_count{column_size});

auto lhs = cudf::column_view(input1);
auto rhs = cudf::column_view(input2);
auto lhs = cudf::column_view(source_table->get_column(0));
auto rhs = cudf::column_view(source_table->get_column(1));
auto output_dtype = cudf::data_type(cudf::type_to_id<TypeOut>());

// Call once for hot cache.
30 changes: 30 additions & 0 deletions cpp/benchmarks/common/generate_input.cpp
@@ -19,6 +19,8 @@

#include <cudf/column/column.hpp>
#include <cudf/detail/utilities/vector_factories.hpp>
#include <cudf/filling.hpp>
#include <cudf/scalar/scalar_factories.hpp>
#include <cudf/table/table.hpp>
#include <cudf/utilities/bit.hpp>

@@ -642,6 +644,34 @@ std::unique_ptr<cudf::table> create_random_table(std::vector<cudf::type_id> cons
return std::make_unique<cudf::table>(std::move(output_columns));
}

std::unique_ptr<cudf::table> create_sequence_table(std::vector<cudf::type_id> const& dtype_ids,
cudf::size_type num_cols,
row_count num_rows,
float null_probability,
unsigned seed)
{
auto const out_dtype_ids = repeat_dtypes(dtype_ids, num_cols);
auto columns = std::vector<std::unique_ptr<cudf::column>>(num_cols);
auto create_sequence_column = [&](auto const& init) mutable {
auto col = cudf::sequence(num_rows.count, init);
auto [mask, count] = create_random_null_mask(num_rows.count, null_probability, seed++);
col->set_null_mask(std::move(mask), count);
return col;
};
if (dtype_ids.size() == 1) {
auto init = cudf::make_default_constructed_scalar(cudf::data_type{dtype_ids[0]});
std::generate_n(
columns.begin(), num_cols, [&]() mutable { return create_sequence_column(*init); });
} else {
std::transform(
out_dtype_ids.begin(), out_dtype_ids.end(), columns.begin(), [&](auto dtype) mutable {
auto init = cudf::make_default_constructed_scalar(cudf::data_type{dtype});
return create_sequence_column(*init);
});
}
return std::make_unique<cudf::table>(std::move(columns));
}

std::vector<cudf::type_id> get_type_or_group(int32_t id)
{
// identity transformation when passing a concrete type_id
Expand Down
43 changes: 39 additions & 4 deletions cpp/benchmarks/common/generate_input.hpp
@@ -19,6 +19,7 @@
#include <map>

#include <cudf/table/table.hpp>
#include <cudf/utilities/span.hpp>
#include <cudf/utilities/traits.hpp>

/**
@@ -223,9 +224,9 @@ class data_profile {
cudf::size_type avg_run_length = 4;

public:
template <
typename T,
typename std::enable_if_t<!std::is_same_v<T, bool> && std::is_integral_v<T>, T>* = nullptr>
template <typename T,
typename std::enable_if_t<!std::is_same_v<T, bool> && cuda::std::is_integral_v<T>, T>* =
nullptr>
distribution_params<T> get_distribution_params() const
{
auto it = int_params.find(cudf::type_to_id<T>());
@@ -306,7 +307,7 @@ class data_profile {
// discrete distributions (integers, strings, lists). Otherwise the call with have no effect.
template <typename T,
typename Type_enum,
typename std::enable_if_t<std::is_integral_v<T>, T>* = nullptr>
typename std::enable_if_t<cuda::std::is_integral_v<T>, T>* = nullptr>
void set_distribution_params(Type_enum type_or_group,
distribution_id dist,
T lower_bound,
@@ -402,3 +403,37 @@ std::unique_ptr<cudf::table> create_random_table(std::vector<cudf::type_id> cons
row_count num_rows,
data_profile const& data_params = data_profile{},
unsigned seed = 1);

/**
* @brief Generate sequence columns starting with value 0 in first row and increasing by 1 in
* subsequent rows.
*
* If the number of passed types is smaller than the number of requested columns, the types
* will be repeated cyclically to fill the number of requested columns.
*
* @param dtype_ids Span of requested column types
* @param num_cols Number of columns in the output table
* @param num_rows Number of rows in the output table
* @param null_probability optional, probability of a null value
* <0 implies no null mask, =0 implies all valids, >=1 implies all nulls
* @param seed optional, seed for the pseudo-random engine
* @return A table with the sequence columns.
*/
std::unique_ptr<cudf::table> create_sequence_table(std::vector<cudf::type_id> const& dtype_ids,

(Inline review thread on the dtype_ids parameter)

Contributor: Span

karthikeyann (Feb 21, 2022): span can't be used with an initializer list. An initializer list is inline, and convenient here.

Contributor: Then be explicit and make the parameter an initializer_list, not a vector. Using vector isn't saying what you mean.

bdice (Feb 23, 2022): I see why this is awkward. We can't use span because it can't be deduced from an initializer_list. However, we can't use initializer_list because then it prohibits passing a vector (which span would allow). We need to support passing a vector, especially if we remove the behavior of cycling dtypes and require the caller to call repeat_dtypes to construct that vector before calling create_sequence_table. As far as I can tell, leaving this as a vector is the only option.

See also: https://quuxplusone.github.io/blog/2021/10/03/p2447-span-from-initializer-list/

karthikeyann: Right. With repeat_dtypes moved outside, an initializer_list parameter can't be used in create_sequence_table. Also, a span parameter can't be used with an initializer_list argument. Implicit conversion of the initializer_list to a vector is the only way, so this stays a vector.


cudf::size_type num_cols,
row_count num_rows,
float null_probability = -1.0,
unsigned seed = 1);
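
A minimal sketch (not from the PR) of the parameter-type trade-off discussed in the thread above; the takes_* functions are illustrative stand-ins:

#include <cudf/types.hpp>
#include <cudf/utilities/span.hpp>

#include <initializer_list>
#include <vector>

void takes_vector(std::vector<cudf::type_id> const&) {}
void takes_init_list(std::initializer_list<cudf::type_id>) {}
void takes_span(cudf::host_span<cudf::type_id const>) {}

int main()
{
  std::vector<cudf::type_id> ids{cudf::type_id::INT32, cudf::type_id::FLOAT64};

  takes_vector({cudf::type_id::INT32});     // braced list converts implicitly
  takes_vector(ids);                        // an existing vector is also accepted

  takes_init_list({cudf::type_id::INT32});  // fine
  // takes_init_list(ids);                  // error: vector does not convert

  takes_span(ids);                          // span binds to a vector
  // takes_span({cudf::type_id::INT32});    // error: span is not deducible from a
                                            // braced list (see P2447)
  return 0;
}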

/**
* @brief Create a random null mask object
*
* @param size number of rows
* @param null_probability probability of a null value
* <0 implies no null mask, =0 implies all valids, >=1 implies all nulls
* @param seed optional, seed for the pseudo-random engine
* @return null mask device buffer with random null mask data and null count
*/
std::pair<rmm::device_buffer, cudf::size_type> create_random_null_mask(cudf::size_type size,
float null_probability,
unsigned seed = 1);
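
A hypothetical helper (not in this diff) showing the intended use of create_random_null_mask, mirroring how create_sequence_table attaches masks internally:

#include <benchmarks/common/generate_input.hpp>

#include <cudf/column/column.hpp>

#include <memory>
#include <utility>

void attach_random_mask(std::unique_ptr<cudf::column>& col, float null_probability, unsigned seed)
{
  // Build a bitmask in which each row is null with the given probability, then
  // attach it to the column along with the matching null count.
  auto [mask, null_count] = create_random_null_mask(col->size(), null_probability, seed);
  col->set_null_mask(std::move(mask), null_count);
}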