[WIP] Serialize/deserialize with libcudf pack/unpack #5025

Closed
wants to merge 114 commits into from
Changes from 28 commits
114 commits
7a9c48f
table_view pack: first commit.
devavret Apr 17, 2020
3a884e7
Add cudf::unpack
devavret Apr 17, 2020
6e3f23a
Pack-unpack tests fixed width nullable columns
devavret Apr 18, 2020
d43f287
Fix bug of pack still using old table view after contiguous split
devavret Apr 20, 2020
1862f0b
Fixed the case where columns could have nulls.
devavret Apr 20, 2020
a19b5ef
Tests and fix for string columns
devavret Apr 21, 2020
055fada
changelog
devavret Apr 21, 2020
0ac61f2
Use stream and mr in pack. Remove both from unpack
devavret Apr 21, 2020
b3cd355
Change table_data ref to base_ptr as per review
devavret Apr 21, 2020
d0d20f5
Change packed_table to hide serialized_column type
devavret Apr 22, 2020
f211a85
Change API to take ownership of packed_table in unpack
devavret Apr 22, 2020
e80b8d9
Merge branch 'branch-0.14' into pack-unpack
devavret Apr 22, 2020
9fd6985
clang format of code in this PR
devavret Apr 22, 2020
4428907
Change packed_table.table_metadata to unique_ptr
devavret Apr 22, 2020
aac9b14
Initial pack support in Cython
devavret Apr 23, 2020
38a6dff
Cython bindings for unpack
devavret Apr 23, 2020
7e3fb84
Move BufferArrayFromVector from parquet to utils
devavret Apr 23, 2020
2adc6b1
Documentation for pack/unpack
devavret Apr 23, 2020
a52d18b
Fixing style CI
devavret Apr 23, 2020
7f0bef8
Refactor unpack to avoid passing counter around
devavret Apr 23, 2020
110df91
Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into s…
shwina Apr 24, 2020
44acd2e
Modified pack to work with vector of columns
devavret Apr 24, 2020
b3ca181
Merge branch 'pack-unpack' of https://github.com/devavret/cudf into s…
shwina Apr 27, 2020
66eaeee
Add pack/unpack based serialization
shwina Apr 27, 2020
124a509
Handle any kind of device object in dask_serialize_cudf_object
shwina Apr 27, 2020
fa8c207
Add level_names setter
shwina Apr 27, 2020
7963f5f
Better handling of metadata in serialize/deserialize
shwina Apr 27, 2020
e97d949
Changelog
shwina Apr 28, 2020
0862cc6
Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into s…
shwina Apr 28, 2020
b36998e
update doc uniformly for device memory resource param mr
karthikeyann May 18, 2020
4f8e1a6
Merge branch 'branch-0.14' into doc-uniform-doc
karthikeyann May 18, 2020
50a446c
Merge branch 'fix-docs' of https://github.com/devavret/cudf into doc-…
karthikeyann May 19, 2020
cee0229
remove unused device memory resource param
karthikeyann May 19, 2020
17f524a
changelog entry for PR #5216
karthikeyann May 19, 2020
9070aa8
update doc of param stream in scalar_factories
karthikeyann May 19, 2020
0f4e928
update doc of param stream in scalar.hpp
karthikeyann May 19, 2020
f4058fa
update doc of param stream in column_factories
karthikeyann May 19, 2020
3bae00d
update doc for param stream in column.hpp
karthikeyann May 19, 2020
063fc9d
update doc for param stream in column_device_view.cuh
karthikeyann May 19, 2020
1ac169e
update doc for param stream for table.hpp
karthikeyann May 19, 2020
a6d762f
update doc for param stream for string headers
karthikeyann May 19, 2020
582c247
update doc for param stream for string detail headers
karthikeyann May 19, 2020
028d034
update doc for param stream for string cu files
karthikeyann May 19, 2020
3f9bfcc
update doc for param stream for dictionary headers
karthikeyann May 19, 2020
ac624a2
update doc for param steam in dictionary cu file
karthikeyann May 19, 2020
9fb327c
update doc for param stream in reduction functions
karthikeyann May 19, 2020
c0215a5
update doc for param stream in null_mask.hpp
karthikeyann May 19, 2020
9b8e8c0
update doc for param stream for io readers.hpp
karthikeyann May 19, 2020
0b05b0f
update doc for param stream for io writers.hpp
karthikeyann May 19, 2020
2eeab9d
update doc of param stream for csv headers
karthikeyann May 19, 2020
f2b2bf5
update doc of param stream for orc headers
karthikeyann May 19, 2020
64f56c5
update doc of param stream for parquet headers
karthikeyann May 19, 2020
5fdba29
update doc of param stream for misc io headers
karthikeyann May 19, 2020
17350ce
update doc of param stream for group reductions header
karthikeyann May 19, 2020
f93c829
update doc of param stream for binary op headers
karthikeyann May 19, 2020
3b9d239
update doc for param stream in nvtext headers
karthikeyann May 19, 2020
d3a3300
update doc for param stream in copy headers
karthikeyann May 19, 2020
e5fffae
update doc for param stream in scatter, gather headers
karthikeyann May 19, 2020
af9d392
update doc for param stream in fill headers
karthikeyann May 19, 2020
833995e
update doc for param stream in dlpack.hpp
karthikeyann May 19, 2020
ab1133f
update doc for param stream in hash headers
karthikeyann May 19, 2020
e8f1bad
update doc for param stream in join headers
karthikeyann May 19, 2020
9cc2fb3
update doc for param stream in rolling, rank
karthikeyann May 19, 2020
bbfe3bf
update doc for param stream in search, replace headers
karthikeyann May 19, 2020
139df92
update doc for param stream in transform, hashing, unary op headers
karthikeyann May 19, 2020
4c09ab6
Merge branch 'branch-0.14' of github.com:rapidsai/cudf into doc-unifo…
karthikeyann May 19, 2020
8547a9d
Apply suggestions from code review (Ram)
karthikeyann May 19, 2020
0d115c0
fix extra spacing in detail/transform.hpp
karthikeyann May 19, 2020
13f00c4
Merge branch 'branch-0.14' into doc-uniform-doc
karthikeyann May 19, 2020
1eb7907
add column's device memory in param mr doc
karthikeyann May 21, 2020
2acf660
add device memory to table param mr doc
karthikeyann May 21, 2020
2841464
Merge branch 'branch-0.14' of github.com:rapidsai/cudf into doc-unifo…
karthikeyann May 21, 2020
24f7d26
more doc update for param mr
karthikeyann May 21, 2020
ba502c9
more doc update for param stream
karthikeyann May 21, 2020
05c22f3
Add deprecation warnings to nvstrings, nvcategory, nvtext
harrism May 22, 2020
c8f2d30
Fix import
harrism May 22, 2020
02c365d
Changelog for #5254
harrism May 22, 2020
5d6868c
Merge branch 'branch-0.14' into pack-unpack
devavret May 22, 2020
ba12d9f
put back column_split_info because it don't need to be exposed anymore.
devavret May 22, 2020
1e722ed
Merge branch 'branch-0.14' into doc-uniform-doc
karthikeyann May 26, 2020
caff618
Fix compilation failure
codereport May 26, 2020
62a6864
Update CHANGELOG
codereport May 26, 2020
71c25f3
Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into f…
rgsl888prabhu May 26, 2020
cfc99b2
DeprecationWarning for nvstrings, nvcategory and nvtext
rgsl888prabhu May 26, 2020
23abde3
CHANGELOG.md
rgsl888prabhu May 26, 2020
610da49
style and CHANGELOG
rgsl888prabhu May 26, 2020
b765578
addressing reviews
rgsl888prabhu May 26, 2020
81b66d2
Merge branch 'branch-0.14' into bug-fp-debug-fail
codereport May 27, 2020
cfd4b33
Fix unit tests
codereport May 27, 2020
ad45fe2
Update cpp/tests/fixed_point/fixed_point_tests.cu
codereport May 27, 2020
954e1d5
Update cpp/tests/fixed_point/fixed_point_tests.cu
codereport May 27, 2020
2532487
Merge branch 'branch-0.14' into doc-uniform-doc
karthikeyann May 27, 2020
eebed57
Step one remove unused libraries
revans2 May 27, 2020
2f41c1f
Cleanup java dependencies
revans2 May 27, 2020
c7065dd
Update changelog
revans2 May 27, 2020
312106c
Addressing reviews
rgsl888prabhu May 27, 2020
8f451e7
Merge branch 'branch-0.14' into fea-deprecate-nvstrings
rgsl888prabhu May 27, 2020
84d895d
addressing reviews
rgsl888prabhu May 27, 2020
4f0b409
Merge branch 'fea-deprecate-nvstrings' of https://github.com/harrism/…
rgsl888prabhu May 27, 2020
0f47cc6
Addressed review comments
revans2 May 27, 2020
1bc72f5
spell check
revans2 May 27, 2020
45858a6
moving the nvstring condition below
rgsl888prabhu May 27, 2020
4520d0f
Merge pull request #5216 from karthikeyann/doc-uniform-doc
karthikeyann May 27, 2020
2bc016c
Merge branch 'pack-unpack' of https://github.com/devavret/cudf into s…
shwina May 27, 2020
9dca5ab
Merge pull request #5254 from harrism/fea-deprecate-nvstrings
May 27, 2020
5a607ff
Merge pull request #5285 from codereport/bug-fp-debug-fail
codereport May 27, 2020
7d7904c
Merge branch 'branch-0.14' into fix-deps
revans2 May 27, 2020
f7470a8
Merge pull request #5298 from revans2/fix-deps
revans2 May 27, 2020
afc2b5e
Hack around current Serializable requirements
shwina May 27, 2020
eb2975b
Merge branch 'branch-0.14' of https://github.com/rapidsai/cudf into s…
shwina May 27, 2020
ff406d8
Handle host frames in serialization
jakirkham May 28, 2020
83e9c4b
Revert "Hack around current Serializable requirements"
shwina May 28, 2020
d51bd54
Merge branch 'hdl_host_frames_ser' into serialize-with-pack-unpack
shwina May 28, 2020
0dec2e3
Convert DeviceBuffer<->Buffer when calling pack/unpack
shwina May 28, 2020
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@
- PR #4789 Disallow timestamp sum and diffs via binary ops
- PR #4815 Add JNI total memory allocated API
- PR #4906 Add Java bindings for interleave_columns
- PR #4941 Add `cudf::pack` and `cudf::unpack`
- PR #4938 Add Java bindings for strip
- PR #4923 Add Java and JNI bindings for string split
- PR #4975 Add Java bindings for first and last aggregate expressions based on nth
@@ -120,6 +121,7 @@
- PR #4993 Remove Java memory prediction code
- PR #4985 Add null_count to Python Column ctors and use already computed null_count when possible
- PR #5002 Fix Column.__reduce__ to accept `null_count`
- PR #5025 Serialize/deserialize with libcudf pack/unpack

## Bug Fixes

1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -568,6 +568,7 @@ add_library(cudf
src/copying/slice.cpp
src/copying/split.cpp
src/copying/contiguous_split.cu
src/copying/pack.cpp
src/copying/legacy/copy.cpp
src/copying/legacy/gather.cu
src/copying/legacy/scatter.cu
54 changes: 53 additions & 1 deletion cpp/include/cudf/copying.hpp
@@ -454,7 +454,59 @@ std::vector<contiguous_split_result> contiguous_split(
rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource());

/**
* @brief Returns a new column, where each element is selected from either @p lhs or
* @brief Table data in a serialized format
*
* Contains data from a table in two contiguous buffers: one on host, which contains table metadata
* and one on device which contains the table data.
*/
struct packed_columns {
packed_columns() = default;
packed_columns(std::unique_ptr<std::vector<uint8_t>> metadata,
std::unique_ptr<rmm::device_buffer> data)
: metadata(std::move(metadata)), data(std::move(data)){};
std::unique_ptr<std::vector<uint8_t>> metadata;
std::unique_ptr<rmm::device_buffer> data;
};

/**
* @brief Deep-copy a `table_view` into a serialized contiguous memory format
*
* The metadata from the `table_view` is copied into a host vector of bytes and the data from the
* `table_view` is copied into a `device_buffer`. Pass the output of this function into
* `cudf::experimental::unpack` to deserialize.
*
* @param input View of the table to pack
* @param[in] mr Optional, The resource to use for all returned device allocations
* @return packed_columns A struct containing the serialized metadata and data in contiguous host
* and device memory respectively
*/
packed_columns pack(std::vector<column_view> const& input,
rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource());

struct unpack_result {
std::vector<column_view> columns;
std::unique_ptr<rmm::device_buffer> all_data;
};

/**
* @brief Deserialize the result of `cudf::experimental::pack`
*
* Converts the result of a serialized table into a `table_view` that points to the data stored in
* the contiguous device buffer `output.all_data`. The data for the table `output.all_data` is moved
* from the input `packed_table`'s member `table_data`.
*
* It is the caller's responsibility to ensure that the `table_view` in the output does not outlive
* the device_buffer `all_data` in the output.
*
* No new device memory is allocated in this function.
*
* @param input The packed table to unpack
* @return contiguous_split_result The unpacked `table_view` and corresponding device data buffer
*/
unpack_result unpack(std::unique_ptr<packed_columns> input);

/**
* @brief Returns a new column, where each element is selected from either @p lhs or
* @p rhs based on the value of the corresponding element in @p boolean_mask
*
* Selects each element i in the output column from either @p rhs or @p lhs using the following
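The ownership contract documented for `pack` and `unpack` above (metadata in host bytes, all column data in one contiguous buffer, `unpack` taking ownership and returning views without allocating new data memory) can be sketched on the host alone. This is only an analogy under stated assumptions: every `host_*` name below is hypothetical, the metadata layout (one 64-bit size per column) is invented for illustration, and a `std::vector<std::uint8_t>` stands in for `rmm::device_buffer`. It is not the libcudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical host-only stand-ins; none of these are libcudf types.
struct host_column_view {
  std::uint8_t const* data;  // non-owning pointer to the column bytes
  std::size_t size;          // size in bytes
};

struct host_packed_columns {
  std::unique_ptr<std::vector<std::uint8_t>> metadata;  // serialized column sizes
  std::unique_ptr<std::vector<std::uint8_t>> data;      // all column bytes, back to back
};

struct host_unpack_result {
  std::vector<host_column_view> columns;                // views into *all_data
  std::unique_ptr<std::vector<std::uint8_t>> all_data;  // owns the bytes
};

// Deep-copy every column into one buffer and record each size in the metadata.
// Assumed layout: one 64-bit size per column, in native byte order.
host_packed_columns host_pack(std::vector<host_column_view> const& input)
{
  auto metadata = std::make_unique<std::vector<std::uint8_t>>();
  auto data     = std::make_unique<std::vector<std::uint8_t>>();
  for (auto const& col : input) {
    std::uint64_t const size = col.size;
    auto const* p            = reinterpret_cast<std::uint8_t const*>(&size);
    metadata->insert(metadata->end(), p, p + sizeof(size));
    data->insert(data->end(), col.data, col.data + col.size);
  }
  return host_packed_columns{std::move(metadata), std::move(data)};
}

// Take ownership of the packed buffers and rebuild the views.
// The column bytes are moved, not copied, so the views point into all_data.
host_unpack_result host_unpack(std::unique_ptr<host_packed_columns> input)
{
  assert(input && input->metadata && input->data);
  host_unpack_result result;
  result.all_data = std::move(input->data);

  std::size_t offset = 0;
  auto const& meta   = *input->metadata;
  for (std::size_t i = 0; i + sizeof(std::uint64_t) <= meta.size(); i += sizeof(std::uint64_t)) {
    std::uint64_t size = 0;
    std::memcpy(&size, meta.data() + i, sizeof(size));
    result.columns.push_back({result.all_data->data() + offset, static_cast<std::size_t>(size)});
    offset += static_cast<std::size_t>(size);
  }
  return result;
}
```

As in the documented API, the views in `columns` are only valid while `all_data` is alive, so a caller would keep the whole result struct around rather than copying the views out of it.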
26 changes: 26 additions & 0 deletions cpp/include/cudf/detail/copy.hpp
@@ -74,6 +74,23 @@ std::vector<column_view> slice(column_view const& input,
std::vector<size_type> const& indices,
cudaStream_t stream = 0);

/**
* @brief Information about the split for a given column. Bundled together
* into a struct because tuples were getting pretty unreadable.
*/
struct column_split_info {
size_t data_buf_size; // size of the data (including padding)
size_t validity_buf_size; // validity vector size (including padding)

size_t offsets_buf_size; // (strings only) size of offset column (including padding)
size_type num_chars; // (strings only) number of chars in the column
size_type chars_offset; // (strings only) offset from head of chars data
};

unpack_result alloc_and_copy(std::vector<column_view> const& t,
rmm::mr::device_memory_resource* mr,
cudaStream_t stream);

/**
* @copydoc cudf::experimental::contiguous_split
*
@@ -85,6 +102,15 @@ std::vector<contiguous_split_result> contiguous_split(
rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource(),
cudaStream_t stream = 0);

/**
* @copydoc cudf::experimental::pack
*
* @param stream Optional CUDA stream on which to execute kernels
**/
packed_columns pack(cudf::table_view const& input,
cudaStream_t stream = 0,
rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource());

/**
* @brief Creates an uninitialized new column of the specified size and same type as the `input`.
* Supports only fixed-width types.
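The `column_split_info` fields above are all described as sizes "including padding". A minimal sketch of how such padded sizes might be computed follows, assuming each buffer is rounded up to the 64-byte `split_align` used in `contiguous_split.cu`. The helper names (`pad_to`, `padded_validity_size`, `padded_fixed_width_size`) are hypothetical, and the one-bit-per-row validity layout is an assumption of this sketch rather than something the diff states.

```cpp
#include <cassert>
#include <cstddef>

// Assumed to mirror `split_align` in contiguous_split.cu.
constexpr std::size_t split_align = 64;

// Round `size` up to the next multiple of `align`; `align` must be a power of two.
constexpr std::size_t pad_to(std::size_t size, std::size_t align = split_align)
{
  return (size + align - 1) & ~(align - 1);
}

// Hypothetical helper: bytes for a validity bitmask with one bit per row, padded.
constexpr std::size_t padded_validity_size(std::size_t num_rows)
{
  return pad_to((num_rows + 7) / 8);
}

// Hypothetical helper: total bytes one fixed-width column would occupy in the
// contiguous buffer (padded data, plus padded validity when the column is nullable).
inline std::size_t padded_fixed_width_size(std::size_t num_rows,
                                           std::size_t element_size,
                                           bool nullable)
{
  assert(element_size > 0);
  std::size_t const data_buf_size     = pad_to(num_rows * element_size);
  std::size_t const validity_buf_size = nullable ? padded_validity_size(num_rows) : 0;
  return data_buf_size + validity_buf_size;
}
```

For example, 100 rows of a 4-byte type need 400 data bytes, which pad to 448. A nullable column adds a 13-byte bitmask padded to 64, for 512 bytes in total. Summing these per-column totals gives the size of the single allocation, and every sub-buffer then starts on a 64-byte boundary.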
74 changes: 24 additions & 50 deletions cpp/src/copying/contiguous_split.cu
@@ -18,6 +18,7 @@
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_view.hpp>
#include <cudf/copying.hpp>
#include <cudf/detail/copy.hpp>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
#include <cudf/detail/utilities/cuda.cuh>
@@ -170,21 +171,7 @@ __launch_bounds__(block_size) __global__
// start at that alignment.
static constexpr size_t split_align = 64;

/**
* @brief Information about the split for a given column. Bundled together
* into a struct because tuples were getting pretty unreadable.
*/
struct column_split_info {
size_t data_buf_size; // size of the data (including padding)
size_t validity_buf_size; // validity vector size (including padding)

size_t offsets_buf_size; // (strings only) size of offset column (including padding)
size_type num_chars; // (strings only) number of chars in the column
size_type chars_offset; // (strings only) offset from head of chars data
};

/**
* @brief Functor called by the `type_dispatcher` to incrementally compute total
/** @brief Functor called by the `type_dispatcher` to incrementally compute total
* memory buffer size needed to allocate a contiguous copy of all columns within
* a source table.
*/
@@ -351,14 +338,10 @@ struct column_preprocess_info {
* allocation.
*/
thrust::host_vector<column_split_info> preprocess_string_column_info(
cudf::table_view const& t,
rmm::device_vector<column_split_info>& device_split_info,
cudaStream_t stream)
{
// build a list of all the offset columns and their indices for all input string columns and put
// them on the gpu
std::vector<column_view> const& t, cudaStream_t stream) {
// build a list of all the offset columns and their indices for all input string columns and put them on the gpu
thrust::host_vector<column_preprocess_info> offset_columns;
offset_columns.reserve(t.num_columns()); // worst case
offset_columns.reserve(t.size()); // worst case

// collect only string columns
size_type column_index = 0;
@@ -373,7 +356,7 @@ thrust::host_vector<column_split_info> preprocess_string_column_info(
rmm::device_vector<column_preprocess_info> device_offset_columns = offset_columns;

// compute column split information
rmm::device_vector<thrust::pair<size_type, size_type>> device_offsets(t.num_columns());
rmm::device_vector<thrust::pair<size_type, size_type>> device_offsets(t.size());
auto* offsets_p = device_offsets.data().get();
thrust::for_each(rmm::exec_policy(stream)->on(stream),
device_offset_columns.begin(),
@@ -384,7 +367,7 @@ thrust::host_vector<column_split_info> preprocess_string_column_info(
cpi.offsets.head<int32_t>()[cpi.offset + cpi.size]);
});
thrust::host_vector<thrust::pair<size_type, size_type>> host_offsets(device_offsets);
thrust::host_vector<column_split_info> split_info(t.num_columns());
thrust::host_vector<column_split_info> split_info(t.size());
std::for_each(offset_columns.begin(),
offset_columns.end(),
[&split_info, &host_offsets](column_preprocess_info const& cpi) {
@@ -403,6 +386,8 @@ thrust::host_vector<column_split_info> preprocess_string_column_info(
return split_info;
}

}; // anonymous namespace

/**
* @brief Creates a contiguous_split_result object which contains a deep-copy of the input
* table_view into a single contiguous block of memory.
@@ -411,14 +396,11 @@ thrust::host_vector<column_split_info> preprocess_string_column_info(
* call with the input table. The memory referenced by the table_view and its internal column_views
* is entirely contained in single block of memory.
*/
contiguous_split_result alloc_and_copy(cudf::table_view const& t,
rmm::device_vector<column_split_info>& device_split_info,
rmm::mr::device_memory_resource* mr,
cudaStream_t stream)
{
unpack_result alloc_and_copy(std::vector<column_view> const& t,
rmm::mr::device_memory_resource* mr,
cudaStream_t stream) {
// preprocess column split information for string columns.
thrust::host_vector<column_split_info> split_info =
preprocess_string_column_info(t, device_split_info, stream);
thrust::host_vector<column_split_info> split_info = preprocess_string_column_info(t, stream);

// compute the rest of the column sizes (non-string columns, and total buffer size)
size_t total_size = 0;
@@ -437,7 +419,7 @@ contiguous_split_result alloc_and_copy(cudf::table_view const& t,
// copy (this would be cleaner with a std::transform, but there's an nvcc compiler issue in the
// way)
std::vector<column_view> out_cols;
out_cols.reserve(t.num_columns());
out_cols.reserve(t.size());

column_index = 0;
std::for_each(
@@ -447,34 +429,26 @@ contiguous_split_result alloc_and_copy(cudf::table_view const& t,
column_index++;
});

return contiguous_split_result{cudf::table_view{out_cols}, std::move(device_buf)};
return unpack_result{out_cols, std::move(device_buf)};
}

}; // anonymous namespace

std::vector<contiguous_split_result> contiguous_split(cudf::table_view const& input,
std::vector<size_type> const& splits,
rmm::mr::device_memory_resource* mr,
cudaStream_t stream)
{
auto subtables = cudf::experimental::split(input, splits);

// optimization : for large numbers of splits this allocation can dominate total time
// spent if done inside alloc_and_copy(). so we'll allocate it once
// and reuse it.
//
// benchmark: 1 GB data, 10 columns, 256 splits.
// no optimization: 106 ms (8 GB/s)
// optimization: 20 ms (48 GB/s)
rmm::device_vector<column_split_info> device_split_info(input.num_columns());

std::vector<contiguous_split_result> result;
std::transform(subtables.begin(),
subtables.end(),
std::back_inserter(result),
[mr, stream, &device_split_info](table_view const& t) {
return alloc_and_copy(t, device_split_info, mr, stream);
});
std::transform(
subtables.begin(),
subtables.end(),
std::back_inserter(result),
[mr, stream](table_view const& t) {
std::vector<column_view> table_columns(t.begin(), t.end());
unpack_result result(alloc_and_copy(table_columns, mr, stream));
return contiguous_split_result{table_view(result.columns), std::move(result.all_data)};
});

return result;
}
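For string columns, `preprocess_string_column_info` in the diff above reads the offsets child at `cpi.offset + cpi.size` and fills the `num_chars` and `chars_offset` fields of `column_split_info`. The sketch below shows that arithmetic on a plain host vector. It assumes the other element of the device-side pair is the offsets entry at `cpi.offset`, which the visible hunk does not show, and `sliced_chars_range` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns {chars_offset, num_chars} for the rows [offset, offset + size) of a
// strings column whose offsets child holds num_rows + 1 entries.
// Hypothetical helper; the PR computes the equivalent values on the device.
inline std::pair<std::int32_t, std::int32_t> sliced_chars_range(
  std::vector<std::int32_t> const& offsets, std::size_t offset, std::size_t size)
{
  assert(offset + size < offsets.size());
  std::int32_t const first = offsets[offset];         // first char of the slice
  std::int32_t const last  = offsets[offset + size];  // one past the last char
  return {first, last - first};
}
```

With the offsets `{0, 3, 3, 8, 10}` for the strings `"abc"`, `""`, `"hello"`, `"hi"`, the slice starting at row 1 with 2 rows covers characters 3 to 8, so the helper returns a `chars_offset` of 3 and `num_chars` of 5. Only that character range needs copying into the contiguous buffer, and the copied offsets have to be rebased by `chars_offset`.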