
Refactor rolling.cu to reduce compile time #6512

Merged: 10 commits merged into rapidsai:branch-0.18 on Dec 17, 2020

Conversation

@mythrocks (Contributor) commented Oct 13, 2020

Closes #6472.

rolling.cu is taking inordinately long to compile, slowing down the libcudf build. The following changes were made to mitigate this:

  1. Moved grouped_rolling_window() and grouped_time_based_rolling_window() to grouped_rolling.cu. Common functions were moved to rolling_detail.cuh.
  2. Normalized timestamp columns to use int64_t representations. This reduces the number of template instantiations for time_based_grouped_rolling_window().
  3. grouped_*_rolling_window() functions used to pass around fancy iterators, causing massive template instantiations. This has been changed to materialize the window offsets as separate columns and use those with the existing rolling_window() functions to produce the final result. (A simplified sketch of this idea follows the list.)
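
A rough illustration of item 3, in the spirit of the change rather than the actual libcudf code: materialize each row's clamped preceding/following window size from its group boundaries into plain int32 buffers, and hand those to an existing column-based rolling_window() overload. All names and signatures below are hypothetical.

```cuda
// Hypothetical sketch (not the PR's implementation): compute per-row window
// bounds clamped to group boundaries. The resulting int32 buffers can back
// INT32 offset columns passed to the existing column-based rolling_window()
// overload, so the rolling kernel needs no additional template instantiations.
#include <cstdint>

__global__ void materialize_window_bounds(int32_t const* group_labels,   // row index -> group index
                                          int32_t const* group_offsets,  // group index -> first row; last entry == num_rows
                                          int32_t num_rows,
                                          int32_t preceding,             // requested preceding window size
                                          int32_t following,             // requested following window size
                                          int32_t* preceding_out,
                                          int32_t* following_out)
{
  auto const i = static_cast<int32_t>(blockIdx.x * blockDim.x + threadIdx.x);
  if (i >= num_rows) return;

  auto const group_start = group_offsets[group_labels[i]];      // first row of this row's group
  auto const group_end   = group_offsets[group_labels[i] + 1];  // one past the last row of the group

  // Clamp the requested window to the group's boundaries; the preceding
  // window conventionally includes the current row, hence the +1.
  preceding_out[i] = min(preceding + 1, i - group_start + 1);
  following_out[i] = min(following, group_end - i - 1);
}
```

The point is that the rolling kernel only ever sees precomputed offset columns, regardless of how those offsets were derived (fixed, group-clamped, or time-based windows).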

These changes have been tested by running a window-function test from Spark SQL over a 2.4 GB ORC file with 155M records (1.5M groups, about 97 records each on average):

  1. There has been no discernible change in end-to-end runtime. (The nsys profile seems to indicate that the total time spent in the gpu_rolling kernel has decreased; this is still being examined for confirmation.)
  2. Compiling rolling.cu and grouped_rolling.cu in parallel now takes 60s, as opposed to about 300s before.
  3. The object file size appears to have shrunk by a factor of 3.

@mythrocks added the '2 - In Progress', '4 - Needs Review', and 'tech debt' labels Oct 13, 2020
@mythrocks requested review from a team as code owners October 13, 2020 06:19
@mythrocks self-assigned this Oct 13, 2020
@GPUtester (Collaborator) commented

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov bot commented Oct 13, 2020

Codecov Report

Merging #6512 (bd0e9b7) into branch-0.18 (7ca3fad) will increase coverage by 0.02%.
The diff coverage is n/a.


@@               Coverage Diff               @@
##           branch-0.18    #6512      +/-   ##
===============================================
+ Coverage        82.04%   82.06%   +0.02%     
===============================================
  Files               96       96              
  Lines            16388    16391       +3     
===============================================
+ Hits             13446    13452       +6     
+ Misses            2942     2939       -3     
Impacted Files                                Coverage  Δ
python/cudf/cudf/_fuzz_testing/fuzzer.py       0.00%  <0.00%> (ø)
python/cudf/cudf/utils/hash_vocab_utils.py   100.00%  <0.00%> (ø)
python/cudf/cudf/core/abc.py                  91.48%  <0.00%> (+4.25%) ⬆️
python/cudf/cudf/utils/gpu_utils.py           58.53%  <0.00%> (+4.87%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ca3fad...bd0e9b7.

@harrism (Member) commented Oct 13, 2020

Is there any performance impact of supporting only int64_t reps?

@mythrocks (Contributor, Author) commented

> Is there any performance impact of supporting only int64_t reps?

There is the additional cost of casting the DAYS column down to SECONDS. I have yet to measure and confirm that this cost isn't egregious.
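
For reference, the cast in question amounts to a single unary conversion of the DAYS-resolution timestamp column before the rolling logic runs. A minimal sketch of what that looks like (using the public cudf::cast API; the helper name and exact call site here are illustrative, not the PR's code):

```cuda
// Sketch only; the actual call site in the PR may differ.
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>
#include <cudf/unary.hpp>

#include <memory>

std::unique_ptr<cudf::column> normalize_timestamps(cudf::column_view const& ts)
{
  // TIMESTAMP_DAYS uses an int32 representation; casting it to
  // TIMESTAMP_SECONDS gives every timestamp column a common int64
  // representation, so the time-based rolling path needs only one
  // template instantiation.
  if (ts.type().id() == cudf::type_id::TIMESTAMP_DAYS) {
    return cudf::cast(ts, cudf::data_type{cudf::type_id::TIMESTAMP_SECONDS});
  }
  return std::make_unique<cudf::column>(ts);  // already 64-bit backed; copied as-is in this sketch
}
```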

@harrism changed the title from "[FEA] Refactor rolling.cu to reduce compile time" to "Refactor rolling.cu to reduce compile time" Oct 20, 2020
@harrism (Member) commented Oct 20, 2020

Aside: please reserve [FEA], [BUG], etc. for issues, not PRs.

@karthikeyann (Contributor) commented

@mythrocks, please resolve conflicts. (I tried locally; a few tests failed.)

@mythrocks (Contributor, Author) commented

> @mythrocks, please resolve conflicts. (I tried locally; a few tests failed.)

@karthikeyann, #6557 borked this PR. This will be harder to resolve piecemeal than to just nuke and redo. With your permission, I'll push -f the new version here.

@harrism (Member) commented Oct 27, 2020

There are no code review comments on this PR yet, so a force push now won't disrupt reviewers. If you force push after reviews, the reviews become disassociated from the code (so you can't see the comments in the code-changes view). And if anyone has pulled your branch locally, their copy will be out of sync after you force push, which is the second reason to be careful. But I think it's fine here if you can't resolve the conflicts by merging (usually you can).

@harrism (Member) commented Nov 23, 2020

@mythrocks should we push this to 0.18?

@mythrocks (Contributor, Author) commented

> @mythrocks should we push this to 0.18?

Sorry I missed this message, @harrism, @karthikeyann, et al. Yes, we should move this to 0.18.

@mythrocks closed this Nov 30, 2020
@mythrocks reopened this Nov 30, 2020
@mythrocks changed the base branch from branch-0.17 to branch-0.18 December 4, 2020 05:08
@mythrocks requested a review from a team as a code owner December 4, 2020 05:08
@mythrocks (Contributor, Author) commented

I'm moving this back to WIP until I can sort out why the build is failing. For my own future reference, there are a couple of failures that need debugging:

  1. A fresh build of branch-0.18 seems to be showing the following failure, even without the changes in this PR:
[----------] 2 tests from RollingTestUdf
[ RUN      ] RollingTestUdf.StaticWindow
../tests/rolling/rolling_test.cpp:896: Failure
Expected: output = cudf::rolling_window(input, 2, 2, 4, cuda_udf_agg) doesn't throw an exception.
  Actual: it throws.
fish: "gtests/ROLLING_TEST" terminated by signal SIGSEGV (Address boundary error)
  2. The mythrocks:refactor-rolling-cu branch seems to be failing its build thus:
/usr/bin/g++ -DJITIFY_PRINT_LOG=0 -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_ -Dcudf_base_EXPORTS -I$SRC_DIR/cpp/build/googletest/install/include -I$SRC_DIR/cpp/build/_deps/thrust-src -I$SRC_DIR/cpp/build/_deps/jitify-src -I$SRC_DIR/cpp/build/_deps/libcudacxx-src/include -I/usr/local/cuda/targets/x86_64-linux/include -I$SRC_DIR/cpp/build/include -I$SRC_DIR/cpp/include -I$SRC_DIR/cpp/src -I$PREFIX/include -I$BUILD_PREFIX/include -Werror -Wno-error=deprecated-declarations -Wno-deprecated-declarations -DBOOST_NO_CXX14_CONSTEXPR -O3 -DNDEBUG -fPIC   -DJITIFY_USE_CACHE -DCUDF_VERSION=0.18.0 -std=gnu++14 -o CMakeFiles/cudf_base.dir/src/binaryop/jit/code/traits.cpp.o -c $SRC_DIR/cpp/src/binaryop/jit/code/traits.cpp
/opt/conda/envs/rapids/conda-bld/libcudf_1607534126881/work/cpp/src/binaryop/compiled/binary_ops.cu(316): error: no suitable conversion function from "rmm::cuda_stream_view" to "cudaStream_t" exists
          detected during instantiation of "std::unique_ptr<cudf::column, std::default_delete<cudf::column>> cudf::binops::compiled::<unnamed>::null_considering_binop::operator()(const LhsT &, const RhsT &, cudf::binary_operator, cudf::data_type, cudf::size_type, rmm::cuda_stream_view, rmm::mr::device_memory_resource *) const [with LhsT=cudf::scalar, RhsT=cudf::column_device_view]" 
(431): here
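
(For reference, this "no suitable conversion function from rmm::cuda_stream_view to cudaStream_t" class of error is usually resolved by requesting the raw stream handle explicitly via cuda_stream_view::value(); the snippet below is illustrative only and not necessarily the fix applied here.)

```cuda
// Illustrative only: where a raw cudaStream_t is required, obtain it
// explicitly from the stream view instead of relying on a conversion.
#include <rmm/cuda_stream_view.hpp>

void wait_for(rmm::cuda_stream_view stream)
{
  // cuda_stream_view::value() returns the underlying cudaStream_t handle.
  cudaStreamSynchronize(stream.value());
}
```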

@mythrocks added the '3 - Ready for Review' label and removed the '2 - In Progress' label Dec 10, 2020
@mythrocks (Contributor, Author) commented

I've moved this to READY FOR REVIEW. To my untrained eye, the Python test failures don't seem related to this change:

dask_cudf/tests/test_groupby.py::test_reset_index_multiindex FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[c-1] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[c-2] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[c-3] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[d-1] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[d-2] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[d-3] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[e-1] FAILED      [ 42%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[e-2] FAILED      [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[e-3] FAILED      [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column3-1] FAILED [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column3-2] FAILED [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column3-3] FAILED [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column4-1] FAILED [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column4-2] FAILED [ 43%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column4-3] FAILED [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column5-1] FAILED [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column5-2] FAILED [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_split_out[column5-3] FAILED [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[a-False] FAILED     [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[a-True] FAILED      [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[a-None] FAILED      [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[b-False] FAILED     [ 44%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[b-True] FAILED      [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[b-None] FAILED      [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[c-False] FAILED     [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[c-True] FAILED      [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[c-None] FAILED      [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[d-False] FAILED     [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[d-True] FAILED      [ 45%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[d-None] FAILED      [ 46%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[by4-False] FAILED   [ 46%]
dask_cudf/tests/test_groupby.py::test_groupby_dropna[by4-True] FAILED    [ 46%]

@mythrocks removed the request for review from a team December 11, 2020 19:23
@mythrocks (Contributor, Author) commented

rerun tests

@vuule (Contributor) left a comment


Looks good, just a minor formatting suggestion.
Requesting changes because auto-merger is relentless :)

cpp/src/rolling/grouped_rolling.cu (review thread; outdated, resolved)
cpp/src/rolling/grouped_rolling.cu (review thread; resolved)
@mythrocks (Contributor, Author) commented

Ah, the checks pass, at last. :]

@vuule, I've also addressed the cudf::detail API concern you raised. (Thank you for pointing it out.)
@karthikeyann, could I bother you to take a look?

cpp/src/rolling/grouped_rolling.cu (review thread; outdated, resolved)
@kkraus14 removed the request for review from a team December 17, 2020 19:24
@kkraus14 added the '5 - Ready to Merge' and '6 - Okay to Auto-Merge' labels and removed the '3 - Ready for Review' label Dec 17, 2020
@rapids-bot (bot) merged commit ce21296 into rapidsai:branch-0.18 Dec 17, 2020
Labels: 5 - Ready to Merge, improvement, libcudf, non-breaking
Projects: none
Development: successfully merging this pull request may close "[FEA] rolling.cu takes too long to compile. Consider refactoring"
7 participants