Extend the Parquet writer's dictionary encoding benchmark. #16591

Merged — 14 commits merged into rapidsai:branch-24.10 from mhaseeb123:pq-writer-dict-benchmark on Sep 10, 2024

Conversation

@mhaseeb123 (Member) commented on Aug 17, 2024:

Description

This PR extends the data cardinality and run-length ranges for the existing Parquet writer's encoding benchmark.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the libcudf label on Aug 17, 2024
@mhaseeb123 added the 2 - In Progress, non-breaking, cuIO, and improvement labels on Aug 17, 2024
copy-pr-bot commented on Aug 19, 2024:

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

```diff
@@ -202,8 +202,8 @@ NVBENCH_BENCH_TYPES(BM_parq_write_encode, NVBENCH_TYPE_AXES(d_type_list))
   .set_name("parquet_write_encode")
   .set_type_axes_names({"data_type"})
   .set_min_samples(4)
-  .add_int64_axis("cardinality", {0, 1000})
-  .add_int64_axis("run_length", {1, 32});
+  .add_int64_axis("cardinality", {1, 1000, 10'000, 100'000, 1'000'000})
```
@mhaseeb123 (Member, Author) commented:

Are there any resource/infrastructure constraints that we need to consider here? We can perhaps remove one or two entries from here if needed.

Contributor replied:

I think we can drop cardinality 1; given the table size, it's effectively the same as 1000.
Also, should 1M just be 0? Not sure if the goal is to have unique elements (AFAIK the row count is typically lower than 1M).
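
For readers unfamiliar with nvbench's axis mechanics, here is a minimal standalone sketch of how the `add_int64_axis` calls above expand into a matrix of cases. This is not the cuDF benchmark itself (its body is elided here), and it assumes a build linked against nvbench's provided `main`:

```cpp
#include <nvbench/nvbench.cuh>

// Toy stand-in for BM_parq_write_encode: nvbench runs the body once per
// point in the cartesian product of all axes, so 5 cardinalities x 2 run
// lengths yields 10 cases per data type.
void axis_demo(nvbench::state& state)
{
  auto const cardinality = state.get_int64("cardinality");
  auto const run_length  = state.get_int64("run_length");
  state.exec([=](nvbench::launch&) {
    // A real benchmark would generate a table with `cardinality` distinct
    // values in runs of `run_length` rows and time the Parquet write here.
    (void)cardinality;
    (void)run_length;
  });
}

NVBENCH_BENCH(axis_demo)
  .add_int64_axis("cardinality", {1, 1000, 10'000, 100'000, 1'000'000})
  .add_int64_axis("run_length", {1, 32});  // original run_length axis, for illustration
```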

@mhaseeb123 marked this pull request as ready for review on August 19, 2024
@mhaseeb123 requested a review from a team as a code owner on August 19, 2024
@mhaseeb123 (Member, Author) commented:

/ok to test

@mhaseeb123 changed the title from "Add a benchmark for Parquet writer's dictionary encoding." to "Extend the Parquet writer's dictionary encoding benchmark." on Aug 19, 2024
@mhaseeb123 self-assigned this on Aug 19, 2024
@mhaseeb123 added the 3 - Ready for Review label and removed the 2 - In Progress and cuIO labels on Aug 19, 2024
@mhaseeb123 changed the title to "[Minor] Extend the Parquet writer's dictionary encoding benchmark." on Aug 20, 2024
@vuule (Contributor) commented on Aug 20, 2024:

what's the reason for this change?

@mhaseeb123 (Member, Author) commented on Aug 20, 2024:

> what's the reason for this change?

First of all, welcome back. Greg wanted me to push any updates I made to the benchmark for #16541, though my local changes (even wider extended ranges) need not be pushed upstream if they aren't needed.

rapids-bot (bot) pushed a commit that referenced this pull request on Aug 30, 2024:
…::static_map` (#16541)

Part of #12261. This PR refactors dictionary encoding in the Parquet writer, migrating from `cuco::legacy::static_map` to `cuco::static_map` for building the dictionaries.

### Performance Results
The changes result in a +0.08% average speed improvement and a +16.22% average memory footprint increase (stemming from the sizes adjusted by `cuco::make_window_extent` due to the [prime gap](https://en.wikipedia.org/wiki/Prime_gap)) across the benchmark cases extended in #16591.

Currently, we see a roughly 8% speed improvement in the map insert and find kernels, which is counteracted by the map init and collect kernels having to process 16.22% more slots. With a cuco version bump, the average speed improvement should increase from +0.08% to +3%, and the memory footprint change should drop from +16.22% back to +0%.

### Hardware used for benchmarking
```
 `NVIDIA RTX 5880 Ada Generation`
* SM Version: 890 (PTX Version: 860)
* Number of SMs: 110
* SM Default Clock Rate: 18446744071874 MHz
* Global Memory: 23879 MiB Free / 48632 MiB Total
* Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz)
* Max Shared Memory: 100 KiB/SM, 48 KiB/Block
* L2 Cache Size: 98304 KiB
* Maximum Active Blocks: 24/SM
* Maximum Active Threads: 1536/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No
```

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: #16541
@mhaseeb123 (Member, Author) commented:

/ok to test

@mhaseeb123 (Member, Author) commented:

/ok to test

.add_int64_axis("cardinality", {0, 1000})
.add_int64_axis("run_length", {1, 32});
.add_int64_axis("cardinality", {0, 1000, 10'000, 100'000})
.add_int64_axis("run_length", {1, 32, 64});
Contributor commented:

Sorry, missed this the first time around. To me, 32 is already a very high run length. I think we should instead add 4 or 8. I'm fine with 64 if you do see a significant difference in file size and/or performance compared to 32.

@mhaseeb123 (Member, Author) replied:

No significant differences with 64 for the benchmarks we ran either.

Contributor replied:

I vote for 4 or 8, then

@mhaseeb123 (Member, Author) replied:

All done!
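
For intuition on what the two axes control in the input data, here is a hypothetical host-side helper, not the actual benchmark code (which uses cuDF's random table generator): `cardinality` bounds the number of distinct values and `run_length` is how many consecutive rows repeat each value. It assumes `cardinality >= 1`; per the discussion above, 0 appears to lift the cardinality cap entirely in the real benchmark.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: produce `num_rows` values drawn from
// `cardinality` distinct keys, repeated in runs of `run_length` rows.
std::vector<int64_t> make_runs(int64_t num_rows, int64_t cardinality, int64_t run_length)
{
  std::vector<int64_t> out;
  out.reserve(static_cast<std::size_t>(num_rows));
  for (int64_t row = 0; row < num_rows; ++row) {
    int64_t const run = row / run_length;  // index of the current run
    out.push_back(run % cardinality);      // cycle through the distinct keys
  }
  return out;
}
```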

@mhaseeb123 (Member, Author) commented:

/ok to test

@mhaseeb123 added the 4 - Needs Review label and removed the 3 - Ready for Review label on Sep 9, 2024
@mhaseeb123 changed the title back to "Extend the Parquet writer's dictionary encoding benchmark." on Sep 9, 2024
@karthikeyann (Contributor) left a comment:

It would be nice to know how much time this adds to benchmark runs. If that isn't available now, follow up with Randy on benchmark runs.

@mhaseeb123 (Member, Author) commented:

/merge

@mhaseeb123 (Member, Author) commented on Sep 9, 2024:

> It would be nice to know how much time this adds to benchmark runs. If that isn't available now, follow up with Randy on benchmark runs.

Results are in #16541 (here), for which we are extending this benchmark. Each new case in the matrix takes roughly 0.5 s to run on my workstation (AMD Threadripper + RTX 5880 Ada), so it should be roughly a 4 s increase in total time (8 new benchmark cases).
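
As a quick sanity check on that estimate, assuming the final axes are the four cardinality values from the diff above and a three-value run_length axis per the review discussion (the exact final run_length values are an assumption here):

```cpp
#include <cstdio>

int main()
{
  // Old matrix: cardinality {0, 1000} x run_length {1, 32}
  int const old_cases = 2 * 2;
  // New matrix (assumed final): 4 cardinalities x 3 run lengths
  int const new_cases = 4 * 3;
  int const added     = new_cases - old_cases;  // 8 new cases
  std::printf("%d new cases x ~0.5 s each = ~%.0f s extra per data type\n",
              added, added * 0.5);
  return 0;
}
```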

@mhaseeb123 added the 5 - Ready to Merge label and removed the 4 - Needs Review label on Sep 9, 2024
@rapids-bot (bot) merged commit f21979e into rapidsai:branch-24.10 on Sep 10, 2024
104 checks passed
@mhaseeb123 deleted the pq-writer-dict-benchmark branch on September 10, 2024
Labels: 5 - Ready to Merge · improvement · libcudf · non-breaking
Projects: Status: Landed
3 participants