This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

fix > intmax num inputs for scan_by_key #1424

Closed
wants to merge 3 commits into from
4 changes: 2 additions & 2 deletions thrust/system/cuda/detail/scan_by_key.h
@@ -513,7 +513,7 @@ namespace __scan_by_key {
scan_op(scan_op_)
{
int tile_idx = blockIdx.x;
- Size tile_base = ITEMS_PER_TILE * tile_idx;
+ Size tile_base = ITEMS_PER_TILE * static_cast<Size>(tile_idx);
Size num_remaining = num_items - tile_base;

if (num_remaining > ITEMS_PER_TILE)
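
The hunk above widens `tile_idx` before the multiplication so that the product is computed in `Size` rather than in `int`. The following is a minimal host-side sketch of why that matters. It is not the Thrust kernel code: `kItemsPerTile`, `tile_base_narrow`, and `tile_base_wide` are hypothetical names introduced only for illustration, and the tile size of 1024 is an assumed value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile size, assumed for illustration only.
constexpr int kItemsPerTile = 1024;

// Emulates the pre-fix expression, where the product is formed in 32 bits
// and only afterwards widened. Real signed overflow is undefined behavior,
// so the wraparound is reproduced with unsigned arithmetic instead.
// (The final narrowing cast wraps on all mainstream compilers and is
// defined as modular from C++20 onwards.)
inline std::int64_t tile_base_narrow(int tile_idx) {
    std::uint32_t wrapped = static_cast<std::uint32_t>(kItemsPerTile) *
                            static_cast<std::uint32_t>(tile_idx);
    return static_cast<std::int32_t>(wrapped);
}

// Mirrors the fixed expression: casting one operand to Size first makes
// the whole multiplication happen in Size.
template <typename Size>
Size tile_base_wide(int tile_idx) {
    return kItemsPerTile * static_cast<Size>(tile_idx);
}
```

With these assumed values, tile index 3,000,000 corresponds to a true offset of 3,072,000,000, which exceeds `INT_MAX`; `tile_base_narrow` yields a negative number for it, while `tile_base_wide<std::int64_t>` returns the true offset.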
@@ -734,7 +734,7 @@ namespace __scan_by_key {
ScanOp scan_op,
AddInitToScan add_init_to_scan)
{
- int num_items = static_cast<int>(thrust::distance(keys_first, keys_last));
+ size_t num_items = static_cast<size_t>(thrust::distance(keys_first, keys_last));
Collaborator

This will likely introduce performance regressions -- using size_t unconditionally here will instantiate the scan_by_key implementation with Size=size_t, increasing register pressure and generating less efficient code for inputs that can be indexed by int.

Take a look at the macros in thrust/system/cuda/detail/dispatch.h -- they select between an int and a size_t instantiation based on the runtime value of the item count.
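
As a rough illustration of the runtime dispatch idea described in this comment, here is a minimal sketch. It does not use the actual macros from dispatch.h, whose names and exact index types are not shown in this thread; `scan_impl` and `dispatch_scan` are hypothetical stand-ins, and the 32-bit/64-bit signed index types are an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a templated scan_by_key implementation.
// It only reports the width of the index type it was instantiated with.
template <typename Size>
std::size_t scan_impl(Size num_items) {
    (void)num_items;  // a real implementation would launch kernels here
    return sizeof(Size);
}

// Chooses the narrow instantiation whenever the item count fits in
// 32 bits, and falls back to the wide one only for larger inputs.
inline std::size_t dispatch_scan(std::int64_t count) {
    assert(count >= 0);
    if (count <= std::numeric_limits<std::int32_t>::max()) {
        return scan_impl(static_cast<std::int32_t>(count));
    }
    return scan_impl(count);
}
```

Both instantiations are compiled, but inputs that fit in 32 bits keep running the narrow code path, so only oversized inputs pay for the wider index type.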

size_t storage_size = 0;
cudaStream_t stream = cuda_cub::stream(policy);
bool debug_sync = THRUST_DEBUG_SYNC_FLAG;