How many blocks or threads per block are used by the cuQuantum API? #20

Answered by leofang
haiyongsong1921 asked this question in Q&A

As in any CUDA program (and CUDA libraries in particular), the grid, block, and shared-memory sizes depend on many factors, such as the algorithm, implementation, hardware, and driver. In addition, a single API call may launch multiple kernels, possibly with different configurations, so there is no single answer to this question. Finally, even if you had this information, I don't think there's much you could do with it.

If you're interested, one way to check is to run your workload under nsys (the command-line profiler of Nsight Systems) and then open the generated report in the Nsight Systems GUI. In the GPU timeline you can inspect the launch configuration of each kernel.
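
If you'd rather not click through the timeline, the same information can usually be exported as text. A rough sketch, assuming a reasonably recent Nsight Systems: profile with something like `nsys profile -o report <your command>`, then export the kernel trace with `nsys stats --report cuda_gpu_trace --format csv report.nsys-rep` (older releases call this report `gputrace`). The small Python helper below then tallies the launch configurations. The column names (`Name`, `GrdX`..`GrdZ`, `BlkX`..`BlkZ`) are my assumption about that CSV layout and may differ between versions, and the sample rows are made up purely to exercise the parser.

```python
import csv
import io
from collections import Counter

# Column names assumed from the nsys `cuda_gpu_trace` CSV report;
# adjust them if your nsys version labels the columns differently.
GRID_COLS = ("GrdX", "GrdY", "GrdZ")
BLOCK_COLS = ("BlkX", "BlkY", "BlkZ")


def summarize_launch_configs(csv_text):
    """Count how often each (kernel name, grid, block) combination appears."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows without a grid size (e.g. memcpy/memset entries) are not kernels.
        if not row.get("GrdX"):
            continue
        grid = tuple(int(row[c]) for c in GRID_COLS)
        block = tuple(int(row[c]) for c in BLOCK_COLS)
        counts[(row["Name"], grid, block)] += 1
    return counts


# Hypothetical sample rows (not real cuQuantum kernels or measurements).
SAMPLE = """Name,GrdX,GrdY,GrdZ,BlkX,BlkY,BlkZ
kernel_a,1024,1,1,256,1,1
kernel_a,1024,1,1,256,1,1
kernel_b,64,2,1,128,1,1
[CUDA memcpy HtoD],,,,,,
"""

if __name__ == "__main__":
    for (name, grid, block), n in summarize_launch_configs(SAMPLE).items():
        print(f"{name}: grid={grid} block={block} launches={n}")
```

To use it on a real run, read the exported CSV file into a string and pass it to `summarize_launch_configs`. Expect to see several different kernels and configurations for a single API call, which is the point made above.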

Answer selected by leofang