Scalability with the number of threads depends on the network and on the number of samples. The time per sample varies a lot. When the number of samples is small (comparable to the number of threads), a particular thread might end up with 2-3 slow samples, which limits the scalability. Small networks are so fast that most of the time is spent in thread lock/unlock. Overall, for a network < 1000, 4 or 8 threads give the best timing. Increasing the number of threads further can slow things down (a combination of lock/unlock overhead and an unlucky thread that gets the slowest samples).
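To illustrate the point about slow samples, here is a minimal sketch in plain Python. It is a toy model, not the optimizer's actual scheduler: it assumes each free thread simply takes the next sample from a shared queue, it ignores lock/unlock cost entirely, and the sample durations (60 fast samples, 4 slow ones) are made up for illustration. The function name `makespan` is hypothetical.

```python
import heapq


def makespan(sample_times, n_threads):
    """Wall time when each free thread takes the next sample from a shared queue."""
    # finish[i] is the time at which thread i becomes free again.
    finish = [0.0] * n_threads
    heapq.heapify(finish)
    for t in sample_times:
        start = heapq.heappop(finish)       # earliest-free thread takes the sample
        heapq.heappush(finish, start + t)
    return max(finish)


# Made-up durations: 60 fast samples followed by 4 slow ones (arbitrary units).
times = [1.0] * 60 + [20.0] * 4
serial = makespan(times, 1)

for n in (1, 4, 8, 32, 64):
    wall = makespan(times, n)
    print(f"threads={n:2d}  wall={wall:6.1f}  speedup={serial / wall:.2f}")
```

In this toy model the wall time can never drop below the duration of the slowest sample, so with these numbers the speedup is capped at 140 / 20 = 7 no matter how many threads are used. Real runs add lock/unlock overhead on top of that, which this sketch does not model.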
-
I have the following code for contraction optimization using multiple threads. I set the number of samples to 64. What I observe is that with 1 thread the optimization takes 219 seconds, and with 64 threads it takes 89 seconds. The resulting quality is not very different. I expected a much faster time to solution with 64 threads, and I can see that CPU utilization does go above 6000% for a substantial amount of time.
The machine has a single AMD Zen 3 (Milan) CPU with 32 cores and 64 threads, and an A100 GPU. With 32 threads, CPU utilization is above 3100% and the time for 64 samples is 65 seconds, with similar performance. With 8 threads, the time is 47 seconds.
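For reference, the timings above can be turned into speedup and parallel efficiency with a few lines of Python. This is only arithmetic on the numbers reported in this post; the helper name `speedup_table` is made up for illustration.

```python
# Wall-clock times reported above, in seconds, keyed by thread count.
timings = {1: 219.0, 8: 47.0, 32: 65.0, 64: 89.0}


def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = timings[1]
    return {
        n: (base / t, base / t / n)   # speedup, then speedup per thread
        for n, t in timings.items()
    }


table = speedup_table(timings)
for n in sorted(table):
    speedup, efficiency = table[n]
    print(f"threads={n:2d}  speedup={speedup:.2f}  efficiency={efficiency:.1%}")
```

The speedup is largest at 8 threads and shrinks as more threads are added, which is the behavior I am asking about.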
Versions:
Code: