Performance of sequential vs OpenMP-based element-by-element vector multiplication.
This experiment compares the performance of:
- Finding x*y using a single thread (sequential).
- Finding x*y accelerated using OpenMP.
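As a rough illustration of the two approaches, the sketch below shows one way they could look. The function names `multiplySeq` and `multiplyOpenmp` match the labels in the program output, but the signatures, the `float` element type, and the `schedule(static)` clause are assumptions made for this example and may differ from the actual `main.cxx`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential version: a[i] = x[i] * y[i], computed on a single thread.
// (Signature is an assumption for illustration.)
void multiplySeq(std::vector<float>& a,
                 const std::vector<float>& x,
                 const std::vector<float>& y) {
  assert(x.size() == y.size());
  a.resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    a[i] = x[i] * y[i];
}

// OpenMP version: the same loop, with iterations divided among threads.
// Without -fopenmp the pragma is ignored and the loop runs sequentially.
void multiplyOpenmp(std::vector<float>& a,
                    const std::vector<float>& x,
                    const std::vector<float>& y) {
  assert(x.size() == y.size());
  a.resize(x.size());
  const long long N = static_cast<long long>(x.size());
  #pragma omp parallel for schedule(static)
  for (long long i = 0; i < N; ++i)
    a[i] = x[i] * y[i];
}
```

Each iteration is independent, so the loop needs no synchronization beyond the implicit barrier at the end of the parallel region.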
Here x and y are both floating-point vectors. Both approaches were run on
a number of vector sizes, with each approach run 5 times per size to get a
reliable time measurement. Note that neither approach makes explicit use of
SIMD instructions, which are available on all modern hardware. While it might
seem that the OpenMP method would be a clear winner, the results indicate that
this is not the case for smaller vectors. This is possibly because of high
communication costs and not enough computational workload, as indicated by
this answer. However, from 10⁸ elements onward, the OpenMP approach performs
better than the sequential one (for example, about 40.6 ms versus 47.0 ms at
10⁸ elements in the run listed below).
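The repeated-measurement procedure described above could be sketched roughly as follows. The helper name `measureDurationMs` is hypothetical, and taking the mean of the runs is an assumption; the original program may combine its 5 runs differently.

```cpp
#include <cassert>
#include <chrono>

// Runs fn() `repeat` times and returns the mean wall-clock time per run in
// milliseconds. (Hypothetical helper; averaging is an assumption.)
template <class F>
double measureDurationMs(F fn, int repeat = 5) {
  assert(repeat > 0);
  using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  double totalMs = 0.0;
  for (int r = 0; r < repeat; ++r) {
    auto start = Clock::now();
    fn();
    auto stop = Clock::now();
    totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
  }
  return totalMs / repeat;
}
```

A call such as `measureDurationMs([&] { multiplySeq(a, x, y); })` would time one approach at one vector size, assuming a `multiplySeq` with that signature exists.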
All outputs are saved in a gist, and a small part of the output is listed here. Some charts, generated from sheets, are also included below.
$ g++ -O3 -fopenmp main.cxx
$ ./a.out
# [00000.164 ms; 1e+06 elems.] [1.644725] multiplySeq
# [00000.291 ms; 1e+06 elems.] [1.644725] multiplyOpenmp
# [00002.108 ms; 1e+07 elems.] [1.644725] multiplySeq
# [00002.654 ms; 1e+07 elems.] [1.644725] multiplyOpenmp
# [00046.979 ms; 1e+08 elems.] [1.644725] multiplySeq
# [00040.584 ms; 1e+08 elems.] [1.644725] multiplyOpenmp
# [00398.860 ms; 1e+09 elems.] [1.644725] multiplySeq
# [00295.718 ms; 1e+09 elems.] [1.644725] multiplyOpenmp
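To make the crossover in the listing above easier to read, the timings can be turned into a speedup ratio (sequential time divided by OpenMP time). The helper below is a hypothetical illustration and not part of the original program.

```cpp
#include <cassert>

// Speedup of OpenMP over sequential: values above 1 mean OpenMP is faster,
// values below 1 mean the sequential version is faster.
double speedup(double seqMs, double ompMs) {
  assert(ompMs > 0.0);
  return seqMs / ompMs;
}
```

Applied to the timings listed above, this gives roughly 0.56 at 1e+06 elements, 0.79 at 1e+07, 1.16 at 1e+08, and 1.35 at 1e+09, which matches the observation that OpenMP only pulls ahead from 10⁸ elements.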