Author: | Wojciech Muła |
---|
Contents
See also the Daniel Lemire's article.
A "bitvecor" is an array of words; in the current implementation a word is
uint64_t
, however it might be any SIMD type.
Iteration over a bitvector means that for each 1 bit an action is performed; in particular, the action has to know the index of the bit in the bitvector.
Comparison of four methods:
- naive method --- test each bit of word, run a procedure if the bit is set;
- better method --- find the lowest bit set, determine its index and clear it;
- block-3 -- split a word into 3-bit subwords, for each subword unroll callback calls;
- block-4 -- as above, but subword size is 4 bits.
In all methods the number of callback call is the same and is equal to the number of bits set in a word. The methods differ in the number of iterations over a single word:
- naive method -- the position of most significant bit set;
- better method -- the number of bits set;
- block-3 -- 3 * index of last non-zero block;
- block-4 -- 4 * index of last non-zero block.
The better method is indeed faster in an average case. However, for specific bit patterns other methods perform better.
These patterns contain regular, long runs of 1, thanks to that a branch
predictor is able to help. For instance, if all words in a bitvector are
0x00000000ffffffff
(fill factor is 0.5), then the naive method is two
times faster than the better method. For bitvector having fill factor 0.5,
but with randomly set bits, the naive method is 4-5 times slower than better
one.
Naive method:
for (size_t i=0; i < n; i++) { uint64_t tmp = array[i]; size_t k = i * 64; while (tmp) { if (tmp & 0x1) { callback(k); } tmp >>= 1; k += 1; } }
Better method:
for (size_t i=0; i < n; i++) { uint64_t tmp = array[i]; size_t k = i * 64; while (tmp) { const uint64_t t = tmp & (~tmp + 1); const uint64_t r = __builtin_ctzll(t); callback(k + r); tmp ^= t; } }
Block-3 method:
for (size_t i=0; i < n; i++) { uint64_t tmp = array[i]; size_t k = i * 64; while (tmp) { switch (tmp & 0x7) { case 0: break; case 1: callback(k); break; // inline callback calls for rest values 2..7 } tmp >>= 3; k += 3; } }
Block-4 method:
for (size_t i=0; i < n; i++) { uint64_t tmp = array[i]; size_t k = i * 64; while (tmp) { switch (tmp & 0xf) { case 0: break; case 1: callback(k); break; // inline callback calls for rest values 2..15 } tmp >>= 4; k += 4; } }
Tests check different vectors sizes (given in bits) and various fill factors. The action which is performed for each bit is storing the index in an auxiliary table.
case | size [bits] | cardinality [bits] | fill factor | naive [us] | better [us] | block-3 [us] | block-4 [us] |
---|---|---|---|---|---|---|---|
0x0000000000000000 | 4096 | 0 | 0.00 | 215 | 270 | 269 | 269 |
16384 | 0 | 0.00 | 665 | 671 | 689 | 672 | |
65536 | 0 | 0.00 | 2,345 | 1,470 | 1,461 | 1,464 | |
262144 | 0 | 0.00 | 5,801 | 4,870 | 6,008 | 4,156 | |
524288 | 0 | 0.00 | 8,105 | 8,234 | 8,549 | 15,703 | |
0x000000000000ffff | 4096 | 1024 | 0.25 | 1,148 | 1,789 | 1,757 | 714 |
16384 | 4096 | 0.25 | 4,584 | 7,631 | 7,326 | 2,833 | |
65536 | 16384 | 0.25 | 18,783 | 29,242 | 28,600 | 11,312 | |
262144 | 65536 | 0.25 | 74,182 | 113,013 | 111,571 | 45,575 | |
524288 | 131072 | 0.25 | 147,918 | 226,059 | 222,398 | 90,141 | |
0x00000000ffffffff | 4096 | 2048 | 0.50 | 1,572 | 3,515 | 2,463 | 1,529 |
16384 | 8192 | 0.50 | 6,641 | 14,574 | 9,962 | 6,077 | |
65536 | 32768 | 0.50 | 25,973 | 57,529 | 40,165 | 25,226 | |
262144 | 131072 | 0.50 | 101,964 | 223,753 | 156,650 | 98,451 | |
524288 | 262144 | 0.50 | 201,800 | 444,297 | 312,356 | 200,524 | |
0x0000ffffffffffff | 4096 | 3072 | 0.75 | 3,170 | 5,135 | 2,074 | 2,364 |
16384 | 12288 | 0.75 | 12,682 | 21,592 | 8,304 | 8,794 | |
65536 | 49152 | 0.75 | 51,737 | 82,982 | 34,071 | 33,557 | |
262144 | 196608 | 0.75 | 210,327 | 329,007 | 134,023 | 132,390 | |
524288 | 393216 | 0.75 | 405,499 | 657,357 | 266,591 | 263,201 | |
0xffffffffffffffff | 4096 | 4096 | 1.00 | 4,162 | 6,929 | 3,694 | 2,716 |
16384 | 16384 | 1.00 | 17,536 | 28,499 | 15,092 | 10,901 | |
65536 | 65536 | 1.00 | 67,778 | 111,432 | 57,929 | 44,643 | |
262144 | 262144 | 1.00 | 267,967 | 439,251 | 228,867 | 175,155 | |
524288 | 524288 | 1.00 | 536,125 | 879,326 | 458,143 | 350,696 | |
random 0.05 | 4096 | 204 | 0.05 | 5,660 | 931 | 4,758 | 4,518 |
16384 | 819 | 0.05 | 22,486 | 3,826 | 19,521 | 19,146 | |
65536 | 3276 | 0.05 | 89,582 | 14,670 | 79,614 | 74,084 | |
262144 | 13107 | 0.05 | 353,570 | 59,337 | 313,467 | 295,045 | |
524288 | 26214 | 0.05 | 704,664 | 118,168 | 624,676 | 590,066 | |
random 0.25 | 4096 | 1024 | 0.25 | 12,612 | 2,270 | 10,257 | 9,077 |
16384 | 4096 | 0.25 | 50,829 | 9,397 | 42,042 | 34,311 | |
65536 | 16384 | 0.25 | 196,004 | 38,865 | 163,905 | 135,968 | |
262144 | 65536 | 0.25 | 804,797 | 151,808 | 652,206 | 544,368 | |
524288 | 131072 | 0.25 | 1,641,651 | 300,710 | 1,304,817 | 1,087,104 | |
random 0.50 | 4096 | 2048 | 0.50 | 15,892 | 4,019 | 12,114 | 10,083 |
16384 | 8192 | 0.50 | 63,993 | 16,200 | 47,617 | 40,307 | |
65536 | 32768 | 0.50 | 252,484 | 65,758 | 195,395 | 162,699 | |
262144 | 131072 | 0.50 | 1,076,578 | 262,869 | 760,944 | 635,493 | |
524288 | 262144 | 0.50 | 2,013,693 | 510,859 | 1,517,829 | 1,269,972 | |
random 0.75 | 4096 | 3072 | 0.75 | 11,152 | 5,693 | 11,073 | 9,405 |
16384 | 12288 | 0.75 | 45,268 | 23,250 | 44,959 | 37,629 | |
65536 | 49152 | 0.75 | 178,443 | 92,000 | 178,652 | 151,820 | |
262144 | 196608 | 0.75 | 713,593 | 366,630 | 712,900 | 606,128 | |
524288 | 393216 | 0.75 | 1,425,944 | 731,703 | 1,423,945 | 1,211,107 | |
random 0.95 | 4096 | 3891 | 0.95 | 5,690 | 7,265 | 6,505 | 5,467 |
16384 | 15564 | 0.95 | 23,855 | 29,313 | 27,543 | 22,924 | |
65536 | 62259 | 0.95 | 93,923 | 113,647 | 107,702 | 92,362 | |
262144 | 249036 | 0.95 | 371,728 | 448,525 | 427,551 | 367,779 | |
524288 | 498073 | 0.95 | 743,468 | 899,600 | 852,838 | 735,956 |
case | size [bits] | cardinality [bits] | fill factor | naive [x] | better [x] | block-3 [x] | block-4 [x] |
---|---|---|---|---|---|---|---|
0x0000000000000000 | 4096 | 0 | 0.00 | 1.00 | 0.80 | 0.80 | 0.80 |
16384 | 0 | 0.00 | 1.00 | 0.99 | 0.97 | 0.99 | |
65536 | 0 | 0.00 | 1.00 | 1.60 | 1.61 | 1.60 | |
262144 | 0 | 0.00 | 1.00 | 1.19 | 0.97 | 1.40 | |
524288 | 0 | 0.00 | 1.00 | 0.98 | 0.95 | 0.52 | |
0x000000000000ffff | 4096 | 1024 | 0.25 | 1.00 | 0.64 | 0.65 | 1.61 |
16384 | 4096 | 0.25 | 1.00 | 0.60 | 0.63 | 1.62 | |
65536 | 16384 | 0.25 | 1.00 | 0.64 | 0.66 | 1.66 | |
262144 | 65536 | 0.25 | 1.00 | 0.66 | 0.66 | 1.63 | |
524288 | 131072 | 0.25 | 1.00 | 0.65 | 0.67 | 1.64 | |
0x00000000ffffffff | 4096 | 2048 | 0.50 | 1.00 | 0.45 | 0.64 | 1.03 |
16384 | 8192 | 0.50 | 1.00 | 0.46 | 0.67 | 1.09 | |
65536 | 32768 | 0.50 | 1.00 | 0.45 | 0.65 | 1.03 | |
262144 | 131072 | 0.50 | 1.00 | 0.46 | 0.65 | 1.04 | |
524288 | 262144 | 0.50 | 1.00 | 0.45 | 0.65 | 1.01 | |
0x0000ffffffffffff | 4096 | 3072 | 0.75 | 1.00 | 0.62 | 1.53 | 1.34 |
16384 | 12288 | 0.75 | 1.00 | 0.59 | 1.53 | 1.44 | |
65536 | 49152 | 0.75 | 1.00 | 0.62 | 1.52 | 1.54 | |
262144 | 196608 | 0.75 | 1.00 | 0.64 | 1.57 | 1.59 | |
524288 | 393216 | 0.75 | 1.00 | 0.62 | 1.52 | 1.54 | |
0xffffffffffffffff | 4096 | 4096 | 1.00 | 1.00 | 0.60 | 1.13 | 1.53 |
16384 | 16384 | 1.00 | 1.00 | 0.62 | 1.16 | 1.61 | |
65536 | 65536 | 1.00 | 1.00 | 0.61 | 1.17 | 1.52 | |
262144 | 262144 | 1.00 | 1.00 | 0.61 | 1.17 | 1.53 | |
524288 | 524288 | 1.00 | 1.00 | 0.61 | 1.17 | 1.53 | |
random 0.05 | 4096 | 204 | 0.05 | 1.00 | 6.08 | 1.19 | 1.25 |
16384 | 819 | 0.05 | 1.00 | 5.88 | 1.15 | 1.17 | |
65536 | 3276 | 0.05 | 1.00 | 6.11 | 1.13 | 1.21 | |
262144 | 13107 | 0.05 | 1.00 | 5.96 | 1.13 | 1.20 | |
524288 | 26214 | 0.05 | 1.00 | 5.96 | 1.13 | 1.19 | |
random 0.25 | 4096 | 1024 | 0.25 | 1.00 | 5.56 | 1.23 | 1.39 |
16384 | 4096 | 0.25 | 1.00 | 5.41 | 1.21 | 1.48 | |
65536 | 16384 | 0.25 | 1.00 | 5.04 | 1.20 | 1.44 | |
262144 | 65536 | 0.25 | 1.00 | 5.30 | 1.23 | 1.48 | |
524288 | 131072 | 0.25 | 1.00 | 5.46 | 1.26 | 1.51 | |
random 0.50 | 4096 | 2048 | 0.50 | 1.00 | 3.95 | 1.31 | 1.58 |
16384 | 8192 | 0.50 | 1.00 | 3.95 | 1.34 | 1.59 | |
65536 | 32768 | 0.50 | 1.00 | 3.84 | 1.29 | 1.55 | |
262144 | 131072 | 0.50 | 1.00 | 4.10 | 1.41 | 1.69 | |
524288 | 262144 | 0.50 | 1.00 | 3.94 | 1.33 | 1.59 | |
random 0.75 | 4096 | 3072 | 0.75 | 1.00 | 1.96 | 1.01 | 1.19 |
16384 | 12288 | 0.75 | 1.00 | 1.95 | 1.01 | 1.20 | |
65536 | 49152 | 0.75 | 1.00 | 1.94 | 1.00 | 1.18 | |
262144 | 196608 | 0.75 | 1.00 | 1.95 | 1.00 | 1.18 | |
524288 | 393216 | 0.75 | 1.00 | 1.95 | 1.00 | 1.18 | |
random 0.95 | 4096 | 3891 | 0.95 | 1.00 | 0.78 | 0.87 | 1.04 |
16384 | 15564 | 0.95 | 1.00 | 0.81 | 0.87 | 1.04 | |
65536 | 62259 | 0.95 | 1.00 | 0.83 | 0.87 | 1.02 | |
262144 | 249036 | 0.95 | 1.00 | 0.83 | 0.87 | 1.01 | |
524288 | 498073 | 0.95 | 1.00 | 0.83 | 0.87 | 1.01 |
case | size [bits] | cardinality [bits] | fill factor | naive [us] | better [us] | block-3 [us] | block-4 [us] |
---|---|---|---|---|---|---|---|
0x0000000000000000 | 4096 | 0 | 0.00 | 50 | 49 | 45 | 45 |
16384 | 0 | 0.00 | 163 | 163 | 157 | 158 | |
65536 | 0 | 0.00 | 615 | 615 | 618 | 610 | |
262144 | 0 | 0.00 | 2,427 | 2,429 | 2,430 | 2,426 | |
524288 | 0 | 0.00 | 4,847 | 4,852 | 4,842 | 4,841 | |
0x000000000000ffff | 4096 | 1024 | 0.25 | 529 | 1,034 | 712 | 420 |
16384 | 4096 | 0.25 | 2,095 | 4,117 | 2,824 | 1,666 | |
65536 | 16384 | 0.25 | 8,349 | 16,464 | 11,278 | 6,649 | |
262144 | 65536 | 0.25 | 33,137 | 65,842 | 45,105 | 26,586 | |
524288 | 131072 | 0.25 | 66,152 | 131,648 | 90,213 | 53,170 | |
0x00000000ffffffff | 4096 | 2048 | 0.50 | 1,454 | 2,707 | 1,232 | 799 |
16384 | 8192 | 0.50 | 5,793 | 10,799 | 4,913 | 3,177 | |
65536 | 32768 | 0.50 | 23,120 | 43,145 | 19,642 | 12,691 | |
262144 | 131072 | 0.50 | 92,488 | 172,550 | 78,532 | 50,753 | |
524288 | 262144 | 0.50 | 185,023 | 344,964 | 157,053 | 101,495 | |
0x0000ffffffffffff | 4096 | 3072 | 0.75 | 2,033 | 3,839 | 2,044 | 1,176 |
16384 | 12288 | 0.75 | 8,099 | 15,341 | 8,162 | 4,686 | |
65536 | 49152 | 0.75 | 32,355 | 61,320 | 32,639 | 18,730 | |
262144 | 196608 | 0.75 | 129,396 | 245,177 | 130,578 | 74,918 | |
524288 | 393216 | 0.75 | 258,800 | 490,388 | 261,125 | 149,813 | |
0xffffffffffffffff | 4096 | 4096 | 1.00 | 2,594 | 4,978 | 2,523 | 1,856 |
16384 | 16384 | 1.00 | 10,350 | 19,895 | 10,078 | 7,405 | |
65536 | 65536 | 1.00 | 41,367 | 79,537 | 40,289 | 29,606 | |
262144 | 262144 | 1.00 | 165,384 | 318,046 | 161,135 | 118,412 | |
524288 | 524288 | 1.00 | 330,921 | 636,071 | 322,161 | 236,796 | |
random 0.05 | 4096 | 204 | 0.05 | 3,539 | 198 | 1,946 | 1,782 |
16384 | 819 | 0.05 | 14,531 | 802 | 10,125 | 10,592 | |
65536 | 3276 | 0.05 | 58,185 | 3,804 | 44,797 | 45,924 | |
262144 | 13107 | 0.05 | 236,529 | 36,876 | 191,849 | 197,751 | |
524288 | 26214 | 0.05 | 474,688 | 81,180 | 387,854 | 397,354 | |
random 0.25 | 4096 | 1024 | 0.25 | 6,567 | 1,069 | 6,550 | 4,432 |
16384 | 4096 | 0.25 | 40,152 | 5,065 | 32,714 | 27,052 | |
65536 | 16384 | 0.25 | 171,564 | 23,716 | 135,537 | 113,170 | |
262144 | 65536 | 0.25 | 687,572 | 103,065 | 554,231 | 468,943 | |
524288 | 131072 | 0.25 | 1,375,448 | 207,411 | 1,106,585 | 938,587 | |
random 0.50 | 4096 | 2048 | 0.50 | 10,866 | 2,730 | 7,967 | 5,130 |
16384 | 8192 | 0.50 | 54,893 | 10,936 | 39,080 | 30,518 | |
65536 | 32768 | 0.50 | 225,728 | 43,618 | 161,253 | 127,082 | |
262144 | 131072 | 0.50 | 901,437 | 174,320 | 657,454 | 526,084 | |
524288 | 262144 | 0.50 | 1,804,751 | 348,792 | 1,314,465 | 1,052,250 | |
random 0.75 | 4096 | 3072 | 0.75 | 6,294 | 3,873 | 7,559 | 4,586 |
16384 | 12288 | 0.75 | 41,584 | 15,479 | 35,960 | 28,457 | |
65536 | 49152 | 0.75 | 172,901 | 61,900 | 147,458 | 118,824 | |
262144 | 196608 | 0.75 | 692,473 | 247,568 | 603,821 | 496,718 | |
524288 | 393216 | 0.75 | 1,388,437 | 495,067 | 1,208,279 | 994,520 | |
random 0.95 | 4096 | 3891 | 0.95 | 3,110 | 4,748 | 3,751 | 2,529 |
16384 | 15564 | 0.95 | 15,982 | 18,973 | 16,981 | 12,763 | |
65536 | 62259 | 0.95 | 67,929 | 75,827 | 70,270 | 55,455 | |
262144 | 249036 | 0.95 | 273,796 | 303,289 | 281,986 | 233,859 | |
524288 | 498073 | 0.95 | 546,779 | 606,527 | 565,346 | 469,310 |
case | size [bits] | cardinality [bits] | fill factor | naive [x] | better [x] | block-3 [x] | block-4 [x] |
---|---|---|---|---|---|---|---|
0x0000000000000000 | 4096 | 0 | 0.00 | 1.00 | 1.02 | 1.11 | 1.11 |
16384 | 0 | 0.00 | 1.00 | 1.00 | 1.04 | 1.03 | |
65536 | 0 | 0.00 | 1.00 | 1.00 | 1.00 | 1.01 | |
262144 | 0 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
524288 | 0 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
0x000000000000ffff | 4096 | 1024 | 0.25 | 1.00 | 0.51 | 0.74 | 1.26 |
16384 | 4096 | 0.25 | 1.00 | 0.51 | 0.74 | 1.26 | |
65536 | 16384 | 0.25 | 1.00 | 0.51 | 0.74 | 1.26 | |
262144 | 65536 | 0.25 | 1.00 | 0.50 | 0.73 | 1.25 | |
524288 | 131072 | 0.25 | 1.00 | 0.50 | 0.73 | 1.24 | |
0x00000000ffffffff | 4096 | 2048 | 0.50 | 1.00 | 0.54 | 1.18 | 1.82 |
16384 | 8192 | 0.50 | 1.00 | 0.54 | 1.18 | 1.82 | |
65536 | 32768 | 0.50 | 1.00 | 0.54 | 1.18 | 1.82 | |
262144 | 131072 | 0.50 | 1.00 | 0.54 | 1.18 | 1.82 | |
524288 | 262144 | 0.50 | 1.00 | 0.54 | 1.18 | 1.82 | |
0x0000ffffffffffff | 4096 | 3072 | 0.75 | 1.00 | 0.53 | 0.99 | 1.73 |
16384 | 12288 | 0.75 | 1.00 | 0.53 | 0.99 | 1.73 | |
65536 | 49152 | 0.75 | 1.00 | 0.53 | 0.99 | 1.73 | |
262144 | 196608 | 0.75 | 1.00 | 0.53 | 0.99 | 1.73 | |
524288 | 393216 | 0.75 | 1.00 | 0.53 | 0.99 | 1.73 | |
0xffffffffffffffff | 4096 | 4096 | 1.00 | 1.00 | 0.52 | 1.03 | 1.40 |
16384 | 16384 | 1.00 | 1.00 | 0.52 | 1.03 | 1.40 | |
65536 | 65536 | 1.00 | 1.00 | 0.52 | 1.03 | 1.40 | |
262144 | 262144 | 1.00 | 1.00 | 0.52 | 1.03 | 1.40 | |
524288 | 524288 | 1.00 | 1.00 | 0.52 | 1.03 | 1.40 | |
random 0.05 | 4096 | 204 | 0.05 | 1.00 | 17.87 | 1.82 | 1.99 |
16384 | 819 | 0.05 | 1.00 | 18.12 | 1.44 | 1.37 | |
65536 | 3276 | 0.05 | 1.00 | 15.30 | 1.30 | 1.27 | |
262144 | 13107 | 0.05 | 1.00 | 6.41 | 1.23 | 1.20 | |
524288 | 26214 | 0.05 | 1.00 | 5.85 | 1.22 | 1.19 | |
random 0.25 | 4096 | 1024 | 0.25 | 1.00 | 6.14 | 1.00 | 1.48 |
16384 | 4096 | 0.25 | 1.00 | 7.93 | 1.23 | 1.48 | |
65536 | 16384 | 0.25 | 1.00 | 7.23 | 1.27 | 1.52 | |
262144 | 65536 | 0.25 | 1.00 | 6.67 | 1.24 | 1.47 | |
524288 | 131072 | 0.25 | 1.00 | 6.63 | 1.24 | 1.47 | |
random 0.50 | 4096 | 2048 | 0.50 | 1.00 | 3.98 | 1.36 | 2.12 |
16384 | 8192 | 0.50 | 1.00 | 5.02 | 1.40 | 1.80 | |
65536 | 32768 | 0.50 | 1.00 | 5.18 | 1.40 | 1.78 | |
262144 | 131072 | 0.50 | 1.00 | 5.17 | 1.37 | 1.71 | |
524288 | 262144 | 0.50 | 1.00 | 5.17 | 1.37 | 1.72 | |
random 0.75 | 4096 | 3072 | 0.75 | 1.00 | 1.63 | 0.83 | 1.37 |
16384 | 12288 | 0.75 | 1.00 | 2.69 | 1.16 | 1.46 | |
65536 | 49152 | 0.75 | 1.00 | 2.79 | 1.17 | 1.46 | |
262144 | 196608 | 0.75 | 1.00 | 2.80 | 1.15 | 1.39 | |
524288 | 393216 | 0.75 | 1.00 | 2.80 | 1.15 | 1.40 | |
random 0.95 | 4096 | 3891 | 0.95 | 1.00 | 0.66 | 0.83 | 1.23 |
16384 | 15564 | 0.95 | 1.00 | 0.84 | 0.94 | 1.25 | |
65536 | 62259 | 0.95 | 1.00 | 0.90 | 0.97 | 1.22 | |
262144 | 249036 | 0.95 | 1.00 | 0.90 | 0.97 | 1.17 | |
524288 | 498073 | 0.95 | 1.00 | 0.90 | 0.97 | 1.17 |