Skip to content

Latest commit

 

History

History

iterater-over-bits

Iterate over bits of a large bit stream

Author: Wojciech Muła

See also the Daniel Lemire's article.

A "bitvecor" is an array of words; in the current implementation a word is uint64_t, however it might be any SIMD type.

Iteration over a bitvector means that for each 1 bit an action is performed; in particular, the action has to know the index of the bit in the bitvector.

Comparison of four methods:

  1. naive method --- test each bit of word, run a procedure if the bit is set;
  2. better method --- find the lowest bit set, determine its index and clear it;
  3. block-3 -- split a word into 3-bit subwords, for each subword unroll callback calls;
  4. block-4 -- as above, but subword size is 4 bits.

In all methods the number of callback call is the same and is equal to the number of bits set in a word. The methods differ in the number of iterations over a single word:

  1. naive method -- the position of most significant bit set;
  2. better method -- the number of bits set;
  3. block-3 -- 3 * index of last non-zero block;
  4. block-4 -- 4 * index of last non-zero block.

The better method is indeed faster in an average case. However, for specific bit patterns other methods perform better.

These patterns contain regular, long runs of 1, thanks to that a branch predictor is able to help. For instance, if all words in a bitvector are 0x00000000ffffffff (fill factor is 0.5), then the naive method is two times faster than the better method. For bitvector having fill factor 0.5, but with randomly set bits, the naive method is 4-5 times slower than better one.

Naive method:

for (size_t i=0; i < n; i++) {
    uint64_t tmp = array[i];

    size_t k = i * 64;
    while (tmp) {
        if (tmp & 0x1) {
            callback(k);
        }

        tmp >>= 1;
        k += 1;
    }
}

Better method:

for (size_t i=0; i < n; i++) {
    uint64_t tmp = array[i];

    size_t k = i * 64;
    while (tmp) {
        const uint64_t t = tmp & (~tmp + 1);
        const uint64_t r = __builtin_ctzll(t);
        callback(k + r);
        tmp ^= t;
    }
}

Block-3 method:

for (size_t i=0; i < n; i++) {
    uint64_t tmp = array[i];

    size_t k = i * 64;
    while (tmp) {

        switch (tmp & 0x7) {
            case 0:
                break;

            case 1:
                callback(k);
                break;

            // inline callback calls for rest values 2..7
        }

        tmp >>= 3;
        k += 3;
    }
}

Block-4 method:

for (size_t i=0; i < n; i++) {
    uint64_t tmp = array[i];

    size_t k = i * 64;
    while (tmp) {

        switch (tmp & 0xf) {
            case 0:
                break;

            case 1:
                callback(k);
                break;

            // inline callback calls for rest values 2..15
        }

        tmp >>= 4;
        k += 4;
    }
}

Tests check different vectors sizes (given in bits) and various fill factors. The action which is performed for each bit is storing the index in an auxiliary table.

case size [bits] cardinality [bits] fill factor naive [us] better [us] block-3 [us] block-4 [us]
0x0000000000000000 4096 0 0.00 215 270 269 269
  16384 0 0.00 665 671 689 672
  65536 0 0.00 2,345 1,470 1,461 1,464
  262144 0 0.00 5,801 4,870 6,008 4,156
  524288 0 0.00 8,105 8,234 8,549 15,703
0x000000000000ffff 4096 1024 0.25 1,148 1,789 1,757 714
  16384 4096 0.25 4,584 7,631 7,326 2,833
  65536 16384 0.25 18,783 29,242 28,600 11,312
  262144 65536 0.25 74,182 113,013 111,571 45,575
  524288 131072 0.25 147,918 226,059 222,398 90,141
0x00000000ffffffff 4096 2048 0.50 1,572 3,515 2,463 1,529
  16384 8192 0.50 6,641 14,574 9,962 6,077
  65536 32768 0.50 25,973 57,529 40,165 25,226
  262144 131072 0.50 101,964 223,753 156,650 98,451
  524288 262144 0.50 201,800 444,297 312,356 200,524
0x0000ffffffffffff 4096 3072 0.75 3,170 5,135 2,074 2,364
  16384 12288 0.75 12,682 21,592 8,304 8,794
  65536 49152 0.75 51,737 82,982 34,071 33,557
  262144 196608 0.75 210,327 329,007 134,023 132,390
  524288 393216 0.75 405,499 657,357 266,591 263,201
0xffffffffffffffff 4096 4096 1.00 4,162 6,929 3,694 2,716
  16384 16384 1.00 17,536 28,499 15,092 10,901
  65536 65536 1.00 67,778 111,432 57,929 44,643
  262144 262144 1.00 267,967 439,251 228,867 175,155
  524288 524288 1.00 536,125 879,326 458,143 350,696
random 0.05 4096 204 0.05 5,660 931 4,758 4,518
  16384 819 0.05 22,486 3,826 19,521 19,146
  65536 3276 0.05 89,582 14,670 79,614 74,084
  262144 13107 0.05 353,570 59,337 313,467 295,045
  524288 26214 0.05 704,664 118,168 624,676 590,066
random 0.25 4096 1024 0.25 12,612 2,270 10,257 9,077
  16384 4096 0.25 50,829 9,397 42,042 34,311
  65536 16384 0.25 196,004 38,865 163,905 135,968
  262144 65536 0.25 804,797 151,808 652,206 544,368
  524288 131072 0.25 1,641,651 300,710 1,304,817 1,087,104
random 0.50 4096 2048 0.50 15,892 4,019 12,114 10,083
  16384 8192 0.50 63,993 16,200 47,617 40,307
  65536 32768 0.50 252,484 65,758 195,395 162,699
  262144 131072 0.50 1,076,578 262,869 760,944 635,493
  524288 262144 0.50 2,013,693 510,859 1,517,829 1,269,972
random 0.75 4096 3072 0.75 11,152 5,693 11,073 9,405
  16384 12288 0.75 45,268 23,250 44,959 37,629
  65536 49152 0.75 178,443 92,000 178,652 151,820
  262144 196608 0.75 713,593 366,630 712,900 606,128
  524288 393216 0.75 1,425,944 731,703 1,423,945 1,211,107
random 0.95 4096 3891 0.95 5,690 7,265 6,505 5,467
  16384 15564 0.95 23,855 29,313 27,543 22,924
  65536 62259 0.95 93,923 113,647 107,702 92,362
  262144 249036 0.95 371,728 448,525 427,551 367,779
  524288 498073 0.95 743,468 899,600 852,838 735,956
case size [bits] cardinality [bits] fill factor naive [x] better [x] block-3 [x] block-4 [x]
0x0000000000000000 4096 0 0.00 1.00 0.80 0.80 0.80
  16384 0 0.00 1.00 0.99 0.97 0.99
  65536 0 0.00 1.00 1.60 1.61 1.60
  262144 0 0.00 1.00 1.19 0.97 1.40
  524288 0 0.00 1.00 0.98 0.95 0.52
0x000000000000ffff 4096 1024 0.25 1.00 0.64 0.65 1.61
  16384 4096 0.25 1.00 0.60 0.63 1.62
  65536 16384 0.25 1.00 0.64 0.66 1.66
  262144 65536 0.25 1.00 0.66 0.66 1.63
  524288 131072 0.25 1.00 0.65 0.67 1.64
0x00000000ffffffff 4096 2048 0.50 1.00 0.45 0.64 1.03
  16384 8192 0.50 1.00 0.46 0.67 1.09
  65536 32768 0.50 1.00 0.45 0.65 1.03
  262144 131072 0.50 1.00 0.46 0.65 1.04
  524288 262144 0.50 1.00 0.45 0.65 1.01
0x0000ffffffffffff 4096 3072 0.75 1.00 0.62 1.53 1.34
  16384 12288 0.75 1.00 0.59 1.53 1.44
  65536 49152 0.75 1.00 0.62 1.52 1.54
  262144 196608 0.75 1.00 0.64 1.57 1.59
  524288 393216 0.75 1.00 0.62 1.52 1.54
0xffffffffffffffff 4096 4096 1.00 1.00 0.60 1.13 1.53
  16384 16384 1.00 1.00 0.62 1.16 1.61
  65536 65536 1.00 1.00 0.61 1.17 1.52
  262144 262144 1.00 1.00 0.61 1.17 1.53
  524288 524288 1.00 1.00 0.61 1.17 1.53
random 0.05 4096 204 0.05 1.00 6.08 1.19 1.25
  16384 819 0.05 1.00 5.88 1.15 1.17
  65536 3276 0.05 1.00 6.11 1.13 1.21
  262144 13107 0.05 1.00 5.96 1.13 1.20
  524288 26214 0.05 1.00 5.96 1.13 1.19
random 0.25 4096 1024 0.25 1.00 5.56 1.23 1.39
  16384 4096 0.25 1.00 5.41 1.21 1.48
  65536 16384 0.25 1.00 5.04 1.20 1.44
  262144 65536 0.25 1.00 5.30 1.23 1.48
  524288 131072 0.25 1.00 5.46 1.26 1.51
random 0.50 4096 2048 0.50 1.00 3.95 1.31 1.58
  16384 8192 0.50 1.00 3.95 1.34 1.59
  65536 32768 0.50 1.00 3.84 1.29 1.55
  262144 131072 0.50 1.00 4.10 1.41 1.69
  524288 262144 0.50 1.00 3.94 1.33 1.59
random 0.75 4096 3072 0.75 1.00 1.96 1.01 1.19
  16384 12288 0.75 1.00 1.95 1.01 1.20
  65536 49152 0.75 1.00 1.94 1.00 1.18
  262144 196608 0.75 1.00 1.95 1.00 1.18
  524288 393216 0.75 1.00 1.95 1.00 1.18
random 0.95 4096 3891 0.95 1.00 0.78 0.87 1.04
  16384 15564 0.95 1.00 0.81 0.87 1.04
  65536 62259 0.95 1.00 0.83 0.87 1.02
  262144 249036 0.95 1.00 0.83 0.87 1.01
  524288 498073 0.95 1.00 0.83 0.87 1.01
case size [bits] cardinality [bits] fill factor naive [us] better [us] block-3 [us] block-4 [us]
0x0000000000000000 4096 0 0.00 50 49 45 45
  16384 0 0.00 163 163 157 158
  65536 0 0.00 615 615 618 610
  262144 0 0.00 2,427 2,429 2,430 2,426
  524288 0 0.00 4,847 4,852 4,842 4,841
0x000000000000ffff 4096 1024 0.25 529 1,034 712 420
  16384 4096 0.25 2,095 4,117 2,824 1,666
  65536 16384 0.25 8,349 16,464 11,278 6,649
  262144 65536 0.25 33,137 65,842 45,105 26,586
  524288 131072 0.25 66,152 131,648 90,213 53,170
0x00000000ffffffff 4096 2048 0.50 1,454 2,707 1,232 799
  16384 8192 0.50 5,793 10,799 4,913 3,177
  65536 32768 0.50 23,120 43,145 19,642 12,691
  262144 131072 0.50 92,488 172,550 78,532 50,753
  524288 262144 0.50 185,023 344,964 157,053 101,495
0x0000ffffffffffff 4096 3072 0.75 2,033 3,839 2,044 1,176
  16384 12288 0.75 8,099 15,341 8,162 4,686
  65536 49152 0.75 32,355 61,320 32,639 18,730
  262144 196608 0.75 129,396 245,177 130,578 74,918
  524288 393216 0.75 258,800 490,388 261,125 149,813
0xffffffffffffffff 4096 4096 1.00 2,594 4,978 2,523 1,856
  16384 16384 1.00 10,350 19,895 10,078 7,405
  65536 65536 1.00 41,367 79,537 40,289 29,606
  262144 262144 1.00 165,384 318,046 161,135 118,412
  524288 524288 1.00 330,921 636,071 322,161 236,796
random 0.05 4096 204 0.05 3,539 198 1,946 1,782
  16384 819 0.05 14,531 802 10,125 10,592
  65536 3276 0.05 58,185 3,804 44,797 45,924
  262144 13107 0.05 236,529 36,876 191,849 197,751
  524288 26214 0.05 474,688 81,180 387,854 397,354
random 0.25 4096 1024 0.25 6,567 1,069 6,550 4,432
  16384 4096 0.25 40,152 5,065 32,714 27,052
  65536 16384 0.25 171,564 23,716 135,537 113,170
  262144 65536 0.25 687,572 103,065 554,231 468,943
  524288 131072 0.25 1,375,448 207,411 1,106,585 938,587
random 0.50 4096 2048 0.50 10,866 2,730 7,967 5,130
  16384 8192 0.50 54,893 10,936 39,080 30,518
  65536 32768 0.50 225,728 43,618 161,253 127,082
  262144 131072 0.50 901,437 174,320 657,454 526,084
  524288 262144 0.50 1,804,751 348,792 1,314,465 1,052,250
random 0.75 4096 3072 0.75 6,294 3,873 7,559 4,586
  16384 12288 0.75 41,584 15,479 35,960 28,457
  65536 49152 0.75 172,901 61,900 147,458 118,824
  262144 196608 0.75 692,473 247,568 603,821 496,718
  524288 393216 0.75 1,388,437 495,067 1,208,279 994,520
random 0.95 4096 3891 0.95 3,110 4,748 3,751 2,529
  16384 15564 0.95 15,982 18,973 16,981 12,763
  65536 62259 0.95 67,929 75,827 70,270 55,455
  262144 249036 0.95 273,796 303,289 281,986 233,859
  524288 498073 0.95 546,779 606,527 565,346 469,310
case size [bits] cardinality [bits] fill factor naive [x] better [x] block-3 [x] block-4 [x]
0x0000000000000000 4096 0 0.00 1.00 1.02 1.11 1.11
  16384 0 0.00 1.00 1.00 1.04 1.03
  65536 0 0.00 1.00 1.00 1.00 1.01
  262144 0 0.00 1.00 1.00 1.00 1.00
  524288 0 0.00 1.00 1.00 1.00 1.00
0x000000000000ffff 4096 1024 0.25 1.00 0.51 0.74 1.26
  16384 4096 0.25 1.00 0.51 0.74 1.26
  65536 16384 0.25 1.00 0.51 0.74 1.26
  262144 65536 0.25 1.00 0.50 0.73 1.25
  524288 131072 0.25 1.00 0.50 0.73 1.24
0x00000000ffffffff 4096 2048 0.50 1.00 0.54 1.18 1.82
  16384 8192 0.50 1.00 0.54 1.18 1.82
  65536 32768 0.50 1.00 0.54 1.18 1.82
  262144 131072 0.50 1.00 0.54 1.18 1.82
  524288 262144 0.50 1.00 0.54 1.18 1.82
0x0000ffffffffffff 4096 3072 0.75 1.00 0.53 0.99 1.73
  16384 12288 0.75 1.00 0.53 0.99 1.73
  65536 49152 0.75 1.00 0.53 0.99 1.73
  262144 196608 0.75 1.00 0.53 0.99 1.73
  524288 393216 0.75 1.00 0.53 0.99 1.73
0xffffffffffffffff 4096 4096 1.00 1.00 0.52 1.03 1.40
  16384 16384 1.00 1.00 0.52 1.03 1.40
  65536 65536 1.00 1.00 0.52 1.03 1.40
  262144 262144 1.00 1.00 0.52 1.03 1.40
  524288 524288 1.00 1.00 0.52 1.03 1.40
random 0.05 4096 204 0.05 1.00 17.87 1.82 1.99
  16384 819 0.05 1.00 18.12 1.44 1.37
  65536 3276 0.05 1.00 15.30 1.30 1.27
  262144 13107 0.05 1.00 6.41 1.23 1.20
  524288 26214 0.05 1.00 5.85 1.22 1.19
random 0.25 4096 1024 0.25 1.00 6.14 1.00 1.48
  16384 4096 0.25 1.00 7.93 1.23 1.48
  65536 16384 0.25 1.00 7.23 1.27 1.52
  262144 65536 0.25 1.00 6.67 1.24 1.47
  524288 131072 0.25 1.00 6.63 1.24 1.47
random 0.50 4096 2048 0.50 1.00 3.98 1.36 2.12
  16384 8192 0.50 1.00 5.02 1.40 1.80
  65536 32768 0.50 1.00 5.18 1.40 1.78
  262144 131072 0.50 1.00 5.17 1.37 1.71
  524288 262144 0.50 1.00 5.17 1.37 1.72
random 0.75 4096 3072 0.75 1.00 1.63 0.83 1.37
  16384 12288 0.75 1.00 2.69 1.16 1.46
  65536 49152 0.75 1.00 2.79 1.17 1.46
  262144 196608 0.75 1.00 2.80 1.15 1.39
  524288 393216 0.75 1.00 2.80 1.15 1.40
random 0.95 4096 3891 0.95 1.00 0.66 0.83 1.23
  16384 15564 0.95 1.00 0.84 0.94 1.25
  65536 62259 0.95 1.00 0.90 0.97 1.22
  262144 249036 0.95 1.00 0.90 0.97 1.17
  524288 498073 0.95 1.00 0.90 0.97 1.17