Refactor hashtable use linear probing #15

Merged
merged 89 commits into from
Jun 8, 2020

danielealbano
Owner

This PR contains the refactoring required to switch to a model with 13 slots per bucket, where each slot is identified by one half of the hash. When AVX2/AVX is available, the search among the slots is accelerated and takes 2 SIMD instructions.
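The two-instruction search described above can be sketched as follows. This is only an illustration, not the code from this PR: it assumes 32-bit half hashes, an array padded to 16 entries so that two 8-lane loads stay in bounds, GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`), and every name in it (`search_loop`, `search_avx2`, `search_half_hash`, `SLOTS_PADDED`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_X86 1
#endif

#define SLOTS_PER_BUCKET 13u /* from the PR description */
#define SLOTS_PADDED     16u /* assumption: padding so two 8-lane loads stay in bounds */
#define SLOTS_MASK       ((1u << SLOTS_PER_BUCKET) - 1u)

/* Portable fallback: bit i of the result is set when slot i holds needle. */
static uint32_t search_loop(const uint32_t *half_hashes, uint32_t needle) {
    uint32_t mask = 0;
    for (uint32_t i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (half_hashes[i] == needle) {
            mask |= 1u << i;
        }
    }
    return mask;
}

#ifdef SKETCH_HAVE_X86
/* AVX2 variant: two 8-lane compares cover all 13 slots plus the padding. */
static __attribute__((target("avx2")))
uint32_t search_avx2(const uint32_t *half_hashes, uint32_t needle) {
    const __m256i n  = _mm256_set1_epi32((int)needle);
    const __m256i lo = _mm256_loadu_si256((const __m256i *)half_hashes);
    const __m256i hi = _mm256_loadu_si256((const __m256i *)(half_hashes + 8));
    const __m256i eq_lo = _mm256_cmpeq_epi32(lo, n);
    const __m256i eq_hi = _mm256_cmpeq_epi32(hi, n);
    /* Collapse each compare result into one bit per 32-bit lane. */
    const uint32_t m_lo = (uint32_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq_lo));
    const uint32_t m_hi = (uint32_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq_hi));
    /* Drop any matches that fall in the padding lanes. */
    return (m_lo | (m_hi << 8)) & SLOTS_MASK;
}
#endif

/* Picks the AVX2 variant at runtime when the CPU supports it. */
static uint32_t search_half_hash(const uint32_t *half_hashes, uint32_t needle) {
    assert(half_hashes);
#ifdef SKETCH_HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        return search_avx2(half_hashes, needle);
    }
#endif
    return search_loop(half_hashes, needle);
}
```

The returned bitmask can then be walked bit by bit to compare the full key of each candidate slot, since two different keys may share the same half hash.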

Each bucket is cache-line aligned, hence the number of slots is limited. However, in the tests and benchmarks, even with a hashtable of 133821673 buckets, the maximum number of slots used after inserting 120439505 keys (load factor 0.9) is 9, so the upper limit of 13 can be considered reasonably high.
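To make the size constraint concrete, here is a minimal sketch of how 13 half hashes could fit in one cache line. The layout is an assumption made for illustration (a 64-byte cache line, 32-bit half hashes, a 4-byte lock word and one pointer to external key/value storage); the field names and the `sketch_` helpers are hypothetical and not taken from this PR.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE  64 /* assumption: typical x86-64 cache line size */
#define SLOTS_PER_BUCKET 13 /* from the PR description */

typedef struct {
    alignas(CACHE_LINE_SIZE) uint32_t write_lock; /* hypothetical 4-byte lock word */
    uint32_t half_hashes[SLOTS_PER_BUCKET];        /* 13 * 4 = 52 bytes */
    void *keys_values;                             /* hypothetical pointer to external storage */
} sketch_bucket_t;

/* The whole bucket has to occupy exactly one cache line. */
static_assert(sizeof(sketch_bucket_t) == CACHE_LINE_SIZE,
              "bucket must be exactly one cache line");
static_assert(alignof(sketch_bucket_t) == CACHE_LINE_SIZE,
              "bucket must be cache-line aligned");

/* Load factor as used in the benchmarks: inserted keys over total buckets. */
static double sketch_load_factor(uint64_t keys, uint64_t buckets) {
    return (double)keys / (double)buckets;
}

/* Memory taken by the bucket array alone, in bytes. */
static uint64_t sketch_buckets_bytes(uint64_t buckets) {
    return buckets * (uint64_t)sizeof(sketch_bucket_t);
}
```

With this layout a 14th half hash would push the struct past 64 bytes, which is why the slot count is capped by the cache line rather than by the observed occupancy.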

This refactoring includes a couple of feature flags to:

  • embed the key and value of each slot in the bucket instead of storing them externally; this improves performance by 15% on average, but a lot of memory is wasted as a consequence of the pre-allocation
  • disable the write lock for the bucket; this uses atomic operations to update the half hash in the bucket and reorders the instructions so that everything works properly without locks, but it is 6 times slower and the code still needs to be reviewed.
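As an illustration of the lock-free option, the sketch below claims a free slot with a C11 compare-and-swap on the half hash. It is a simplified assumption of how such an update could look, not the implementation in this PR: the names `claim_slot` and `SLOT_EMPTY` are hypothetical, 0 is assumed to mark an empty slot, and duplicate-key handling is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS_PER_BUCKET 13 /* from the PR description */
#define SLOT_EMPTY       0u /* assumption: 0 marks an unused slot */

/*
 * Tries to claim the first empty slot of a bucket for half_hash.
 * Returns the claimed slot index, or -1 when every slot is taken.
 */
static int claim_slot(_Atomic uint32_t *half_hashes, uint32_t half_hash) {
    assert(half_hash != SLOT_EMPTY);
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        uint32_t expected = SLOT_EMPTY;
        /* Succeeds only if the slot is still empty; otherwise another
         * writer owns it and the next slot is tried. */
        if (atomic_compare_exchange_strong(&half_hashes[i], &expected, half_hash)) {
            return i;
        }
    }
    return -1;
}
```

In a real table the order of the writes matters: a reader that sees the half hash must also be able to see a valid key and value, which is presumably what the reordering of the instructions mentioned above takes care of.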

This code is based on another branch containing the changes required to use chaining, but after an initial implementation and benchmark I decided to switch back to linear probing, with a slightly different implementation to improve the overall performance.

From an initial benchmark on an AMD EPYC 7502P with 128GB of memory (the additional stats need to be reviewed, they aren't calculated properly):

hashtable_op_set_new/11748391/2937097/2/iterations:1/real_time/threads:128               32672169 ns    199629794 ns          128 keys_to_insert=2.9371M load_factor=0.25 load_factor_buckets=0.153504 total_buckets=11.7484M used_avg_bucket_slots=1.08569 used_buckets=2.70513M used_max_bucket_slots=5
hashtable_op_set_new/11748391/2937097/2/iterations:1/real_time/threads:256               21074638 ns     88349183 ns          256 keys_to_insert=2.9371M load_factor=0.25 load_factor_buckets=0.15359 total_buckets=11.7484M used_avg_bucket_slots=1.08509 used_buckets=2.70665M used_max_bucket_slots=6
hashtable_op_set_new/11748391/3876969/2/iterations:1/real_time/threads:128                4449545 ns    142267550 ns          128 keys_to_insert=3.87697M load_factor=0.33 load_factor_buckets=0.197453 total_buckets=11.7484M used_avg_bucket_slots=1.11412 used_buckets=3.47963M used_max_bucket_slots=6
hashtable_op_set_new/11748391/3876969/2/iterations:1/real_time/threads:256                1403225 ns     68445306 ns          256 keys_to_insert=3.87697M load_factor=0.33 load_factor_buckets=0.197459 total_buckets=11.7484M used_avg_bucket_slots=1.11409 used_buckets=3.47973M used_max_bucket_slots=7
hashtable_op_set_new/11748391/5874195/2/iterations:1/real_time/threads:128                5670142 ns    159535779 ns          128 keys_to_insert=5.87419M load_factor=0.5 load_factor_buckets=0.28346 total_buckets=11.7484M used_avg_bucket_slots=1.17583 used_buckets=4.9953M used_max_bucket_slots=7
hashtable_op_set_new/11748391/5874195/2/iterations:1/real_time/threads:256                1833017 ns     84183284 ns          256 keys_to_insert=5.87419M load_factor=0.5 load_factor_buckets=0.283449 total_buckets=11.7484M used_avg_bucket_slots=1.17588 used_buckets=4.9951M used_max_bucket_slots=7
hashtable_op_set_new/11748391/8811293/2/iterations:1/real_time/threads:128                8308756 ns    204654679 ns          128 keys_to_insert=8.81129M load_factor=0.75 load_factor_buckets=0.393357 total_buckets=11.7484M used_avg_bucket_slots=1.27094 used_buckets=6.93196M used_max_bucket_slots=8
hashtable_op_set_new/11748391/8811293/2/iterations:1/real_time/threads:256                2271562 ns     90279197 ns          256 keys_to_insert=8.81129M load_factor=0.75 load_factor_buckets=0.393446 total_buckets=11.7484M used_avg_bucket_slots=1.27065 used_buckets=6.93352M used_max_bucket_slots=8
hashtable_op_set_new/11748391/10573551/2/iterations:1/real_time/threads:128               7298341 ns    191494659 ns          128 keys_to_insert=10.5736M load_factor=0.9 load_factor_buckets=0.451147 total_buckets=11.7484M used_avg_bucket_slots=1.32973 used_buckets=7.95035M used_max_bucket_slots=8
hashtable_op_set_new/11748391/10573551/2/iterations:1/real_time/threads:256               1997645 ns     93085817 ns          256 keys_to_insert=10.5736M load_factor=0.9 load_factor_buckets=0.451078 total_buckets=11.7484M used_avg_bucket_slots=1.32993 used_buckets=7.94915M used_max_bucket_slots=8
hashtable_op_set_new/133821673/33455418/2/iterations:1/real_time/threads:128            128461522 ns   2099067226 ns          128 keys_to_insert=33.4554M load_factor=0.25 load_factor_buckets=0.153445 total_buckets=133.822M used_avg_bucket_slots=1.0856 used_buckets=30.8014M used_max_bucket_slots=6
hashtable_op_set_new/133821673/33455418/2/iterations:1/real_time/threads:256             23609258 ns    716225210 ns          256 keys_to_insert=33.4554M load_factor=0.25 load_factor_buckets=0.153448 total_buckets=133.822M used_avg_bucket_slots=1.08558 used_buckets=30.802M used_max_bucket_slots=6
hashtable_op_set_new/133821673/44161152/2/iterations:1/real_time/threads:128             58457250 ns   2057175148 ns          128 keys_to_insert=44.1612M load_factor=0.33 load_factor_buckets=0.197363 total_buckets=133.822M used_avg_bucket_slots=1.11393 used_buckets=39.6173M used_max_bucket_slots=6
hashtable_op_set_new/133821673/44161152/2/iterations:1/real_time/threads:256             17624390 ns    767131974 ns          256 keys_to_insert=44.1612M load_factor=0.33 load_factor_buckets=0.197361 total_buckets=133.822M used_avg_bucket_slots=1.11394 used_buckets=39.6168M used_max_bucket_slots=7
hashtable_op_set_new/133821673/33455418/2/iterations:1/real_time/threads:256              9742951 ns    627039687 ns          256 keys_to_insert=33.4554M load_factor=0.25 load_factor_buckets=1 total_buckets=133.822M used_avg_bucket_slots=0.166579 used_buckets=200.733M used_max_bucket_slots=6
hashtable_op_set_new/133821673/66910836/2/iterations:1/real_time/threads:128            103718502 ns   2900329998 ns          128 keys_to_insert=66.9108M load_factor=0.5 load_factor_buckets=0.283221 total_buckets=133.822M used_avg_bucket_slots=1.17572 used_buckets=56.8516M used_max_bucket_slots=7
hashtable_op_set_new/133821673/66910836/2/iterations:1/real_time/threads:256             25181532 ns    959190065 ns          256 keys_to_insert=66.9108M load_factor=0.5 load_factor_buckets=0.283225 total_buckets=133.822M used_avg_bucket_slots=1.17572 used_buckets=56.8525M used_max_bucket_slots=7

… 14 entries with AVX512, AVX2, SSE4 and a simple linear search support
…vx512,avx2,sse4,loop} functions, not fully tested
…used during the search phase to speed up the lookup
…instruction set variable to hint which instruction set should be selected
…e real world they don't get to perform exactly X searches on exactly the same data exactly sequentially, this approach regenerates the data each time and is more realistic
…ir own files, need to properly handle AVX2 compilation flags targetted per src file
…ke_config.*, rename version.cmake module in cmake_config.cmake, improve message logging, add a custom target to automatically update the cmake_config.c file on every build to correctly update the build date/time
… variables to include the re-generated file at build time
…tialized, add a static variable to ensure the code is executed only once
…he compiler will always able to emit avx2/avx instructions
… instead of function pointers to pick the best implementation option at runtime
…e convention for the methods and move the shared code to an external support c file
…collect hashtable stats and update state code
…y/values onto the bucket and add support disabling the locks (switch to atomic operations)
@danielealbano danielealbano merged commit fa13abd into master Jun 8, 2020
@danielealbano danielealbano deleted the refactor_hashtable_use_linear_probing branch June 8, 2020 22:12