Use thrust::identity as hash functions for byte pair encoding (#13665)
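This PR fixes a minor issue where different hash functions were used for `insert` and `find` in byte pair encoding. The merge-pairs map was built with `cuco::murmurhash3_32`, while the lookup called `find(hash)` without an explicit hash function. Both call sites now pass `thrust::identity<cudf::hash_value_type>`. The PR also verifies that the latest changes in cuco won't break cudf.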

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - https://github.com/nvdbaranec
  - Mark Harris (https://github.com/harrism)

URL: #13665
PointKernel authored Jul 10, 2023
1 parent 998c2c0 commit 3c51c9e
Showing 2 changed files with 3 additions and 2 deletions.
cpp/src/text/subword/bpe_tokenizer.cu: 3 changes (2 additions, 1 deletion)
@@ -38,6 +38,7 @@
 #include <thrust/execution_policy.h>
 #include <thrust/find.h>
 #include <thrust/for_each.h>
+#include <thrust/functional.h>
 #include <thrust/iterator/counting_iterator.h>
 #include <thrust/iterator/transform_iterator.h>
 #include <thrust/merge.h>
@@ -234,7 +235,7 @@ struct byte_pair_encoding_fn {
 if (rhs.empty()) break; // no more adjacent pairs

 auto const hash = compute_hash(lhs, rhs);
-auto const map_itr = d_map.find(hash);
+auto const map_itr = d_map.find(hash, thrust::identity<cudf::hash_value_type>{});
 if (map_itr != d_map.end()) {
 // found a match; record the rank (and other min_ vars)
 auto const rank = static_cast<cudf::size_type>(map_itr->second);
cpp/src/text/subword/load_merges_file.cu: 2 changes (1 addition, 1 deletion)
@@ -119,7 +119,7 @@ std::unique_ptr<detail::merge_pairs_map_type> initialize_merge_pairs_map(

 merge_pairs_map->insert(iter,
 iter + input.size(),
-cuco::murmurhash3_32<cudf::hash_value_type>{},
+thrust::identity<cudf::hash_value_type>{},
 thrust::equal_to<cudf::hash_value_type>{},
 stream.value());
