Fixes a performance regression in FST (#13850)
#13344 introduced a performance regression in the FST benchmarks, with slowdowns of as much as 35%.

It seems that, after the refactor in the above PR, the compiler's optimization heuristics make a different loop-unrolling decision in the part of the FST that writes out the transduced symbols.

As a fix, we prevent the compiler from unrolling that loop by annotating it with `#pragma unroll 1`.
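
For illustration, here is a minimal sketch of the same pattern outside the FST code. The names `HOST_DEVICE` and `write_symbols` are hypothetical and not part of cudf; the plain `symbols` array stands in for the transducer table lookup. The sketch assumes the `__CUDA_ARCH__` guard is there so that only device compilation passes see the pragma.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper macro (not from cudf): expands to the CUDA
// qualifiers under nvcc and to nothing under a plain host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical stand-in for the write-out loop: copies `count` symbols
// to `out`, starting at offset `out_count`, and returns the new offset.
HOST_DEVICE inline uint32_t write_symbols(char const* symbols,
                                          uint32_t count,
                                          char* out,
                                          uint32_t out_count)
{
  assert(symbols != nullptr && out != nullptr);
// The guard is true only in device compilation passes, where
// __CUDA_ARCH__ is defined. `#pragma unroll 1` asks the compiler
// not to unroll the loop that immediately follows it.
#if __CUDA_ARCH__ > 0
#pragma unroll 1
#endif
  for (uint32_t i = 0; i < count; i++) {
    out[out_count + i] = symbols[i];
  }
  return out_count + count;
}
```

In host passes `__CUDA_ARCH__` is undefined, so the `#if` evaluates to false and the pragma is skipped; the loop itself is unchanged either way. Calling `write_symbols(src, 3, out, 0)` writes three symbols to the start of `out` and returns `3` as the next write offset.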

Authors:
  - Elias Stehle (https://github.com/elstehle)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - David Wendt (https://github.com/davidwendt)

URL: #13850
elstehle authored Aug 11, 2023
1 parent 1050325 commit 6a407cf
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions cpp/src/io/fst/agent_dfa.cuh
@@ -92,6 +92,9 @@ class DFASimulationCallbackWrapper {
   {
     uint32_t const count = transducer_table(old_state, symbol_id, read_symbol);
     if (write) {
+#if __CUDA_ARCH__ > 0
+#pragma unroll 1
+#endif
       for (uint32_t out_char = 0; out_char < count; out_char++) {
         out_it[out_count + out_char] =
           transducer_table(old_state, symbol_id, out_char, read_symbol);