Skip to content

Commit

Permalink
Fix flaky HashJoinTest.failedToReclaimFromHashJoinBuildersInNonReclai (
Browse files Browse the repository at this point in the history
…#9962)

Summary:
The test is flaky because of the following race condition. When reclaim happens in join node, there is a chance probe side pipeline/driver has not been initialized, making reclaimer not set for the operator. This results in probe side not reporting non reclaimable attempts because the reclaim codepath is not executed. Added additional coordination of code execution to avoid such inconsistent behavior of the test.

Stress tests have been run to verify the consistency of the fixed test.

Pull Request resolved: #9962

Reviewed By: xiaoxmeng

Differential Revision: D57880512

Pulled By: tanjialiang

fbshipit-source-id: bcc829301911f151d51aa68e8b3c75a93ebf5293
  • Loading branch information
tanjialiang authored and facebook-github-bot committed May 28, 2024
1 parent 394ccb2 commit 44a8175
Showing 1 changed file with 20 additions and 2 deletions.
22 changes: 20 additions & 2 deletions velox/exec/tests/HashJoinTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6771,10 +6771,25 @@ DEBUG_ONLY_TEST_F(
runHashJoinTask(vectors, queryCtx, numDrivers, pool(), false).data;

std::atomic_bool nonReclaimableSectionWaitFlag{true};
std::atomic_bool reclaimerInitializationWaitFlag{true};
folly::EventCount nonReclaimableSectionWait;
std::atomic_bool memoryArbitrationWaitFlag{true};
folly::EventCount memoryArbitrationWait;

std::atomic<uint32_t> numInitializedDrivers{0};
SCOPED_TESTVALUE_SET(
"facebook::velox::exec::Driver::runInternal",
std::function<void(exec::Driver*)>([&](exec::Driver* driver) {
numInitializedDrivers++;
// We need to make sure reclaimers on both build and probe side are set
// (in Operator::initialize) to avoid race conditions, producing
// consistent test results.
if (numInitializedDrivers.load() == 2) {
reclaimerInitializationWaitFlag = false;
nonReclaimableSectionWait.notifyAll();
}
}));

std::atomic<bool> injectNonReclaimableSectionOnce{true};
SCOPED_TESTVALUE_SET(
"facebook::velox::common::memory::MemoryPoolImpl::allocateNonContiguous",
Expand Down Expand Up @@ -6809,8 +6824,11 @@ DEBUG_ONLY_TEST_F(
});

// Wait for the hash build operators to enter into non-reclaimable section.
nonReclaimableSectionWait.await(
[&]() { return !nonReclaimableSectionWaitFlag.load(); });
nonReclaimableSectionWait.await([&]() {
return (
!nonReclaimableSectionWaitFlag.load() &&
!reclaimerInitializationWaitFlag.load());
});

// We expect capacity grow fails as we can't reclaim from hash join operators.
memory::testingRunArbitration();
Expand Down

0 comments on commit 44a8175

Please sign in to comment.