Flaky Test: WorkerTaskManagerTest
and stuck CompactionTaskRunTest
#14406
Labels
WorkerTaskManagerTest
and stuck CompactionTaskRunTest
#14406
Link to failed GHA runner: https://github.com/apache/druid/actions/runs/5196957717/jobs/9379949692
At first glance, the "indexing module test" seems greyed out. It seems like the "Operation was canceled" after running for close to 6 hours -- I think the timeout is just a symptom (see more details below).
It's useful to look at the raw logs as it includes timestamps to understand the timeline of events. From the logs, the first indexing unit test that failed
WorkerTaskManagerTest
:Shortly following the above failure, we can observe a series of indexing unit tests fail -
RangePartitionAdjustingCorePartitionSizeTest
,MultiPhaseParallelIndexingRowStatsTest
, etc.Finally, the indexing test module gets stuck in
CompactionTaskRunTest
-- for almost 5h30m before what seems like the runner comes along and kills the job around the 6h mark:Potential causes
WorkerTaskManagerTest
, it seems like completed task directory creation failed -- this makes me wonder if the GHA runner somehow was out of storage space.It might be worthwhile looking into why
CompactionTaskRunTest
was stuck and didn't seem to make any progress for 5h 30m. I wonder if it was blocked on some IO related to 1. The test framework should've timed out for such long-running tests.The text was updated successfully, but these errors were encountered: