Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Watcher ExecuteWatchQueuedStatsTests #82134

Merged
merged 1 commit into from
Jan 5, 2022

Conversation

jakelandis
Copy link
Contributor

Attempt to stabilize ExecuteWatchQueuedStatsTests.
This test is tricky since it non-reproducible old style integration
test that runs 3 nodes with Watcher and digs in pretty deep into
the implementation details. It is believed that there is a race condition
between the testing concurrency and Watcher concurrency such
that Watcher will schedule a new Watch at just the wrong time such that
a future is created but Watcher gets shut down before it complete
which results in the test waiting indefinitely for a future that can never return.

The main fix here is to bump the interval up from 1s to 1h such that Watcher
will not schedule second watch that will never finish. There secondary fixes
are to place an upper time bound waiting on the future and
not record the watch execution. Recording the watch execution results in
error but is inconsequential to this test.

Fixes #66392

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@jakelandis jakelandis added the >test Issues or PRs that are addressing/adding tests label Dec 29, 2021
@martijnvg
Copy link
Member

I'm not against unmuting this test, but I do wonder whether we can replace it with a unit test instead.

This test tests a particular error that existed before. I wonder whether we can make TransportExecuteWatchAction
unit testable, so that we can test the scenario that the ExecuteWatchQueedStatsTests is testing. Maybe we can raise the visibility of executeWatch() to package protected and create a test instance of ExecutionService (seems like we do this in ExecutionServiceTests), TriggerService (seems like we create test instances of this class in several places), Clock and ThreadPool (using TestThreadPool). The other dependencies can be ignored.

@jakelandis
Copy link
Contributor Author

but I do wonder whether we can replace it with a unit test instead.

That would probably be best .... However, I think we should try this fix and if it is still flaky then look to replace the test.

@martijnvg
Copy link
Member

martijnvg commented Jan 5, 2022 via email

@jakelandis jakelandis merged commit 0801b4a into elastic:master Jan 5, 2022
@jakelandis jakelandis deleted the attempt_to_fix_66392 branch January 5, 2022 16:16
astefan pushed a commit to astefan/elasticsearch that referenced this pull request Jan 7, 2022
astefan pushed a commit to astefan/elasticsearch that referenced this pull request Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Watcher Team:Data Management Meta label for data/management team >test Issues or PRs that are addressing/adding tests v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] ExecuteWatchQueuedStatsTests test timing out
4 participants