Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ForkingTaskRunnerTest #16323

Merged
merged 2 commits into from
Apr 24, 2024
Merged

Fix ForkingTaskRunnerTest #16323

merged 2 commits into from
Apr 24, 2024

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Apr 23, 2024

ForkingTaskRunner uses static fields to track task counts which are reported in metrics.
This causes some tests like ForkingTaskRunnerTest.testInvalidTaskContextJavaOptsArray() to give false positives if the variables had been set by a preceding test.

Changes:

  • Use non-static fields to track task counts in ForkingTaskRunner
  • Update assertions in tests to ensure that the tests are idempotent.

@IgorBerman
Copy link
Contributor

@kfaraz don't you think there should be
failedTaskCount .incrementAndGet();
near LOGGER.info(t, "Exception caught during execution"); at wrapping try-catch?
I mean when we pass bad arguments the IllegalArgumentException is thrown but the code never increments failedTaskCount

@IgorBerman
Copy link
Contributor

btw, those counters used in metrics reporting, so removing static will cause change in those metrics... i mean from test perspective static causing races, but if we looking at metric level and WorkerTaskCountStatsProvider I can understand why it's static since it's "Proides task / task count status at the level of individual worker nodes"

@IgorBerman
Copy link
Contributor

if testing framework makes sure that each test runs one after another and not in parallel, other option would be to reset static variables in some visible for test method.
Additional option would be to use some interface that a) decouples from ForkingTaskRunnerTest & WorkerTaskCountStatsProvider and b) "collects" reports from ForkingTaskRunnerTest. for tests it will 1 implementation, race free.
for production it can wrap those static variables and WorkerTaskCountStatsProvider will take those reports from this interface

@kfaraz
Copy link
Contributor Author

kfaraz commented Apr 24, 2024

but if we looking at metric level and WorkerTaskCountStatsProvider I can understand why it's static since it's "Proides task / task count status at the level of individual worker nodes"

@IgorBerman , I am not sure I follow. How would this be affected if we make the fields non-static?
ForkingTaskRunner is used only by MiddleManagers and each MiddleManager would have only a single instance of it. Keeping the fields static doesn't serve any purpose, afaict.

@kfaraz
Copy link
Contributor Author

kfaraz commented Apr 24, 2024

don't you think there should be
failedTaskCount .incrementAndGet();
near LOGGER.info(t, "Exception caught during execution"); at wrapping try-catch?

Yes, I wanted to do this but I decided to do it later as I need to look at all the exceptions and if it would be better to not throw an exception at all and simply return a TaskStatus.failure() in those cases. But I need to test it properly before making those changes.

The changes in this PR are more straightforward and just meant to get the build right without altering any behaviour.

@IgorBerman
Copy link
Contributor

oh, ok, if it's singleton then it will work and no need in static. I missed this part

@kfaraz kfaraz changed the title Fix fork runner test Fix ForkingTaskRunnerTest Apr 24, 2024
@@ -494,7 +493,7 @@ int waitForTaskProcessToComplete(Task task, ProcessHolder processHolder, File lo
+ task.getId()
+ " must be an array of strings.")
);
Assert.assertEquals(1L, (long) forkingTaskRunner.getWorkerFailedTaskCount());
Assert.assertEquals(0L, (long) forkingTaskRunner.getWorkerFailedTaskCount());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are both failed and successful counts 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ideally the failed count should be 1, but it is currently not being tracked.
We can fix up the code later as mentioned here: #16323 (comment).

Copy link
Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, @kfaraz. LGTM

@kfaraz kfaraz merged commit 1dabb02 into apache:master Apr 24, 2024
85 checks passed
@kfaraz kfaraz deleted the fix_fork_runner_test branch April 24, 2024 08:35
@kfaraz
Copy link
Contributor Author

kfaraz commented Apr 24, 2024

Thanks for the review, @AmatyaAvadhanula , @IgorBerman !

@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants