Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve rolling Supervisor restarts at taskDuration #15859

Merged
merged 1 commit into from
Feb 14, 2024

Conversation

YongGang
Copy link
Contributor

@YongGang YongGang commented Feb 7, 2024

Description

This PR builds on the foundation set by #14396 which introduced rolling supervisor restarts as a strategy to potentially reduce the number of task slots required for task handover in Druid. The primary goal of this PR is to refine the behavior and utilization of this feature by addressing two specific areas:

  1. Dynamic Adjustment of stopTaskCount: Previously, stopTaskCount defaulted to the value of taskCount if not explicitly specified in the Supervisor spec. This implicit behavior led to inconsistency when taskCount was manually increased, as stopTaskCount did not automatically adjust to reflect the new taskCount. This PR eliminates this implicit defaulting. Now, if stopTaskCount is not set, it remains null and is preserved as such, requiring explicit configuration to activate rolling restarts.
  2. Optimized Task Cycling: When stopTaskCount is configured to a low value (e.g., 1), it could lead to prolonged cycling and stopping of active tasks, especially for tasks that started earlier. To address this, the PR introduces logic to sort active running tasks by their start date, prioritizing the stopping of older tasks first. This approach aims to prevent early-started tasks from running excessively long, improving the overall efficiency of task cycling.

Key changed/added classes in this PR
  • SeekableStreamSupervisor updated to sort task groups by start time to prioritize early termination of earlier groups
  • SeekableStreamSupervisorIOConfig removed the automatic defaulting of stopTaskCount to taskCount, ensuring that stopTaskCount must be explicitly defined to influence the rolling restart behavior.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.


EasyMock.reset(spec);
EasyMock.expect(spec.isSuspended()).andReturn(false).anyTimes();
EasyMock.expect(spec.getDataSchema()).andReturn(getDataSchema()).anyTimes();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
SeekableStreamSupervisorSpec.getDataSchema
should be avoided because it has been deprecated.
@YongGang YongGang changed the title Enhance rolling Supervisor restarts at taskDuration Improve rolling Supervisor restarts at taskDuration Feb 7, 2024
AtomicInteger stoppedTasks = new AtomicInteger();
// Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing
activelyReadingTaskGroups
.entrySet().stream().sorted(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Concurrency check: the stream change provide the same level of Concurrency guaranteed as the previous for loop.

@JsonProperty
public int getStopTaskCount()
public Integer getStopTaskCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would be nice to add a javadoc here telling devs to use #getMaxAllowedStops instead.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @YongGang

@suneet-s suneet-s merged commit 19ed5c8 into apache:master Feb 14, 2024
83 checks passed
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Feb 20, 2024
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants