
Improve documentation for druid.indexer.autoscale.workerCapacityHint config #11444

Merged 3 commits into apache:master on Jul 21, 2021

Conversation

maytasm
Contributor

@maytasm maytasm commented Jul 15, 2021

Improve documentation for druid.indexer.autoscale.workerCapacityHint config

Description

This is a follow-up PR to improve the documentation for the change from #11440.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -1015,7 +1015,7 @@ There are additional configs for autoscaling (if it is enabled):
|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in "pending" state before the Overlord tries to scale up.|PT30S|
|`druid.indexer.autoscale.workerVersion`|If set, will only create nodes of set version during autoscaling. Overrides dynamic configuration. |null|
|`druid.indexer.autoscale.workerPort`|The port that MiddleManagers will run on.|8080|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when you have a homogeneous cluster and the average of `druid.worker.capacity` across the workers when you have a heterogeneous cluster. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
Contributor
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of new workers to spin up when there is currently no worker running. Each worker (middleManager or indexer) is assumed to have this amount of task slots. If it is unset or set to a negative, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, it could be set to the average of `druid.worker.capacity` across the workers. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|

Contributor
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| The number of new workers for the auto scaler to launch when there are no workers running. Assumes that each worker, either a middleManager or indexer, has the same amount of task slots. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. Set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, set the value to the average of `druid.worker.capacity` across the workers. Only applies to `pendingTaskBased` provisioning strategy|-1|

@maytasm, @jihoonson this phrase is confusing to me:
"Worker capacity for determining the number of new workers". If I set it to 5, does the autoscaler spin up 5 workers? If so, it's just the number of new workers.

Also, if I set it to 5, does each worker then need to have 5 task slots too? I wasn't sure of that relationship: "Each worker (middleManager or indexer) is assumed to have this amount of task slots."

Also I think this could use an example somewhere.

Contributor
@techdocsmith hmm, maybe this config needs a more detailed description of how the auto scaler works. When there are ingestion jobs pending, the auto scaler first computes how many new nodes are required to unblock those pending tasks. Since each worker (middleManager or indexer) can run more than one task at the same time (depending on `druid.worker.capacity`), the number of new nodes to spin up is roughly `ceil(pending task count / worker capacity)`. The problem is that the auto scaler runs on the Overlord and is not aware of `druid.worker.capacity`. Also, each worker can have a different value set for `druid.worker.capacity` in a heterogeneous cluster. As a result, the auto scaler currently detects the worker capacity from running workers. However, this cannot work when there are no workers running. This PR works around the issue by adding a new config, `workerCapacityHint`, which the auto scaler can use to compute the number of new workers to spin up even when no workers are running. So, to answer your questions:

"Worker capacity for determining the number of new workers". If i set it to 5, does the autoscaler spin up 5 workers? If so it's just the number of new workers.

It depends on how many pending tasks you have. If you have 25 pending tasks, then yes it will spin up 5 new workers. If you have 8 pending tasks, the auto scaler will spin up 2 new workers because it is hinted that each worker has 5 task slots.
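The arithmetic in those two examples can be sketched as a simple ceiling division. This is only an illustration of the computation described in this thread, not Druid's actual provisioning code, and the function name is hypothetical:

```python
from math import ceil

def workers_to_launch(pending_tasks: int, worker_capacity_hint: int) -> int:
    # Each new worker is assumed to provide worker_capacity_hint task slots,
    # so the auto scaler needs enough workers to cover all pending tasks.
    return ceil(pending_tasks / worker_capacity_hint)

print(workers_to_launch(25, 5))  # 5 workers
print(workers_to_launch(8, 5))   # 2 workers
```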

Also If I set it to 5, does each worker thn need to have 5 task slots too? I wasn't sure of that relationship: " Each worker (middleManager or indexer) is assumed to have this amount of task slots."

I think the relationship goes in the opposite direction. When each worker has 5 task slots, you need to set this config to 5 so that the auto scaler can correctly estimate the number of new workers needed.

Contributor

@techdocsmith techdocsmith Jul 20, 2021
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| A estimation of the number of task slots available for each worker launched by the auto scaler when there are no workers running. The auto scaler uses the worker capacity hint to launch workers with an adequate capacity to handle pending tasks. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. The auto scaler assumes that each worker, either a middleManager or indexer, has the same amount of task slots. Therefore, when all your workers have the same capacity (homogeneous capacity), set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity`. If your workers have different capacities (heterogeneous capacity), set the value to the average of `druid.worker.capacity` across the workers. For example, if two workers have `druid.worker.capacity=10`, and one has `druid.worker.capacity=4`, set `autoscale.workerCapacityHint=8`. Only applies to `pendingTaskBased` provisioning strategy.|-1|

Thanks @jihoonson . That made it a lot clearer. Maybe this description is closer?

Contributor
This looks good 👍, but could it be clearer if it started with "An estimation of the number of task slots available..."?

Contributor

Thanks, @jihoonson , I made that edit. cc: @maytasm

Contributor Author

It's actually not "When unset or set to a negative" but "When unset or set to a value less than or equal to 0". The fallback to `minNumWorkers` also applies when this config is set to 0, since it never makes sense to have a worker with 0 capacity.
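To make that fallback concrete, here is a hedged sketch (a hypothetical helper, not Druid's actual provisioning code) that treats any hint less than or equal to 0, including the unset default of -1, as a fallback to `minNumWorkers`:

```python
from math import ceil

def target_worker_count(pending_tasks: int, worker_capacity_hint: int,
                        min_num_workers: int) -> int:
    # A hint of 0 or less (the default is -1, i.e. unset) falls back to
    # minNumWorkers, since a worker with 0 capacity makes no sense.
    if worker_capacity_hint <= 0:
        return min_num_workers
    return ceil(pending_tasks / worker_capacity_hint)

print(target_worker_count(8, -1, 3))  # 3 (unset: fall back to minNumWorkers)
print(target_worker_count(8, 0, 3))   # 3 (0 also triggers the fallback)
print(target_worker_count(8, 5, 3))   # 2
```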

Contributor Author

Made the change above with "less than or equal to 0" instead of "negative". Please approve if it looks good @techdocsmith @jihoonson.

Contributor

@techdocsmith techdocsmith left a comment

Approved with small change.

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Contributor

@jihoonson jihoonson left a comment

LGTM. Thanks @maytasm.

@maytasm maytasm merged commit 6ce3b6c into apache:master Jul 21, 2021
@maytasm maytasm deleted the IMPLY-8399-doc branch July 21, 2021 05:48
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021