
Improve documentation for druid.indexer.autoscale.workerCapacityHint config #11444

Merged 3 commits into apache:master on Jul 21, 2021

Conversation

maytasm
Contributor

@maytasm maytasm commented Jul 15, 2021

Improve documentation for druid.indexer.autoscale.workerCapacityHint config

Description

This is a follow-up PR to improve the documentation for the change from #11440.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -1015,7 +1015,7 @@ There are additional configs for autoscaling (if it is enabled):
|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in "pending" state before the Overlord tries to scale up.|PT30S|
|`druid.indexer.autoscale.workerVersion`|If set, will only create nodes of set version during autoscaling. Overrides dynamic configuration. |null|
|`druid.indexer.autoscale.workerPort`|The port that MiddleManagers will run on.|8080|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when you have a homogeneous cluster and the average of `druid.worker.capacity` across the workers when you have a heterogeneous cluster. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
Contributor
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of new workers to spin up when there is currently no worker running. Each worker (middleManager or indexer) is assumed to have this amount of task slots. If it is unset or set to a negative, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, it could be set to the average of `druid.worker.capacity` across the workers. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|

Contributor
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| The number of new workers for the auto scaler to launch when there are no workers running. Assumes that each worker, either a middleManager or indexer, has the same amount of task slots. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. Set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, set the value to the average of `druid.worker.capacity` across the workers. Only applies to `pendingTaskBased` provisioning strategy|-1|

@maytasm, @jihoonson this phrase is confusing to me:
"Worker capacity for determining the number of new workers". If I set it to 5, does the autoscaler spin up 5 workers? If so, it's just the number of new workers.

Also, if I set it to 5, does each worker then need to have 5 task slots too? I wasn't sure of that relationship: "Each worker (middleManager or indexer) is assumed to have this amount of task slots."

Also I think this could use an example somewhere.

Contributor
@techdocsmith hmm, maybe this config needs a more detailed description of how the auto scaler works. When there are ingestion jobs pending, the auto scaler first computes how many new nodes are required to unblock those pending tasks. Since each worker (middleManager or indexer) can run more than one task at the same time (depending on `druid.worker.capacity`), the number of new nodes to spin up is roughly `ceil(pending task count / worker capacity)`. The problem is that the auto scaler runs on the Overlord and is not aware of `druid.worker.capacity`. Also, each worker can have a different value set for `druid.worker.capacity` in a heterogeneous cluster. As a result, the auto scaler currently detects the worker capacity from running workers. However, this cannot work when there are no workers running. This PR works around the issue by adding a new config, `workerCapacityHint`, which the auto scaler can use to compute the number of new workers to spin up even when no workers are running. So, to answer your questions:

"Worker capacity for determining the number of new workers". If i set it to 5, does the autoscaler spin up 5 workers? If so it's just the number of new workers.

It depends on how many pending tasks you have. If you have 25 pending tasks, then yes it will spin up 5 new workers. If you have 8 pending tasks, the auto scaler will spin up 2 new workers because it is hinted that each worker has 5 task slots.
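The arithmetic in those two examples can be sketched as a simple ceiling division. This is only an illustration of the computation described in this thread, not Druid's actual provisioning code, and the function name is hypothetical:

```python
from math import ceil

def workers_to_launch(pending_tasks: int, worker_capacity_hint: int) -> int:
    # Each new worker is assumed to provide worker_capacity_hint task slots,
    # so the auto scaler needs enough workers to cover all pending tasks.
    return ceil(pending_tasks / worker_capacity_hint)

print(workers_to_launch(25, 5))  # 5 workers
print(workers_to_launch(8, 5))   # 2 workers
```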

Also If I set it to 5, does each worker thn need to have 5 task slots too? I wasn't sure of that relationship: " Each worker (middleManager or indexer) is assumed to have this amount of task slots."

I think the relationship goes in the opposite direction. When each worker has 5 task slots, you need to set this config to 5 so that the auto scaler can correctly estimate the number of new workers needed.

Contributor

@techdocsmith techdocsmith Jul 20, 2021
Suggested change
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
|`druid.indexer.autoscale.workerCapacityHint`| A estimation of the number of task slots available for each worker launched by the auto scaler when there are no workers running. The auto scaler uses the worker capacity hint to launch workers with an adequate capacity to handle pending tasks. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. The auto scaler assumes that each worker, either a middleManager or indexer, has the same amount of task slots. Therefore, when all your workers have the same capacity (homogeneous capacity), set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity`. If your workers have different capacities (heterogeneous capacity), set the value to the average of `druid.worker.capacity` across the workers. For example, if two workers have `druid.worker.capacity=10`, and one has `druid.worker.capacity=4`, set `autoscale.workerCapacityHint=8`. Only applies to `pendingTaskBased` provisioning strategy.|-1|

Thanks @jihoonson . That made it a lot clearer. Maybe this description is closer?

Contributor
This looks good 👍, but could it be clearer if it started with "An estimation of the number of task slots available..."?

Contributor

Thanks, @jihoonson , I made that edit. cc: @maytasm

Contributor Author

It's actually not "When unset or set to a negative" but "When unset or set to a value less than or equal to 0". The fallback to `minNumWorkers` also applies when this config is set to 0, since it never makes sense to have a worker with 0 capacity.
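To make that fallback concrete, here is a hedged sketch (a hypothetical helper, not Druid's actual provisioning code) that treats any hint less than or equal to 0, including the unset default of -1, as a fallback to `minNumWorkers`:

```python
from math import ceil

def target_worker_count(pending_tasks: int, worker_capacity_hint: int,
                        min_num_workers: int) -> int:
    # A hint of 0 or less (the default is -1, i.e. unset) falls back to
    # minNumWorkers, since a worker with 0 capacity makes no sense.
    if worker_capacity_hint <= 0:
        return min_num_workers
    return ceil(pending_tasks / worker_capacity_hint)

print(target_worker_count(8, -1, 3))  # 3 (unset: fall back to minNumWorkers)
print(target_worker_count(8, 0, 3))   # 3 (0 also triggers the fallback)
print(target_worker_count(8, 5, 3))   # 2
```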

Contributor Author

Made the change above with "less than or equal to 0" instead of "negative". Please approve if it looks good @techdocsmith @jihoonson.

Contributor

@techdocsmith techdocsmith left a comment

Approved with small change.

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Contributor

@jihoonson jihoonson left a comment

LGTM. Thanks @maytasm.

@maytasm maytasm merged commit 6ce3b6c into apache:master Jul 21, 2021
@maytasm maytasm deleted the IMPLY-8399-doc branch July 21, 2021 05:48
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021