-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve documentation for druid.indexer.autoscale.workerCapacityHint config #11444
Conversation
docs/configuration/index.md
Outdated
@@ -1015,7 +1015,7 @@ There are additional configs for autoscaling (if it is enabled): | |||
|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in "pending" state before the Overlord tries to scale up.|PT30S| | |||
|`druid.indexer.autoscale.workerVersion`|If set, will only create nodes of set version during autoscaling. Overrides dynamic configuration. |null| | |||
|`druid.indexer.autoscale.workerPort`|The port that MiddleManagers will run on.|8080| | |||
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when you have a homogeneous cluster and the average of `druid.worker.capacity` across the workers when you have a heterogeneous cluster. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| | |||
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| | |
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of new workers to spin up when there is currently no worker running. Each worker (middleManager or indexer) is assumed to have this amount of task slots. If it is unset or set to a negative, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, it could be set to the average of `druid.worker.capacity` across the workers. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| | |
|`druid.indexer.autoscale.workerCapacityHint`| The number of new workers for the auto scaler to launch when there are no workers running. Assumes that each worker, either a middleManager or indexer, has the same amount of task slots. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. Set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity` when your workers have a homogeneous capacity. When your workers have a heterogeneous capacity, set the value to the average of `druid.worker.capacity` across the workers. Only applies to `pendingTaskBased` provisioning strategy|-1| |
@maytasm , @jihoonson this phrase is confusing to me:
"Worker capacity for determining the number of new workers". If i set it to 5, does the autoscaler spin up 5 workers? If so it's just the number of new workers.
Also If I set it to 5, does each worker thn need to have 5 task slots too? I wasn't sure of that relationship: " Each worker (middleManager or indexer) is assumed to have this amount of task slots."
Also I think this could use an example somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@techdocsmith hmm, maybe this config needs more detailed description about how auto scaler works. When there are ingestion jobs pending, the auto scaler first computes how many new nodes are required to unblock those pending tasks. Since each worker (middleManager or indexer) can run more than one task at the same time (depending on druid.worker.capacity
), the number of new nodes to spin up is roughly ceil(pending task count / worker capacity)
. The problem is, the auto scaler runs on the overlord and is not aware of druid.worker.capacity
. Also, each worker can have a different value set to druid.worker.capacity
in a heterogeneous cluster. As a result, the auto scaler is currently detecting the worker capacity from workers. However, this cannot work when there is no workers running. This PR works around this issue by adding a new config, workerCapacityHint
, which can be used as a hint for auto scaler to compute the number of new workers to spin up even when there is no workers running. So, to answer your questions,
"Worker capacity for determining the number of new workers". If i set it to 5, does the autoscaler spin up 5 workers? If so it's just the number of new workers.
It depends on how many pending tasks you have. If you have 25 pending tasks, then yes it will spin up 5 new workers. If you have 8 pending tasks, the auto scaler will spin up 2 new workers because it is hinted that each worker has 5 task slots.
Also If I set it to 5, does each worker thn need to have 5 task slots too? I wasn't sure of that relationship: " Each worker (middleManager or indexer) is assumed to have this amount of task slots."
I think the relationship is opposite direction. When each worker has 5 task slots, then you need to set this config to 5 so that the auto scaler can correctly estimate the number of new workers needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1| | |
|`druid.indexer.autoscale.workerCapacityHint`| A estimation of the number of task slots available for each worker launched by the auto scaler when there are no workers running. The auto scaler uses the worker capacity hint to launch workers with an adequate capacity to handle pending tasks. When unset or set to a negative, the auto scaler scales workers equal to the value for `minNumWorkers` in autoScaler config instead. The auto scaler assumes that each worker, either a middleManager or indexer, has the same amount of task slots. Therefore, when all your workers have the same capacity (homogeneous capacity), set the value for `autoscale.workerCapacityHint` equal to `druid.worker.capacity`. If your workers have different capacities (heterogeneous capacity), set the value to the average of `druid.worker.capacity` across the workers. For example, if two workers have `druid.worker.capacity=10`, and one has `druid.worker.capacity=4`, set `autoscale.workerCapacityHint=8`. Only applies to `pendingTaskBased` provisioning strategy.|-1| |
Thanks @jihoonson . That made it a lot clearer. Maybe this description is closer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good 👍, but could it be clearer if it starts with An estimation of the number of task slots available...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jihoonson , I made that edit. cc: @maytasm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not "When unset or set to a negative" but "When unset or set to a less than or equal to 0". The fall back on using minNumWorkers includes when this config is set to 0 since it doesn't ever make sense to have a worker with 0 capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change above with the addition of "less than or equal to 0" instead of "negative". Please approve if looks good @techdocsmith @jihoonson
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with small change.
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @maytasm.
Improve documentation for druid.indexer.autoscale.workerCapacityHint config
Description
This is a followup PR to improve the doc of the change from Improve documentation for #11440
This PR has: