Reduce $totalSchedulerThroughputPods in large clusters #1788

mborsz · 2021-04-27T06:50:28Z

In large clusters we create twice as many latency pods as there are nodes. We don't think that's needed, so let's try to remove it.

I will run a 5k-node test to see what happens at that scale.
The presubmit should validate the 100-node scale.
/hold

mm4tt · 2021-04-27T07:12:19Z

/lgtm

k8s-ci-robot · 2021-04-27T07:12:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mborsz, mm4tt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~clusterloader2/OWNERS~~ [mborsz,mm4tt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mm4tt · 2021-04-27T07:13:54Z

TBH, I'm more worried about the 100-node tests than the 5k tests. This change may increase the flakiness of the scheduling throughput measurement, and a single presubmit run won't be enough to detect that. That said, I think we should just submit this and keep an eye on the periodic tests.

mborsz · 2021-04-27T07:19:23Z

Matt, correct me if I'm wrong, but I don't think this affects the 100-node scale at all:

before this change we had totalSchedulerThroughputPods = max(1000, 2 * nodes)
after this change we will have totalSchedulerThroughputPods = max(1000, nodes)

This means that for clusters of up to 500 nodes, the value 1000 is used in both cases.
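
To make the arithmetic concrete, here is a small sketch of the two formulas quoted above. The function names are hypothetical and exist only for illustration; the actual value is computed in the clusterloader2 test config, not in Python.

```python
def throughput_pods_before(nodes: int) -> int:
    """Formula before this change: max(1000, 2 * nodes)."""
    return max(1000, 2 * nodes)


def throughput_pods_after(nodes: int) -> int:
    """Formula after this change: max(1000, nodes)."""
    return max(1000, nodes)


if __name__ == "__main__":
    # Compare both formulas at a few cluster sizes.
    for nodes in (100, 500, 1000, 5000):
        before = throughput_pods_before(nodes)
        after = throughput_pods_after(nodes)
        print(f"nodes={nodes}: before={before}, after={after}")
```

At 100 and 500 nodes both formulas give 1000, so those scales are unaffected. The results only diverge above 500 nodes, for example 10000 versus 5000 pods at 5k nodes.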

mm4tt · 2021-04-27T07:26:21Z

@mborsz, you're right. In that case I don't see any reason for not merging this. I'd be surprised if it affects 5k tests in any way. Creating 5k pods seems more than enough to accurately measure scheduling throughput.

mborsz · 2021-04-27T07:28:09Z

/hold cancel

Thanks!

Reduce $totalSchedulerThroughputPods in large clusters

43bf18c

k8s-ci-robot added the do-not-merge/hold, cncf-cla: yes, and size/XS labels Apr 27, 2021

k8s-ci-robot requested review from mm4tt and wojtek-t April 27, 2021 06:50

k8s-ci-robot added the approved label Apr 27, 2021

k8s-ci-robot assigned mm4tt Apr 27, 2021

k8s-ci-robot added the lgtm label Apr 27, 2021

k8s-ci-robot removed the do-not-merge/hold label Apr 27, 2021

k8s-ci-robot merged commit 7dbc89a into kubernetes:master Apr 27, 2021
