Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clusterloader2/testing/load: Adjust kube-state-metrics latency tests #1774

Merged
merged 2 commits into from
Mar 25, 2021

Conversation

lilic
Copy link
Member

@lilic lilic commented Mar 23, 2021

Enable kube-state-metrics to be run in the presubmit load tests.
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 23, 2021
@lilic
Copy link
Member Author

lilic commented Mar 23, 2021

FYI @mrueg @brancz @tariq1890 this just enables it temporarily so we see the results in the 100 node presubmit tests. 🎉

@lilic
Copy link
Member Author

lilic commented Mar 23, 2021

@wojtek-t, this is the PR as discussed earlier. Thanks!

@lilic
Copy link
Member Author

lilic commented Mar 23, 2021

Update: Seems like its failing to find the service object for kube-state-metrics, looking into it.

@mrueg
Copy link
Member

mrueg commented Mar 23, 2021

@lilic F0323 14:49:28.989150 14091 clusterloader.go:291] Error while setting up prometheus stack: parsing error: template: :34: unexpected {{end}} could it be related to 5a6de85#diff-95ffc56dd3788da48793718c29a759d29cd9a8fa1242e9ea8bcb6dd672f0b6e4R34 ? There are two {{end}} statements that might cause it

@lilic
Copy link
Member Author

lilic commented Mar 23, 2021

@mrueg I took that from

requests:
cpu: {{AddInt 200 (MultiplyInt 500 (DivideInt .Nodes 1000))}}m
{{if $PROMETHEUS_SCRAPE_KUBELETS}}
memory: 10Gi
{{else}}
# Start with 2Gi and add 2Gi for each 1K nodes.
memory: {{MultiplyInt 2 (AddInt 1 (DivideInt .Nodes 1000))}}Gi
{{end}}
limits:
{{if $PROMETHEUS_SCRAPE_KUBELETS}}
memory: 10Gi
but you might be right, there is no need for end as we are not using the if. Good catch!

@mrueg
Copy link
Member

mrueg commented Mar 24, 2021

Is it valid in k8s to have a non-numerical targetPort? Is it possible, that could be the reason it can't find the service? 5a6de85#diff-38d60ed161d961ff44471bdbfbdfef4c3a3c78d2baaec3bc4ba21cfc9c2cc5a2R17

@lilic
Copy link
Member Author

lilic commented Mar 24, 2021

@mrueg it worked correctly for

pull-perf-tests-clusterloader2 — Job succeeded.

We can see the results of the latency test https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/perf-tests/1774/pull-perf-tests-clusterloader2/1374398775702851584/artifacts/KubeStateMetricsLatency_load_2021-03-23T16:58:30Z.json

About the non-numerical ports -> https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services

For example, the names 123-abc and web are valid, but 123_abc and -web are not.

@mrueg
Copy link
Member

mrueg commented Mar 24, 2021

@mrueg it worked correctly for

pull-perf-tests-clusterloader2 — Job succeeded.

We can see the results of the latency test https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/perf-tests/1774/pull-perf-tests-clusterloader2/1374398775702851584/artifacts/KubeStateMetricsLatency_load_2021-03-23T16:58:30Z.json

About the non-numerical ports -> https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services

For example, the names 123-abc and web are valid, but 123_abc and -web are not.

Thanks! I didn't know this was possible 👍

@lilic
Copy link
Member Author

lilic commented Mar 24, 2021

Update on this, KSM uses the "in-cluster" config, which is incompatible with kubemark this is why they are failing. Thanks @wojtek-t ! I will remove KSM from running under the kubemark job.

Added code to disable this KSM test when running in kubemark env. If all is green will take out the enabled by default parts and we can proceed with the other changes.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 24, 2021
@lilic
Copy link
Member Author

lilic commented Mar 24, 2021

kubmark passed as it was skipped and the other load test passed and these are the results for kube-state-metrics latency loads results for the 100 node cluster:

First run:

{
"Perc50": 243181818,
"Perc90": 456730769,
"Perc99": 725000000
}

Second run:

{
"Perc50": 259615384,
"Perc90": 475000000,
"Perc99": 906666666
}

Will remove the other commits that enable the runs by default and remove the WIP.

@lilic lilic changed the title WIP: clusterloader2/testing/load/config.yaml: Enable KSM for tests clusterloader2/testing/load: Adjust kube-state-metrics latency tests Mar 24, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2021
@lilic
Copy link
Member Author

lilic commented Mar 24, 2021

/hold cancel

@wojtek-t do you might having a look, thank you!

I will do the refactor you suggested in a separate PR if that's okay for you?

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2021
@@ -33,6 +33,9 @@ type InitOptions struct {

// Features represents all features supported by this provider.
type Features struct {
// KubeStateMetricsTestDisabled determines if running kube-state-metrics is supported.
KubeStateMetricsTestDisabled bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to be positive in those (e.g. talk about features not about disabling something).

So instead of that, we should make it more like:

SupportKubeStateMetrics

Given that our tests run on GCE, it's probably enough to enable that just for gce cloud provider for now.

@lilic lilic force-pushed the enable-ksm branch 2 times, most recently from 29c1a3b to b70877a Compare March 25, 2021 08:44
Currently enable KubeStateMetrics for GCE env only.

As we use kubeconfig for kube-state-metrics and this is not supported in
kubemark we cannot run those for kubemark env.
@wojtek-t
Copy link
Member

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lilic, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2021
@k8s-ci-robot k8s-ci-robot merged commit 05ada9b into kubernetes:master Mar 25, 2021
@lilic lilic deleted the enable-ksm branch March 25, 2021 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants