Openapi parsing performance improvements #4569
Comments
cc @KnVerey |
Chiming in with my tests :) Version 4.5.5 has much better performance than some older versions mentioned in the previous tickets on this matter (5-7 times the speed, I would say, compared to versions around 3.9.4 or so), but it's still two times slower than 3.5.4. I've run a quick comparison on our manifests directory, containing a total of 460 kustomization.yaml files:
Just like before, it slows down when it encounters "unknown" resource kinds, such as SealedSecret, or actually any random string. Interestingly enough, however, per my tests, v4.5.5 is about 2.5 times faster than v3.5.4, if only known resource kinds are used: I performed a quick test on a simple set of manifests containing only a Deployment and a Service. But once I add a SealedSecret there (still with v4.5.5), it becomes 5 times slower. The 2.5x performance improvement over v3.5.4 is awesome, I wonder if it's possible to retain it for the case where arbitrary resource kinds are present as well. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale We still need this part:
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale this won't just go away if this ticket is closed automatically. |
A gentle reminder that the issue is still there. In our case, this is a definite blocker preventing us from upgrading from approximately v3.5.4 to anything more recent. v3.5.4, 17.8s:
v5.0.0, the latest at the time of writing this, 28.5s, which is 1.6 times slower than v3.5.4:
There was a big performance improvement at some point after v3.5.4 in processing resources of "known" kinds (e.g., Deployment, Service), and it is a pity that it goes to waste: it is completely overshadowed by the poor performance of processing "unknown" kinds such as SealedSecret. |
@shapirus thank you for posting your results with the latest version |
Sure, I will, tomorrow. |
@ephesused here you go. Attached is a set of manifests that demonstrate the problem, and a Dockerfile (written for linux/x86_64; for other OS/arch you'll need to adjust the package manager commands and the kustomize URL template) to build and run the tests conveniently. The list of versions to test is set in In a single command:
|
Thanks, @shapirus. I hope to be able to start investigating early next week. I'll post here as I go. |
TL;DR: I plan to submit a PR to adjust something simple (gvk.ApiVersion) that should help. However, the PR does not address the issue of openapi parsing, so this issue will remain open past the initial PR.

I dug around with some profiling to see what I could uncover. As a start, it's worth noting that the test case is set up with only a single resource. That highlights the openapi parsing issue well, but (from what I can tell) the openapi parsing is a one-time expense. So the test case effectively is a worst-case scenario for the openapi side of things. That's fine, but I think it may put the focus on a misleading source of the issue. To get execution into a range where I could profile with a little bit of confidence, I adjusted the test cases so they have 2000 resources (each identical to the original but with a different name).

The known full run uses a total of 570MB. The unknown full run uses a total of 1219MB, a difference of 649MB. The samples for memory were over 94% of the total in both cases, so the sampling is a good representation. The known gvk is a v1 Secret; the unknown one is a bitnami.com/v1alpha1 SealedSecret. The known run uses 121MB for resid.NewGVK; the unknown one uses 740MB, a difference of 619MB, accounting for the vast majority of the difference for the full run. For the known case, all of the 121MB comes through strings.WriteString building the apiVersion ("v1"). For the unknown case, 699MB of the 740MB comes through strings.WriteString building the apiVersion ("bitnami.com/v1alpha1"). This amount expands and contracts based on the number of resources. The other 41MB comes through openapi.initSchema, and that amount doesn't seem to be affected by the resource count.

resource.CurId (resource.go line 447) calls resource.GetGvk (resource.go line 56), which calls resid.GvkFromNode (gvk.go line 31), which calls...
The known full run took 5.79s. The unknown full run took 7.05s, a difference of 1.26s. The sampling for CPU time was about 90% for both. Of what was sampled, the known run used 2.99s for resource.CurId, while the unknown run used 3.85s, a difference of 0.86s. The unknown run used 0.18s more for garbage collection (runtime.gcBgMarkWorker). It also used 0.48s more for openapi.IsNamespaceScoped, 0.32s of which was openapi.initSchema.

Currently gvk.ApiVersion uses strings.Builder to create its output. Since the string construction is very simple, I don't think strings.Builder is the best approach. When I created a benchmark to test its current execution, here's what happens for bitnami.com/v1alpha1:
With the following gvk.ApiVersion code, the execution runs in 1/3 the time and avoids allocations:

```go
if x.Group != "" {
	return x.Group + "/" + x.Version
}
return x.Version
```

So my thinking is that the first step is to adjust gvk.ApiVersion. That should make its execution quicker and should reduce load on garbage collection, which should help overall runtime.

Profile results (collapsed attachments): known use case CPU, unknown use case CPU, known use case memory, unknown use case memory. |
A note regarding this:
it is actually a close representation of our use case: hundreds of small sets of resources, each having its own |
I follow, @shapirus. To be clear, I don't intend to stop with the initial PR. That memory use jumped out at me when looking at the profile results, and it's a quick thing to address. I figured I might as well get that in place, with the hopes that it might improve things on your end at least somewhat. |
Yeah sure. It looks like it's going to be an iterative process either way. Good thing is that we know how to reproduce it, so we have a way of measuring exactly how much of a difference a particular change makes. |
FWIW... In my testing with @shapirus's test scenario, #5808 yields about a 10% runtime improvement over v5.0.0 for both known and unknown resources. |
Sorry, other commitments have claimed my time recently. I did do some further analysis, but haven't had time to dig in and come up with ideas for improvement. I hope to have more time later this week.

I set up a test case that ran 100 executions of the one-resource known and unknown test cases (resetting the global "already initialized" flag each time, to mimic the user experience of running kustomize fresh each time). Below, kn stands for known and uk stands for unknown.

The CPU run-time results for unknown resources are more than three times those of the known resources. That's meaningfully worse than what @shapirus noted previously in this issue. The memory differences might look surprising at first, but they are what I expected based on earlier analysis (each openapi initialization requires a little more than 40MB).
|
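The flag-resetting technique described above can be sketched in isolation. This is not kustomize's actual code; the `initialized` flag and `initSchemaIfNeeded` function are hypothetical stand-ins for the unexported state in the openapi package, used here only to show why resetting between iterations makes each loop pay the full one-time cost.

```go
package main

import "fmt"

// initialized is a stand-in for kustomize's package-level "schema
// already parsed" flag (the real one is not exported); initCount
// records how many times the expensive initialization actually ran.
var (
	initialized bool
	initCount   int
)

func initSchemaIfNeeded() {
	if initialized {
		return
	}
	initCount++ // the ~40MB openapi parse would happen here
	initialized = true
}

func main() {
	const runs = 100
	for i := 0; i < runs; i++ {
		initialized = false // reset, mimicking a fresh kustomize process
		initSchemaIfNeeded()
		initSchemaIfNeeded() // a repeat call within one "run" is free
	}
	fmt.Println(initCount) // 100: one full initialization per simulated run
}
```

Without the reset, the loop would pay the initialization cost once; with it, every iteration behaves like a fresh `kustomize build` invocation, which is what the benchmark above measures.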
After digging around, I have a possible approach in mind. The work done in #4152 ensures the precomputed namespace-scope information aligns with the default schema. The implication is that anything not found in the precomputed information is not in the default schema. Knowing that, there's no point in loading the default schema when trying to determine namespace scoping: if a kind isn't precomputed, we know that the typeMeta won't be found and that openapi.isNamespaceScopedFromSchema must return false, false. So the idea is to put a short-circuit in |
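The proposed short-circuit can be sketched roughly as follows. This is an illustration under stated assumptions, not kustomize's implementation: the `precomputedIsNamespaceScoped` map and `isNamespaceScoped` function are hypothetical stand-ins for the generated scope table and the real lookup path.

```go
package main

import "fmt"

// Hypothetical precomputed scope table; in kustomize the real table is
// generated so that it covers everything in the default schema (#4152).
var precomputedIsNamespaceScoped = map[string]bool{
	"v1/Secret":          true,
	"apps/v1/Deployment": true,
	"v1/Namespace":       false,
}

// isNamespaceScoped sketches the idea: because the precomputed table is
// known to cover the entire default schema, a miss means the expensive
// schema load can be skipped entirely.
func isNamespaceScoped(gvk string) (scoped, found bool) {
	if s, ok := precomputedIsNamespaceScoped[gvk]; ok {
		return s, true
	}
	// Short-circuit: without it, the default schema would be parsed
	// here only to come up empty for unknown kinds like SealedSecret.
	return false, false
}

func main() {
	s, f := isNamespaceScoped("bitnami.com/v1alpha1/SealedSecret")
	fmt.Println(s, f) // false false, with no schema parse triggered
}
```

The payoff is that an unknown kind no longer forces the one-time (but large) openapi initialization at all during scope resolution.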
Just a quick note to say that while I expect this PR to be small, I haven't been able to carve the time to do the coding. I intend to do so within the next few days. |
...lots of failing checks on that PR :) |
Indeed. It turned out to be more complex than I'd expected, and I haven't had a chance to come back around and see if/how I can make it work properly. (I try to ensure things are in solid shape before submission, but a challenge for me is that my dev environment doesn't align well with the suite of kustomize tests. I'm unable to run the full test suite cleanly. That can make it hard to distinguish test failures that are due to my mistakes from those that are expected due to my environment.) |
@ephesused since PR #5076 has finally been merged, I decided to run my tests again. Unfortunately, there has been no release with that PR yet, so I had to build the current master branch (2f99707) to test it. It looks awesome. Not only is the slow parsing of unknown kinds fixed, but it is now 4.7 times faster than kustomize v3.5.4 (the release that I'm still stuck with because of those performance issues) for both known and unknown kinds! (This is all x86_64; I haven't tested it on arm64 yet.)
As far as my use case is concerned, I consider the issue resolved at this point, thank you very much @ephesused. p.s. any idea when 5.1.2 may be expected to be released? It's been a while since 5.1.1 came out. |
5.2.0 has just been released, and I can confirm that it is still as fast as the revision I tested in #4569 (comment). |
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
OpenAPI parsing is a huge performance bottleneck for us. Related issues:
We discovered with these benchmarks that parsing protobuf is much faster than parsing JSON OpenAPI schemas. kube-openapi now also supports a way to convert from the Gnostic v2 types to `spec.Swagger` types. This means that instead of parsing the JSON to `spec.Swagger` types, which is what we do today, we can instead parse a proto schema to Gnostic v2 types, and then convert the Gnostic v2 types to `spec.Swagger`.

Proto -> gnostic -> swagger takes ~39 ms. Parsing JSON takes ~600 ms.
Proposal
We should use the protobuf format instead of JSON for our builtin OpenAPI schema. "openapi parsing performance improvement with protobuffer" (#4568) is a complete implementation of this.
Right now, if users have their own custom schema files, they provide them via the `openapi` field. However, the custom schema replaces the builtin schema entirely, so the user has to provide the entire schema themselves. Since they will typically provide the custom schema in JSON or YAML, they will not get the performance improvements of the builtin proto schema. To fix this, we can have the custom schema provided via the `openapi` field be supplementary to what is builtin, rather than replacing it completely. In this workflow, the users get two benefits:

We could add a `--proto` option to `kustomize openapi fetch` and also allow proto-formatted schemas in the `openapi` field. This would look something like: