Openapi parsing performance improvements #4569
Comments
cc @KnVerey |
Chiming in with my tests :) Version 4.5.5 has much better performance than some older versions mentioned in the previous tickets on this matter (5-7 times the speed, I would say, compared to versions around 3.9.4 or so), but it's still two times slower than 3.5.4. I've run a quick comparison on our manifests directory, containing a total of 460 kustomization.yaml files:
Just like before, it slows down when it encounters "unknown" resource kinds, such as SealedSecret, or actually any random string. Interestingly enough, however, per my tests, v4.5.5 is about 2.5 times faster than v3.5.4, if only known resource kinds are used: I performed a quick test on a simple set of manifests containing only a Deployment and a Service. But once I add a SealedSecret there (still with v4.5.5), it becomes 5 times slower. The 2.5x performance improvement over v3.5.4 is awesome, I wonder if it's possible to retain it for the case where arbitrary resource kinds are present as well. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale We still need this part:
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale this won't just go away if this ticket is closed automatically. |
A gentle reminder that the issue is still there. In our case, this is a definite blocker preventing us from upgrading from approximately v3.5.4 to anything more recent. v3.5.4, 17.8s:
v5.0.0, the latest at the time of writing this, 28.5s, which is 1.6 times slower than v3.5.4:
There was a big performance improvement at some point after v3.5.4 in processing resources of "known" kinds (e.g., Deployment, Service), and it is a pity that it goes to waste: it is completely overshadowed by the poor performance of processing "unknown" kinds such as SealedSecret. |
@shapirus thank you for posting your results with the latest version |
Sure, I will, tomorrow. |
@ephesused here you go. Attached is a set of manifests that demonstrate the problem, and a Dockerfile (written for linux/x86_64; for other OS/arch you'll need to adjust the package manager commands and the kustomize URL template) to build and run the tests conveniently. The list of versions to test is set in In a single command:
|
Thanks, @shapirus. I hope to be able to start investigating early next week. I'll post here as I go. |
TL;DR: I plan to submit a PR to adjust something simple (gvk.ApiVersion) that should help. However, the PR does not address the issue of openapi parsing, so this issue will remain open past the initial PR.

I dug around with some profiling to see what I could uncover. As a start, it's worth noting that the test case is set up with only a single resource. That highlights the openapi parsing issue well, but (from what I can tell) the openapi parsing is a one-time expense. So the test case effectively is a worst-case scenario for the openapi side of things. That's fine, but I think it may put the focus on a misleading source of the issue. To get execution into a range where I could profile with a little bit of confidence, I adjusted the test cases so they have 2000 resources (each identical to the original but with a different name).

The known full run uses a total of 570MB. The unknown full run uses a total of 1219MB, a difference of 649MB. The samples for memory were over 94% of the total in both cases, so the sampling is a good representation. The known gvk is a v1 Secret; the unknown one is a bitnami.com/v1alpha1 SealedSecret. The known run uses 121MB for resid.NewGVK; the unknown one uses 740MB, a difference of 619MB, accounting for the vast majority of the difference for the full run. For the known case, all of the 121MB comes through strings.WriteString building the apiVersion ("v1"). For the unknown case, 699MB of the 740MB comes through strings.WriteString building the apiVersion ("bitnami.com/v1alpha1"). This amount expands and contracts based on the number of resources. The other 41MB comes through openapi.initSchema, and that amount doesn't seem to be affected by the resource count.

resource.CurId (resource.go line 447) calls resource.GetGvk (resource.go line 56), which calls resid.GvkFromNode (gvk.go line 31), which calls...
The known full run took 5.79s. The unknown full run took 7.05s, a difference of 1.26s. The sampling for CPU time was about 90% for both. Of what was sampled, the known run used 2.99s for resource.CurId, while the unknown run used 3.85s, a difference of 0.86s. The unknown run used 0.18s more for garbage collection (runtime.gcBgMarkWorker). It also used 0.48s more for openapi.IsNamespaceScoped, 0.32s of which was openapi.initSchema.

Currently gvk.ApiVersion uses strings.Builder to create its output. Since the string construction is very simple, I don't think strings.Builder is the best approach. When I created a benchmark to test its current execution, here's what happens for bitnami.com/v1alpha1:
With the following gvk.ApiVersion code, the execution runs in 1/3 the time and avoids allocations:

```go
if x.Group != "" {
	return x.Group + "/" + x.Version
}
return x.Version
```

So my thinking is that the first step is to adjust gvk.ApiVersion. That should make its execution quicker and should reduce load on garbage collection, which should help overall runtime.

Profile results (collapsed attachments): known use case CPU, unknown use case CPU, known use case memory, unknown use case memory. |
A note regarding this:
it is actually a close representation of our use case: hundreds of small sets of resources, each having its own |
I follow, @shapirus. To be clear, I don't intend to stop with the initial PR. That memory use jumped out at me when looking at the profile results, and it's a quick thing to address. I figured I might as well get that in place, with the hopes that it might improve things on your end at least somewhat. |
Yeah sure. It looks like it's going to be an iterative process either way. Good thing is that we know how to reproduce it, so we have a way of measuring exactly how much of a difference a particular change makes. |
FWIW... In my testing with @shapirus's test scenario, #5808 yields about a 10% runtime improvement over v5.0.0 for both known and unknown resources. |
Sorry, other commitments have claimed my time recently. I did do some further analysis, but haven't had time to dig in and come up with ideas for improvement. I hope to have more time later this week.

I set up a test case that ran 100 executions of the one-resource known and unknown test cases (resetting the global "already initialized" flag each time, to mimic the user experience of running kustomize fresh each time). Below, kn stands for known and uk stands for unknown.

The CPU run-time results for unknown resources are more than three times those of the known resources. That's meaningfully worse than what @shapirus noted previously in this issue. The memory differences might look surprising at first, but they are what I expected based on earlier analysis (each openapi initialization requires a little more than 40MB).
|
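The flag-resetting technique described above can be sketched in isolation. This is not kustomize's actual code; the `initialized` flag and `initSchemaIfNeeded` function are hypothetical stand-ins for the unexported state in the openapi package, used here only to show why resetting between iterations makes each loop pay the full one-time cost.

```go
package main

import "fmt"

// initialized is a stand-in for kustomize's package-level "schema
// already parsed" flag (the real one is not exported); initCount
// records how many times the expensive initialization actually ran.
var (
	initialized bool
	initCount   int
)

func initSchemaIfNeeded() {
	if initialized {
		return
	}
	initCount++ // the ~40MB openapi parse would happen here
	initialized = true
}

func main() {
	const runs = 100
	for i := 0; i < runs; i++ {
		initialized = false // reset, mimicking a fresh kustomize process
		initSchemaIfNeeded()
		initSchemaIfNeeded() // a repeat call within one "run" is free
	}
	fmt.Println(initCount) // 100: one full initialization per simulated run
}
```

Without the reset, the loop would pay the initialization cost once; with it, every iteration behaves like a fresh `kustomize build` invocation, which is what the benchmark above measures.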
After digging around, I have a possible approach in mind. The work done in #4152 ensures the precomputed namespace-scope information aligns with the default schema. The implication is that anything not found in the precomputed information is not in the default schema. Knowing that, there's no point in loading the default schema when trying to determine namespace scoping: if a kind isn't precomputed, we know that the typeMeta won't be found and that openapi.isNamespaceScopedFromSchema must return false, false. So the idea is to put a short-circuit in |
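The proposed short-circuit can be sketched roughly as follows. This is an illustration under stated assumptions, not kustomize's implementation: the `precomputedIsNamespaceScoped` map and `isNamespaceScoped` function are hypothetical stand-ins for the generated scope table and the real lookup path.

```go
package main

import "fmt"

// Hypothetical precomputed scope table; in kustomize the real table is
// generated so that it covers everything in the default schema (#4152).
var precomputedIsNamespaceScoped = map[string]bool{
	"v1/Secret":          true,
	"apps/v1/Deployment": true,
	"v1/Namespace":       false,
}

// isNamespaceScoped sketches the idea: because the precomputed table is
// known to cover the entire default schema, a miss means the expensive
// schema load can be skipped entirely.
func isNamespaceScoped(gvk string) (scoped, found bool) {
	if s, ok := precomputedIsNamespaceScoped[gvk]; ok {
		return s, true
	}
	// Short-circuit: without it, the default schema would be parsed
	// here only to come up empty for unknown kinds like SealedSecret.
	return false, false
}

func main() {
	s, f := isNamespaceScoped("bitnami.com/v1alpha1/SealedSecret")
	fmt.Println(s, f) // false false, with no schema parse triggered
}
```

The payoff is that an unknown kind no longer forces the one-time (but large) openapi initialization at all during scope resolution.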
Just a quick note to say that while I expect this PR to be small, I haven't been able to carve the time to do the coding. I intend to do so within the next few days. |
...lots of failing checks on that PR :) |
Indeed. It turned out to be more complex than I'd expected, and I haven't had a chance to come back around and see if/how I can make it work properly. (I try to ensure things are in solid shape before submission, but a challenge for me is that my dev environment doesn't align well with the suite of kustomize tests. I'm unable to run the full test suite cleanly. That can make it hard to distinguish test failures that are due to my mistakes from those that are expected due to my environment.) |
@ephesused since PR #5076 has finally been merged, I decided to run my tests again. Unfortunately, there has been no release with that PR yet, so I had to build the current master branch (2f99707) to test it. It looks awesome. Not only is the slow parsing of unknown kinds fixed, but it is now 4.7 times faster than kustomize v3.5.4 (the release that I'm still stuck with because of those performance issues) for both known and unknown kinds! (This is all x86_64; I haven't tested it on arm64 yet.)
As far as my use case is concerned, I consider the issue resolved at this point, thank you very much @ephesused. p.s. any idea when 5.1.2 may be expected to be released? It's been a while since 5.1.1 came out. |
5.2.0 has just been released, and I can confirm that it is still as fast as the revision I tested in #4569 (comment). |
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
OpenAPI parsing is a huge performance bottleneck for us. Related issues:
We discovered with these benchmarks that parsing protobuf is much faster than parsing JSON OpenAPI schemas. kube-openapi now also supports a way to convert from the Gnostic v2 types to `spec.Swagger` types. This means that instead of parsing the JSON to `spec.Swagger` types, which is what we do today, we can instead parse a proto schema to Gnostic v2 types, and then convert the Gnostic v2 types to `spec.Swagger`.

Proto -> gnostic -> swagger takes ~39 ms. Parsing JSON takes ~600 ms.
Proposal
We should use the protobuf format instead of JSON for our builtin OpenAPI schema. "openapi parsing performance improvement with protobuffer" (#4568) is a complete implementation of this.
Right now, if users have their own custom schema files, they provide them via the `openapi` field. However, the custom schema replaces the builtin schema entirely, so the user has to provide the entire schema themselves. Since they will typically provide the custom schema in JSON or YAML, they will not get the performance improvements of the builtin proto schema. To fix this, we can have the custom schema provided via the `openapi` field be supplementary to what is builtin, rather than replacing it completely. In this workflow, the users get two benefits:

We could add a `--proto` option to `kustomize openapi fetch` and also allow proto-formatted schemas in the `openapi` field. This would look something like: