Write last xds snapshot to persisted storage #6115
Per discussion with @nrjpoddar and @kcbabo, the preferred long-term solution is #6114. That change is larger and riskier; in the meantime we will add this support (temporarily), then deprecate and remove it once the other feature is implemented and well tested in the field.
@kdorosh we need to consider the implications of adding private keys (certificates when not using SDS) to a PV. They would like to have the persistence in HA Redis. This can be follow-up work.
@chrisgaun as noted earlier, the preferred long-term solution is #6114, so that all state is stored in etcd. In the meantime, an encrypted volume (e.g. https://kubernetes.io/docs/concepts/storage/storage-classes/#aws-ebs, sketched below) may be acceptable. We could explore HA Redis, but that seems similar to making xds-relay HA, which might be preferable; #6114 is still preferred in my opinion.
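For reference, a minimal sketch of the kind of encrypted volume mentioned above, using the in-tree AWS EBS provisioner from the linked storage-classes doc (the class name `encrypted-gp2` is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true"   # EBS encrypts the volume at rest
reclaimPolicy: Delete
```

A PV or PVC backing the snapshot could then request this class via `storageClassName`, so keys written to disk are encrypted without any application changes.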
Related blocker I ran into while doing the work: solo-io/solo-kit#461
Related: #5022. Steps to reproduce, apply:

```yaml
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: default
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - exact: /all-pets
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore-8080
            namespace: gloo-system
    - matchers:
      - exact: /all-pets2
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore2-8080
            namespace: gloo-system
---
apiVersion: v1
kind: Service
metadata:
  name: petstore
  namespace: default
  labels:
    service: petstore
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: petstore
---
apiVersion: v1
kind: Service
metadata:
  name: petstore2
  namespace: default
  labels:
    service: petstore
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: petstore
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: default-petstore-8080
  namespace: gloo-system
spec:
  kube:
    selector:
      app: petstore
    serviceName: petstore
    serviceNamespace: default
    servicePort: 8080
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: default-petstore2-8080
  namespace: gloo-system
spec:
  kube:
    selector:
      app: petstore
    serviceName: petstore2
    serviceNamespace: default
    servicePort: 8080
```
{"level":"warn","ts":"2022-04-07T14:55:55.263Z","logger":"gloo-ee.v1.event_loop.setup.gloosnapshot.event_loop.reporter","caller":"reporter/reporter.go:255","msg":"failed to write status state:Warning reason:\"warning: \\n 1 error occurred:\\n\\t* Upstream name:\\\"default-petstore2-8080\\\" namespace:\\\"gloo-system\\\" references the service \\\"petstore2\\\" which does not exist in namespace \\\"default\\\"\\n\\n\" reported_by:\"gloo\" for resource default-petstore2-8080: updating kube resource default-petstore2-8080:112756 (want 112756): admission webhook \"gateway.gloo-system.svc\" denied the request: resource incompatible with current Gloo snapshot: [Validating v1.Upstream failed: 1 error occurred:\n\t* Upstream name:\"default-petstore2-8080\" namespace:\"gloo-system\" references the service \"petstore2\" which does not exist in namespace \"default\"\n\n]","version":"1.11.0-beta7"}
{"level":"error","ts":"2022-04-07T14:55:55.265Z","logger":"gloo-ee.v1.event_loop.setup","caller":"setup/setup_syncer.go:668","msg":"gloo main event loop","version":"1.11.0-beta7","error":"event_loop.gloo: 1 error occurred:\n\t* writing reports: 1 error occurred:\n\t* failed to write status state:Warning reason:\"warning: \\n 1 error occurred:\\n\\t* Upstream name:\\\"default-petstore2-8080\\\" namespace:\\\"gloo-system\\\" references the service \\\"petstore2\\\" which does not exist in namespace \\\"default\\\"\\n\\n\" reported_by:\"gloo\" for resource default-petstore2-8080: updating kube resource default-petstore2-8080:112756 (want 112756): admission webhook \"gateway.gloo-system.svc\" denied the request: resource incompatible with current Gloo snapshot: [Validating v1.Upstream failed: 1 error occurred:\n\t* Upstream name:\"default-petstore2-8080\" namespace:\"gloo-system\" references the service \"petstore2\" which does not exist in namespace \"default\"\n\n]\n\n\n\n","errorVerbose":"1 error occurred:\n\t* writing reports: 1 error occurred:\n\t* failed to write status state:Warning reason:\"warning: \\n 1 error occurred:\\n\\t* Upstream name:\\\"default-petstore2-8080\\\" namespace:\\\"gloo-system\\\" references the service \\\"petstore2\\\" which does not exist in namespace \\\"default\\\"\\n\\n\" reported_by:\"gloo\" for resource default-petstore2-8080: updating kube resource default-petstore2-8080:112756 (want 112756): admission webhook \"gateway.gloo-system.svc\" denied the request: resource incompatible with current Gloo snapshot: [Validating v1.Upstream failed: 1 error occurred:\n\t* Upstream name:\"default-petstore2-8080\" namespace:\"gloo-system\" references the service \"petstore2\" which does not exist in namespace \"default\"\n\n]\n\n\n\n\nevent_loop.gloo\ngithub.com/solo-io/go-utils/errutils.AggregateErrs\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.24/errutils/aggregate_errs.go:19\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371","stacktrace":"github.com/solo-io/gloo/projects/gloo/pkg/syncer/setup.RunGlooWithExtensions.func6\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.11.0-beta11/projects/gloo/pkg/syncer/setup/setup_syncer.go:668"} update: if we don't run the route replacement sanitizer then we don't have this issue
Update: old config is also stuck, e.g. after deleting the service but before rolling pods, apply:

```yaml
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: default
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - exact: /all-pets
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore-8080
            namespace: gloo-system
    - matchers:
      - exact: /all-pets2
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore2-8080
            namespace: gloo-system
    - matchers:
      - exact: /all-pets3
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore3-8080
            namespace: gloo-system
---
apiVersion: v1
kind: Service
metadata:
  name: petstore3
  namespace: default
  labels:
    service: petstore
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: petstore
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: default-petstore3-8080
  namespace: gloo-system
spec:
  kube:
    selector:
      app: petstore
    serviceName: petstore3
    serviceNamespace: default
    servicePort: 8080
```
Also highly relevant to the initial ask here of persisting xds config: this is/was made much harder because we did not follow "accept interfaces, return structs" (https://bryanftan.medium.com/accept-interfaces-return-structs-in-go-d4cab29a301b). We may want to investigate a refactor to make the implementation more future-proof; a sketch of the idea follows.
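A minimal Go sketch of that "accept interfaces, return structs" idea applied to snapshot persistence; every name here (SnapshotStore, FileStore, Persist) is hypothetical, not the existing gloo code:

```go
package cache

import "os"

// SnapshotStore is the narrow interface that consumers accept; the xds
// syncer would depend only on this, never on a concrete backend.
type SnapshotStore interface {
	Write(key string, snapshot []byte) error
	Read(key string) ([]byte, error)
}

// FileStore is one concrete implementation, returned as a struct.
type FileStore struct {
	dir string
}

// NewFileStore returns the concrete struct, per the idiom.
func NewFileStore(dir string) *FileStore {
	return &FileStore{dir: dir}
}

// Write persists a snapshot to a file under dir (e.g. a mounted PV).
func (f *FileStore) Write(key string, snapshot []byte) error {
	return os.WriteFile(f.dir+"/"+key, snapshot, 0o600)
}

// Read loads the last persisted snapshot for key.
func (f *FileStore) Read(key string) ([]byte, error) {
	return os.ReadFile(f.dir + "/" + key)
}

// Persist accepts the interface, so callers never couple to FileStore.
func Persist(store SnapshotStore, key string, snapshot []byte) error {
	return store.Write(key, snapshot)
}
```

With the syncer accepting only the interface, a PV-backed file store, HA Redis, or an etcd-backed store (per #6114) become drop-in implementations.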
This may still be desirable to do, depending on how hard it is to rewrite gateway translation to never fail once the gloo and gateway pods merge; fyi @elcasteel @sam-heilbron @nfuden, the code I wrote has been pushed to these branches.
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
Version
No response
Is your feature request related to a problem? Please describe.
We need to ensure Gloo Edge is reliable across all pod restarts and invalid configuration.
The shortest path to ensuring a reliable xds configuration is always served is to write the last acked xds snapshot to persistent storage, and load it if gloo translation is unable to complete (related: #6114).
Describe the solution you'd like
Alternative solution: write the last xds cache to persistent storage. Size: M, Risk: M. A deployment-level sketch of the persistence is shown below.
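A minimal sketch of what that could look like, assuming the snapshot is written to a PVC mounted into the gloo pod; the claim name, mount path, storage class, and image tag are illustrative, not the shipped chart:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gloo-xds-snapshot
  namespace: gloo-system
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: encrypted-gp2   # e.g. the encrypted class sketched earlier
  resources:
    requests:
      storage: 1Gi
---
# Excerpt of the gloo Deployment pod spec: mounting the claim lets the last
# acked xds snapshot survive pod restarts and failed translations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gloo
  namespace: gloo-system
spec:
  selector:
    matchLabels:
      gloo: gloo
  template:
    metadata:
      labels:
        gloo: gloo
    spec:
      containers:
      - name: gloo
        image: quay.io/solo-io/gloo:1.11.0
        volumeMounts:
        - name: xds-snapshot
          mountPath: /var/lib/gloo/xds-snapshot
      volumes:
      - name: xds-snapshot
        persistentVolumeClaim:
          claimName: gloo-xds-snapshot
```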
Describe alternatives you've considered
No response
Additional Context
Downside: this breaks multitenancy, which may or may not be a product requirement in all gloo Settings deployments.