Gateway proxy fails to start with invalid route in a virtual service #4438

Closed
mattchrist opened this issue Mar 16, 2021 · 8 comments
Labels
Impact: L, Size: M (3 - 5 days), Type: Bug (Something isn't working)

@mattchrist

mattchrist commented Mar 16, 2021

Describe the bug
Gateway proxy fails to start with an invalid route in a virtual service after the gloo pod restarts.

To Reproduce
Steps to reproduce the behavior:

  1. Have a working deployment of gloo 1.6.14.
  2. Have an invalidConfigPolicy in the gloo settings:
...
    gateway:
      validation:
        allowWarnings: true
        alwaysAccept: false
        proxyValidationServerAddr: gloo:9988
    gloo:
      invalidConfigPolicy:
        invalidRouteResponseBody: Gloo Gateway has invalid configuration. Administrators
          should run glooctl check to find and fix config errors.
        invalidRouteResponseCode: 404
        replaceInvalidRoutes: true
...
  3. Create a virtual service with an invalid route:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: "wontwork"
  namespace: "gloo-system"
spec:
  virtualHost:
    domains:
      - "wontwork"
    routes:
      - matchers:
          - prefix: /
        routeAction:
          upstreamGroup:
            name: "wontwork"
            namespace: "apps"

(Note: There is no upstream group named "wontwork" in the "apps" namespace.)

This resource is identified as invalid and has the following status:

status:
  reason: "warning: \n  Route Warning: InvalidDestinationWarning. Reason: *v1.UpstreamGroup
    { apps.wontwork } not found\nRoute Warning: InvalidDestinationWarning. Reason:
    *v1.UpstreamGroup { apps.wontwork } not found"
  4. Restart the gloo deployment.
  5. Restart the gateway-proxy deployment.
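
The warning status shown for the invalid virtual service names the missing destination in a fairly regular way, so it can be checked mechanically before restarting anything. The following is a minimal Python sketch, not part of the original report: the regular expression is an assumption derived from this single status sample rather than from a documented Gloo format, and `missing_destinations` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Status text copied from the issue (YAML line folding already applied).
REASON = (
    "warning: \n  Route Warning: InvalidDestinationWarning. Reason: *v1.UpstreamGroup "
    "{ apps.wontwork } not found\nRoute Warning: InvalidDestinationWarning. Reason: "
    "*v1.UpstreamGroup { apps.wontwork } not found"
)

# Assumed pattern: "*v1.<Kind> { <namespace>.<name> } not found".
MISSING = re.compile(r"\*v1\.(\w+) \{ ([\w-]+)\.([\w.-]+) \} not found")


def missing_destinations(reason: str) -> List[Tuple[str, str, str]]:
    """Return unique (kind, namespace, name) tuples in first-seen order."""
    seen: List[Tuple[str, str, str]] = []
    for match in MISSING.findall(reason):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    for kind, namespace, name in missing_destinations(REASON):
        print(f"{kind} {namespace}/{name} is referenced but does not exist")
```

The warning is repeated twice in the status, so the helper deduplicates before reporting.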

Expected behavior
The new gateway-proxy pod starts, becomes healthy, and is able to proxy traffic for everything except the virtual service with the invalid route.

Actual behavior
The new gateway-proxy pods in the new replica set never become healthy and never get configured by the Gloo service.
The new gateway-proxy pod has log entries indicating it can't communicate with Gloo:

...
[2021-03-16 20:23:52.718][6][info][config] [external/envoy/source/server/configuration_impl.cc:125] loading tracing configuration
[2021-03-16 20:23:52.718][6][info][config] [external/envoy/source/server/configuration_impl.cc:85] loading 0 static secret(s)
[2021-03-16 20:23:52.718][6][info][config] [external/envoy/source/server/configuration_impl.cc:91] loading 4 cluster(s)
[2021-03-16 20:23:52.721][6][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamAggregatedResources gRPC config stream closed: 14, Cluster not available
[2021-03-16 20:23:52.721][6][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:63] Unable to establish new stream
[2021-03-16 20:23:52.721][6][info][config] [external/envoy/source/server/configuration_impl.cc:95] loading 1 listener(s)
[2021-03-16 20:23:52.725][6][info][config] [external/envoy/source/server/configuration_impl.cc:107] loading stats configuration
[2021-03-16 20:23:52.725][6][info][main] [external/envoy/source/server/server.cc:737] starting main dispatch loop
[2021-03-16 20:23:52.728][6][info][runtime] [external/envoy/source/common/runtime/runtime_impl.cc:425] RTDS has finished initialization
[2021-03-16 20:23:52.728][6][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:187] cm init: initializing cds
[2021-03-16 20:24:07.727][6][info][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:191] cm init: all clusters initialized
[2021-03-16 20:24:07.728][6][info][main] [external/envoy/source/server/server.cc:718] all clusters initialized. initializing init manager
[2021-03-16 20:24:22.727][6][info][config] [external/envoy/source/server/listener_manager_impl.cc:888] all dependencies initialized. starting workers
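
One detail worth pulling out of the Envoy log is the spacing between the init-phase messages. The sketch below (Python, added for illustration) computes the gaps between three of the lines above, shortened here to timestamp, tags, and message. The gaps come out at roughly 15 seconds each, which would be consistent with Envoy giving up on each xDS phase after a timeout instead of receiving config. That interpretation is an assumption and is not confirmed anywhere in this thread.

```python
import re
from datetime import datetime
from typing import List

# Three init-phase lines from the log above, with the source-file path removed.
LOG = """\
[2021-03-16 20:23:52.728][6][info][upstream] cm init: initializing cds
[2021-03-16 20:24:07.727][6][info][upstream] cm init: all clusters initialized
[2021-03-16 20:24:22.727][6][info][config] all dependencies initialized. starting workers
"""

STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def gaps_seconds(log: str) -> List[float]:
    """Seconds elapsed between consecutive timestamped lines."""
    times = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_seconds(LOG))
```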

The gloo pod logs some issues:

{"level":"warn","ts":1615926587.4339924,"logger":"gloo.v1.event_loop.setup.v1.event_loop.envoyTranslatorSyncer","caller":"syncer/envoy_translator_syncer.go:133","msg":"Proxy had invalid config","version":"1.6.14","proxy":"name:\"gateway-proxy\"  namespace:\"gloo-system\"","error":"3 errors occurred:\n\t* invalid resource gloo-system.gateway-proxy\n\t* upstream group not found, (Name: wontwork, Namespace: apps)\n\t* WARN: \n  [Route Warning: InvalidDestinationWarning. Reason: *v1.UpstreamGroup { apps.wontwork } not found Route Warning: InvalidDestinationWarning. Reason: *v1.UpstreamGroup { apps.wontwork } not found]\n\n"}
{"level":"warn","ts":1615926587.4357145,"logger":"gloo.v1.event_loop.setup.v1.event_loop.envoyTranslatorSyncer","caller":"syncer/envoy_translator_syncer.go:142","msg":"proxy gloo-system.gateway-proxy was rejected due to invalid config: 2 errors occurred:\n\t* invalid resource gloo-system.gateway-proxy\n\t* upstream group not found, (Name: wontwork, Namespace: apps)\n\n\nAttempting to update only EDS information","version":"1.6.14"}
{"level":"warn","ts":1615926587.4358199,"logger":"gloo.v1.event_loop.setup.v1.event_loop.envoyTranslatorSyncer","caller":"syncer/envoy_translator_syncer.go:149","msg":"endpoint update failed. xDS snapshot for proxy gloo-system.gateway-proxy will not be updated. Error is: no snapshot found for node gloo-system~gateway-proxy","version":"1.6.14"}
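
Read in order, these entries suggest a sequence: the proxy is rejected as invalid, Gloo falls back to an EDS-only update, and that update fails because no earlier xDS snapshot exists for the node after the restart. The Python sketch below, added for illustration, extracts that sequence from the last two log lines. The lines are trimmed to their `caller` and `msg` fields, and `first_lines` and `snapshot_missing` are hypothetical helper names.

```python
import json
from typing import List, Tuple

# The last two gloo log entries above, trimmed to the "caller" and "msg" fields.
RAW_LINES = [
    r'{"caller":"syncer/envoy_translator_syncer.go:142","msg":"proxy gloo-system.gateway-proxy was rejected due to invalid config: 2 errors occurred:\n\t* invalid resource gloo-system.gateway-proxy\n\t* upstream group not found, (Name: wontwork, Namespace: apps)\n\n\nAttempting to update only EDS information"}',
    r'{"caller":"syncer/envoy_translator_syncer.go:149","msg":"endpoint update failed. xDS snapshot for proxy gloo-system.gateway-proxy will not be updated. Error is: no snapshot found for node gloo-system~gateway-proxy"}',
]


def first_lines(raw_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (caller, first line of msg) for each JSON log entry."""
    result = []
    for raw in raw_lines:
        entry = json.loads(raw)
        result.append((entry["caller"], entry["msg"].split("\n", 1)[0]))
    return result


def snapshot_missing(raw_lines: List[str]) -> bool:
    """True if any entry reports that no xDS snapshot exists for the node."""
    return any(
        "no snapshot found for node" in json.loads(raw)["msg"] for raw in raw_lines
    )


if __name__ == "__main__":
    for caller, summary in first_lines(RAW_LINES):
        print(f"{caller}: {summary}")
    print("snapshot missing:", snapshot_missing(RAW_LINES))
```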

Additional context

  • Gloo Edge version: 1.6.14
  • Kubernetes version: 1.15
@sam-heilbron
Contributor

Hi @mattchrist, the error you're seeing was actually resolved with #4345 and was released with v1.6.16 (https://github.com/solo-io/gloo/releases/tag/v1.6.16). Can you upgrade to that version, which has the fix, and let us know if the problem persists?

@kdorosh
Contributor

kdorosh commented Mar 30, 2021

Hi @mattchrist, any updates for us here? I believe enabling route replacement here would help as well.

@mattchrist
Author

Hi @kdorosh, thanks for reaching out. I haven't had an opportunity to test v1.6.16 yet. I'll comment back here when I get a chance to test it out! Thanks!

@mattchrist
Author

I'm still experiencing this same issue with both v1.6.16 and v1.7.0.

@sam-heilbron
Contributor

Target gloo version: 1.6

@sam-heilbron sam-heilbron added Size: M 3 - 5 days and removed Size: TBD labels Aug 30, 2021
@jenshu jenshu self-assigned this Sep 7, 2021
@jenshu
Contributor

jenshu commented Sep 8, 2021

Moving this to blocked, as it may get fixed by #5022.

@jenshu
Contributor

jenshu commented Sep 9, 2021

Just my observations: there may be a bug somewhere, notably when you restart the deployments. I tested this on v1.6.37 with the following steps:

  1. modify settings with invalidConfigPolicy as mentioned above
  2. create a VS with a good route, and make sure curl works on that route
  3. create a VS with a bad route
  4. confirm I can still curl the good route successfully
  5. restart the gloo and gateway-proxy deployments
  6. now I'm unable to curl the good route anymore

Note: when I created a single VS with one good route and one bad route (instead of 2 separate VSs), the curl on the good route still worked after restarting the deployments.

@mattchrist
Author

This appears to have been fixed for us with 1.8.15

https://github.com/solo-io/gloo/releases/tag/v1.8.15
