Demo environment generates errors by default #1628

Open
flands opened this issue Jun 23, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@flands
Contributor

flands commented Jun 23, 2024

Bug Report

Which version of the demo you are using? 1.10.0

Symptom

If you start the demo environment from scratch, errors are reported for the adservice.

What is the expected behavior?

Either:

  1. The demo environment doesn't generate errors by default, which is how the documentation currently reads: https://opentelemetry.io/docs/demo/#scenarios
  2. The demo environment does generate errors by default but these errors are documented and thus expected.

What is the actual behavior?

The adservice generates errors by default yet the documentation seems to indicate you must enable scenarios to generate errors and other problems.

Reproduce

Provide the minimum required steps to result in the issue you're observing.

We will close this issue if:

  • The steps you provided are complex.
  • If we can not reproduce the behavior you're reporting.

Additional Context

Log messages for adservice show:

    ad-service | 2024-06-23 15:11:51 - oteldemo.AdService - GetAds Failed with status Status{code=UNAVAILABLE, description=null, cause=null} trace_id=d963f87608e1ab611dee31ef9ac29860 span_id=84ce83545d6852bb trace_flags=01

src/flagd/demo.flagd.json shows:

    "adServiceFailure": {
      "description": "Fail ad service",
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off",
      "targeting": {
        "fractional": [
          ["on", 10],
          ["off", 90]
        ]
      }
    },

The problem is the weight for "off" in the fractional targeting rule, which should be set to 100 by default.
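
To illustrate why the 10/90 split above produces errors out of the box, here is a minimal Python sketch of weighted variant selection. The helper names (`variant_shares`, `pick_variant`) are hypothetical, and the SHA-256 bucketing is only a stand-in for illustration; it is not flagd's actual hashing algorithm. The 0/100 weights in the usage note are an assumption about what a fixed default could look like.

```python
import hashlib


def variant_shares(fractional):
    """Return the share of evaluations each variant should receive."""
    total = sum(weight for _, weight in fractional)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in fractional}


def pick_variant(fractional, key):
    """Pick a variant for `key` by hashing it into a weighted bucket.

    Illustrative only: SHA-256 is an assumption, not flagd's algorithm.
    """
    total = sum(weight for _, weight in fractional)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    bucket = int(hashlib.sha256(key.encode()).hexdigest(), 16) % total
    upper = 0
    for name, weight in fractional:
        upper += weight
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always below total")


current = [["on", 10], ["off", 90]]   # weights from demo.flagd.json above
proposed = [["on", 0], ["off", 100]]  # assumed fix: never serve "on"
```

With the current weights, `variant_shares(current)` gives "on" a 10% share, so roughly one in ten evaluations fails the ad service. With the assumed fix, `pick_variant(proposed, key)` returns "off" for every key.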

flands added the bug label on Jun 23, 2024
@puckpuck
Contributor

Cart service also needs to be updated. We should fix both in the same PR, following the format noted in this comment

@julianocosta89
Member

Actually I've tried the solution mentioned by @beeme1mr and everything broke.

recommendation-service   | grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
recommendation-service   | 	status = StatusCode.NOT_FOUND
recommendation-service   | 	details = "FlagdError:, FLAG_DISABLED"
recommendation-service   | 	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.20.0.5:8013 {grpc_message:"FlagdError:, FLAG_DISABLED", grpc_status:5, created_time:"2024-06-29T03:51:25.86731025+00:00"}"
product-catalog-service  | 2024/06/29 03:51:25 openfeature: FLAG_NOT_FOUND: not_found: FlagdError:, FLAG_DISABLED

It seems that a disabled feature flag doesn't evaluate to false, as I had expected.

flands added a commit to flands/opentelemetry-demo that referenced this issue Jun 30, 2024
Addresses open-telemetry#1628 and adds variability to errors for cart service
@beeme1mr
Contributor

Hey @flands and @julianocosta89, I'll look into this tomorrow. When the flag is disabled, the SDK uses the default value defined in the code. The message @julianocosta89 posted is likely just an overly verbose log message.

@julianocosta89
Member

@beeme1mr I think some services do not default to false 😞

@dyladan
Member

dyladan commented Jul 1, 2024

@julianocosta89 I looked into this a bit and there are a few takeaways:

  1. For some reason, in my environment (macOS), changing demo.flagd.json does not trigger changes to the flag definitions in flagd. Restarting the flagd service picks up the changes. This may only affect macOS.
  2. If a flag is disabled, the Python SDK is very verbose in its logs. The recommendationServiceCacheFailure flag does appear to fall back correctly to its False default value. The log messages come from within the OpenFeature SDK.

I talked to @beeme1mr and he is looking into the verbose logging in the SDK. He agrees this situation isn't ideal and maybe shouldn't be logged the same way as other "real" failures. He's also going to look into why the flag file changes aren't being picked up by flagd in the demo setup.

@julianocosta89
Member

Thanks for taking a look at it @dyladan!
Interestingly enough, when I update my feature flags it works fine, without having to restart the service.
I'm running the demo on macOS (M1).

@dyladan
Member

dyladan commented Jul 3, 2024

@julianocosta89 I was able to track the flagd reload issue down to my specific setup. Apparently in colima (the container runtime I'm using) the WRITE event is not triggered when I write to a mounted file. You can probably ignore it for now.
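
For anyone hitting the same missing WRITE event, one generic way to detect a changed file without relying on filesystem notifications is to poll its contents. The Python sketch below is only an illustration of that idea under stated assumptions: the `ChangePoller` class is hypothetical, and nothing here describes how flagd itself watches files.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return a SHA-256 digest of the file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class ChangePoller:
    """Detect content changes by comparing digests between polls.

    Hypothetical helper: it does not depend on WRITE events, so it also
    works when the container runtime does not forward them.
    """

    def __init__(self, path):
        self.path = path
        self.last_digest = file_digest(path)

    def changed(self):
        """Return True once for each observed content change."""
        current = file_digest(self.path)
        if current != self.last_digest:
            self.last_digest = current
            return True
        return False
```

A caller would invoke `changed()` on a timer and reload the flag definitions whenever it returns True. The trade-off is a small delay and repeated reads in exchange for not depending on the runtime's event delivery.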

@beeme1mr
Contributor

beeme1mr commented Jul 3, 2024

I also have a quick update. We currently treat disabled flags like missing flags. That means we'll fall back to whatever is defined in the code, but we're also noisy about it because it's assumed you're accidentally using a flag. Obviously, that's not the ideal experience here, and we're working on a solution. It may take a few days to fully implement, but we're actively working on it and will provide an update ASAP.
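
The behavior described above can be sketched roughly as follows. This is a simplified Python illustration, not the OpenFeature SDK or flagd implementation: `resolve_boolean` and the reason strings are hypothetical, and targeting rules are ignored. It shows why a disabled flag only evaluates to false when the calling code passes false as its default.

```python
import logging

logger = logging.getLogger("flag_sketch")


def resolve_boolean(flags, flag_key, default):
    """Return (value, reason) for a boolean flag.

    Hypothetical helper: missing and disabled flags share one path,
    which logs a warning and returns the caller's default.
    """
    flag = flags.get(flag_key)
    if flag is None or flag.get("state") == "DISABLED":
        logger.warning(
            "flag %s is missing or disabled, using code default %r",
            flag_key,
            default,
        )
        return default, "DEFAULT"
    # Enabled flag: serve the configured default variant (targeting omitted).
    variant = flag["defaultVariant"]
    return flag["variants"][variant], "STATIC"


flags = {
    "adServiceFailure": {
        "state": "DISABLED",
        "variants": {"on": True, "off": False},
        "defaultVariant": "off",
    }
}
```

Here `resolve_boolean(flags, "adServiceFailure", False)` returns False, but a service that passes True as its code default would get True back from the same disabled flag, which matches the concern raised earlier in the thread about services that do not default to false.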

@julianocosta89
Member

Thanks for the updates @dyladan and @beeme1mr!

5 participants