Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update datadog tracer config proto and preserve remote_config default behavior for envoy v1.31 (#10145) #10191

Merged
merged 16 commits into from
Oct 11, 2024

Conversation

andy-fong
Copy link
Contributor

@andy-fong andy-fong commented Oct 8, 2024

Description

API changes

Added fields from envoy 1.31:

  • collector_hostname
  • remote_config

For remote_config, we also added a disabled field which is not on the envoy side

Code changes

CI changes

None because this plugin only has unit test and does not have e2e test, the tests would pass even with old envoy (because we already imported the new go-control-plane some time ago that already has the new fields)

Docs changes

Context

Before envoy 1.31, the remote_config functionality is on by default and cannot be turned off. In order to maintain the behavior for potential customers that might rely on this, we enable the feature by default if it's
not explicitly disable. Customers who are using datadog tracer would not need to change anything when upgrading to gloo/gloo-ee 1.18.

slack-conversation: https://solo-io-corp.slack.com/archives/G01EERAK3KJ/p1728052383874909

Interesting decisions

Testing steps

Only rely on unit test on this one because we do not have a datadog setup to actually test this e2e.

Notes for reviewers

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@andy-fong andy-fong requested a review from a team as a code owner October 8, 2024 16:24
@github-actions github-actions bot added keep pr updated signals bulldozer to keep pr up to date with base branch work in progress signals bulldozer to keep pr open (don't auto-merge) labels Oct 8, 2024
Copy link

github-actions bot commented Oct 8, 2024

Visit the preview URL for this PR (updated for commit 7a5e4e1):

https://gloo-edge--pr10191-andyfong-datadog-pro-odct0tv2.web.app

(expires Fri, 18 Oct 2024 17:34:16 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 77c2b86e287749579b7ff9cadb81e099042ef677

@solo-changelog-bot
Copy link

Issues linked to changelog:
#10145

@@ -57,14 +57,13 @@ var _ = Describe("Local Rate Limit", func() {
expectNotRateLimitedWithOutXRateLimitHeader = func() {
response := expectNotRateLimited()
// Since the values of the x-rate-limit headers change with time, we only check the presence of these header keys and not match their value
ExpectWithOffset(1, response).ToNot(matchers.ContainHeaderKeys([]string{"x-ratelimit-reset", "x-ratelimit-limit", "x-ratelimit-remaining"}),
ExpectWithOffset(1, response).ToNot(matchers.ContainHeaderKeys([]string{"x-ratelimit-limit", "x-ratelimit-remaining"}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envoy 1.31 switch the local ratelimit implementation to AtomicBucket which is no longer timer based, so the x-ratelimit-reset is no longer a thing and will not send out that header anymore. Will call this out in docs upgrade faq and the way to revert back to the timer-based bucket implementation.

Copy link
Contributor

@sam-heilbron sam-heilbron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Confirmed in standup that this PR should not merge until a downstream PR to pull in this dependency is passing CI


remoteConfig := glooDatadogConfig.GetRemoteConfig()
if remoteConfig == nil {
// An empty RemoteConfig object on the envoy side means enabling RemoteConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a super helpful comment!

pkg/utils/api_conversion/trace.go Show resolved Hide resolved
google.protobuf.Duration polling_interval = 1;

// Disabled remote config.
// This field does not exist in envoy's config but allow us to preserve the default behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since this is our API, and we are exposing this specifically for backwards compatibility, should we clarify what the default behavior is?

Copy link
Contributor Author

@andy-fong andy-fong Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If have other updates I need to push, I will clarify that. However, in general, because all the proto field default to false, 0 or "". The implied default behavior is disabled=false. That's the reason to pick disabled as the field instead of enabled (In general, I really don't like anything that's double negative to mean positive coming from the C++ world but I was taught that in go and protobuf, rely on the default value instead.)

"x-ratelimit headers should not be present for non rate limited requests")
}

expectNotRateLimitedWithXRateLimitHeader = func() {
response := expectNotRateLimited()
// Since the x-ratelimit-reset header value changes with time, we only check the presence of this header key and not match its value
ExpectWithOffset(2, response).To(matchers.ContainHeaderKeys([]string{"x-ratelimit-reset", "x-ratelimit-limit", "x-ratelimit-remaining"}),
ExpectWithOffset(2, response).To(matchers.ContainHeaderKeys([]string{"x-ratelimit-limit", "x-ratelimit-remaining"}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above comment about why the reset header will not be present is really useful, and I wonder if it could be even more effective if it were placed in code. By that I mean, what if we added an assertion to confirm that the reset header is not contained in the response headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should put the comment back but just remove the x-ratelimit-reset part. In general, do we ever do tests with the runtime guard option because we will call out in the docs if customer needs this back, they can set the runtime guard setting (for the next 6 months before envoy remove it). So, technically, we are not testing a path that customer might need be using but that might be a whole new can of worms. It's the right things to do but not sure it's worth it as they will be throwaway code in 6 months.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I got 2 approval and the Enterprise side tests a mostly passing (re-running some flakes), I will probably add the comment in another PR without possibly a changelog only update to call out the runtime guard.

@andy-fong andy-fong removed the work in progress signals bulldozer to keep pr open (don't auto-merge) label Oct 11, 2024
@soloio-bulldozer soloio-bulldozer bot merged commit 73411a2 into main Oct 11, 2024
19 checks passed
@soloio-bulldozer soloio-bulldozer bot deleted the andyfong/datadog-proto-update branch October 11, 2024 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep pr updated signals bulldozer to keep pr up to date with base branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants