
Fix otel-collector logging repeat error #748

Merged

Conversation

tiffanny29631
Contributor

@tiffanny29631 tiffanny29631 commented Jul 17, 2023

The otel-collector pod log shows repeated entries of a duplicate-timeseries error, which eat into logging quota. Although the metrics are being sent correctly, the error message can confuse users.

Removing the `type` tag causes one timeseries to be processed multiple times within a batch, both with and without the tag, which triggers the error above.
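The collision can be illustrated with a small, self-contained sketch (the metric and label names here are hypothetical, not the actual Config Sync metric set): once a label is stripped, two series that differed only in that label collapse onto the same label set, so the exporter sees the same timeseries written twice.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// collapseLabels removes the given label from each series' label set and
// counts how many input series map onto each resulting label set. A count
// greater than 1 is what surfaces as a "duplicate timeseries" error in the
// exporter.
func collapseLabels(series []map[string]string, drop string) map[string]int {
	counts := map[string]int{}
	for _, labels := range series {
		var kept []string
		for k, v := range labels {
			if k == drop {
				continue
			}
			kept = append(kept, k+"="+v)
		}
		sort.Strings(kept) // deterministic key regardless of map iteration order
		counts[strings.Join(kept, ",")]++
	}
	return counts
}

func main() {
	// Two series that differ only in the "type" label.
	series := []map[string]string{
		{"component": "parser", "type": "a"},
		{"component": "parser", "type": "b"},
	}
	for key, n := range collapseLabels(series, "type") {
		if n > 1 {
			fmt.Printf("duplicate timeseries %s: %d writes\n", key, n)
		}
	}
}
```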

This change uses aggregation in the metricstransform processor to eliminate the `type` tag from the Monarch pipeline. The format follows otel-collector-contrib 0.54.0.

b/290678742
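A minimal sketch of the kind of metricstransform configuration involved, following the otel-collector-contrib 0.54.0 schema; the metric and label names below are placeholders, not the actual Config Sync pipeline config:

```yaml
processors:
  metricstransform/monarch:
    transforms:
      - include: example_metric_name   # placeholder, not the real metric
        action: update
        operations:
          # Re-aggregate datapoints over only the labels we want to keep,
          # which drops `type` by omission. Summing the collapsed datapoints
          # yields a single point per resulting label set instead of one
          # point per original `type` value, avoiding the duplicate-timeseries
          # error at export time.
          - action: aggregate_labels
            label_set: [component, status]   # placeholder label names
            aggregation_type: sum
```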

pkg/metrics/otel.go — review comment (outdated, resolved)
@tiffanny29631 tiffanny29631 force-pushed the otel-log-error branch 4 times, most recently from 21226a4 to 2940f66 Compare July 31, 2023 18:27
@tiffanny29631 tiffanny29631 marked this pull request as ready for review July 31, 2023 18:28
@google-oss-prow google-oss-prow bot requested a review from sdowell July 31, 2023 18:28
@tiffanny29631 tiffanny29631 force-pushed the otel-log-error branch 3 times, most recently from 3645a97 to 5154183 Compare July 31, 2023 22:52
pkg/metrics/otel.go — review comment (outdated, resolved)
Contributor

@sdowell sdowell left a comment


close to LGTM, but a few more comments

e2e/testcases/otel_collector_test.go — 3 review comments (outdated, resolved)
@tiffanny29631 tiffanny29631 force-pushed the otel-log-error branch 3 times, most recently from 1a4932a to 147ed2c Compare August 7, 2023 17:45
e2e/testcases/otel_collector_test.go — 2 review comments (outdated, resolved)
Contributor

@sdowell sdowell left a comment


couple nits but mostly LGTM

e2e/testcases/otel_collector_test.go — 2 review comments (outdated, resolved)
@tiffanny29631 tiffanny29631 force-pushed the otel-log-error branch 2 times, most recently from c53b20e to c907f4b Compare August 8, 2023 22:35
@tiffanny29631
Contributor Author

/retest-required

```go
nt.T.Log("Restart otel-collector pod to reset the ConfigMap and log")
nomostest.DeletePodByLabel(nt, "app", ocmetrics.OpenTelemetry, false)
if err := nt.Watcher.WatchForCurrentStatus(kinds.Deployment(), ocmetrics.OtelCollectorName, ocmetrics.MonitoringNamespace); err != nil {
	nt.T.Log(fmt.Sprintf("otel-collector pod failed to come up after a restart: %v", err))
```
Contributor


Final nit: Replace nt.T.Log(fmt.Sprintf(... with nt.T.Errorf(.... Ditto below

Contributor Author


Done!

Contributor


I don't see any new changes, you may have forgotten to push :)

Contributor Author


Forgot to amend the commit; it's good now.

The pod log shows repeated entries of a duplicate-timeseries error, which eat into logging quota. Although the metrics are being sent correctly, the error message can confuse users.

Removing the `type` tag causes one timeseries to be exported multiple times within a batch, with and without the tag.

This change uses aggregation in the metricstransform processor to eliminate the `type` tag from the Monarch pipeline. The format follows otel-collector-contrib 0.54.0.

Tested locally e2e.

b/290678742
@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdowell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiffanny29631
Contributor Author

/unhold

@google-oss-prow google-oss-prow bot merged commit 507f59a into GoogleContainerTools:main Aug 9, 2023
6 checks passed
@karlkfi
Contributor

karlkfi commented Aug 9, 2023

This merged during code freeze... is it needed for v1.16.0?

@tiffanny29631
Contributor Author

It is no longer targeting 1.16. It was merged after the projected preview date; in theory the release candidate was already confirmed by then.

tiffanny29631 added a commit to tiffanny29631/kpt-config-sync that referenced this pull request Aug 29, 2023
google-oss-prow bot pushed a commit that referenced this pull request Aug 29, 2023