Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otlp-transformer): avoid precision loss when converting HrTime to unix nanoseconds #4062

Merged
merged 16 commits into from
Sep 28, 2023

Conversation

seemk
Copy link
Contributor

@seemk seemk commented Aug 15, 2023

Which problem is this PR solving?

Fixes #2643

Short description of the changes

Fixes the precision loss when converting HrTime to a singular number. The precision loss happened due to hrTimeToNanoseconds first multiplying seconds by nanosecond digits, causing the initial number to overflow 53 bits, after which it adds nanoseconds to the number, causing additional errors.

Instead of using the long.js package, I copied over the relevant parts from this library. long.js does not support CommonJS, only ESM and it seems enabling esModuleInterop is not recommended / accepted.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@seemk seemk requested a review from a team August 15, 2023 11:13
@dyladan
Copy link
Member

dyladan commented Aug 16, 2023

I think this may be a breaking change for the exporter interface? Maybe it's time to start seriously considering a 2.0 release of the SDK since we have quite a few improvements that would be good to make.

@seemk
Copy link
Contributor Author

seemk commented Aug 24, 2023

I think this may be a breaking change for the exporter interface? Maybe it's time to start seriously considering a 2.0 release of the SDK since we have quite a few improvements that would be good to make.

Since the breaking change is in otlp-transformer which is both experimental and used internally, I'd keep things as is with regards to this change.

Turns out the amount of tests that needed to be changed was greater than expected, still going over some of them and will try to cut some duplicate code.

@codecov
Copy link

codecov bot commented Aug 24, 2023

Codecov Report

Merging #4062 (b49e6be) into main (6bf1b78) will decrease coverage by 0.04%.
The diff coverage is 87.20%.

❗ Current head b49e6be differs from pull request most recent head 708349d. Consider uploading reports for the commit 708349d to get more accurate results

@@            Coverage Diff             @@
##             main    #4062      +/-   ##
==========================================
- Coverage   92.28%   92.24%   -0.04%     
==========================================
  Files         329      331       +2     
  Lines        9370     9453      +83     
  Branches     1991     1992       +1     
==========================================
+ Hits         8647     8720      +73     
- Misses        723      733      +10     
Files Coverage Δ
...ntal/packages/otlp-transformer/src/common/index.ts 100.00% <100.00%> (ø)
...mental/packages/otlp-transformer/src/logs/index.ts 100.00% <100.00%> (ø)
...mental/packages/otlp-transformer/src/logs/types.ts 100.00% <ø> (ø)
.../packages/otlp-transformer/src/metrics/internal.ts 100.00% <100.00%> (ø)
...tal/packages/otlp-transformer/src/metrics/types.ts 100.00% <ø> (ø)
...al/packages/otlp-transformer/src/trace/internal.ts 96.77% <100.00%> (ø)
...ental/packages/otlp-transformer/src/trace/types.ts 100.00% <ø> (ø)
...kages/otlp-transformer/src/common/unsigned_long.ts 85.71% <85.71%> (ø)

... and 1 file with indirect coverage changes

@seemk
Copy link
Contributor Author

seemk commented Aug 25, 2023

Additionally:

Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be a breaking change for the exporter interface? Maybe it's time to start seriously considering a 2.0 release of the SDK since we have quite a few improvements that would be good to make.

Since the breaking change is in otlp-transformer which is both experimental and used internally, I'd keep things as is with regards to this change.

I agree, this change should only be a problem to direct users of @opentelemetry/otlp-transformer, and it's documented that this package is only intended for internal use. The exporters are pinned to the exact version, so it's fair to assume that this won't break actual end-users. 🙂

Turns out the amount of tests that needed to be changed was greater than expected, still going over some of them and will try to cut some duplicate code.

Yes, this has been a pain point for a while. They have grown historically and really need some refactoring. 😞

Additionally:

* Removed a bit of dead code in the tests.

* Fixed [opentelemetry-exporter-metrics-otlp-proto/test/OTLPMetricExporter.test.ts](https://github.com/open-telemetry/opentelemetry-js/pull/4062/files#diff-712a67b7aa2e82726dce05de9454b18060ed032c83e82a6e9febaa44cba7b1a4) and [opentelemetry-exporter-metrics-otlp-grpc/test/OTLPMetricExporter.test.ts](https://github.com/open-telemetry/opentelemetry-js/pull/4062/files#diff-027af27e5e7182406fd3a331263372057498022868949b788f2349fd77329ef5) to actually test against the ground truth. It was comparing the parsed timestamps against the parsed timestamps. The data was decoded and then cast to a mismatching type, in the case of timestamps the actual parsed data had string timestamps, but this was never reflected in the OTLP types.

Thanks! 🚀 I've added one blocking comment about the code adapted from long.js - then I think this PR is good to go.
Waiting for approval by @dyladan before merging though to ensure all concerns are addressed 🙂

seemk and others added 2 commits August 30, 2023 22:01
@pichlermarc pichlermarc merged commit 513d067 into open-telemetry:main Sep 28, 2023
16 checks passed
@Ankcorn
Copy link

Ankcorn commented Oct 14, 2023

Hey @seemk , just a heads up this was a breaking change for us and ingesting open telemetry spans. It caused a little bit of confusion but we did figure out what was going on eventually 😅

We have our own custom receptor written in typescript

Are we doing something wrong/unsafe and is there an official package for deserializing spans?

@seemk
Copy link
Contributor Author

seemk commented Oct 15, 2023

Hey @seemk , just a heads up this was a breaking change for us and ingesting open telemetry spans. It caused a little bit of confusion but we did figure out what was going on eventually 😅

We have our own custom receptor written in typescript

Are we doing something wrong/unsafe and is there an official package for deserializing spans?

What does the custom receiver do? Do you have your own protobuf decoder?

@Ankcorn
Copy link

Ankcorn commented Oct 16, 2023

The custom receiver processes open telemetry data and sends it to kinesis. The majority of this data is just JSON although we do support protobuf. I only saw an issue parsing the JSON payloads. I don't believe protobuf was affected.

@karanssj4
Copy link

@seemk we faced a similar issue, timestamps started coming up as an object

hypertest-logger-consumer-1  |       startTimeUnixNano: {
hypertest-logger-consumer-1  |         low: 2022226944,
hypertest-logger-consumer-1  |         high: 395220561
hypertest-logger-consumer-1  |       },
hypertest-logger-consumer-1  |       endTimeUnixNano: {
hypertest-logger-consumer-1  |         low: 2023255286,
hypertest-logger-consumer-1  |         high: 395220561
hypertest-logger-consumer-1  |       }

hypertest-logger-consumer-1  | Argument `startTimeUnixNano`: Invalid value provided. Expected BigInt, provided Object.

@karanssj4
Copy link

@Ankcorn can confirm protobuf wasn't affected

@lingyan
Copy link

lingyan commented Oct 16, 2023

BTW, there is a detailed report of the breakage in this issue: #4202

@cristianmadularu
Copy link
Contributor

Hi guys, I'm not sure if this change was meant to have external visibility, but the timestamps being sent by a regular web application are also being sent at JSON objects now breaking certain providers ingressing this data (e.g.. Uptrace).
Please see #4216

@seemk seemk mentioned this pull request Oct 22, 2023
3 tasks
@sergeynechaev
Copy link

sergeynechaev commented Nov 5, 2023

Hi guys! I can confirm that this change also broke sending browser events to Honeycomb.io. Their API https://api.honeycomb.io/v1/traces expects string or number in the startTimeUnixNano and endTimeUnixNano, but gets the IFixed64 object instead and fails with the "failed to parse OTLP request body" error.

export interface IFixed64 {
    low: number;
    high: number;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Precision loss on time_unix_nano
8 participants