Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenTelemetry Protocol (OTLP) RFC #35

Merged

Conversation

tigrannajaryan
Copy link
Member

OpenTelemetry Protocol (OTLP) specification describes the encoding, transport and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors and telemetry backends.

@tigrannajaryan
Copy link
Member Author

Admins, please assign a number to this RFC.

text/0000-opentelemetry-protocol.md Outdated Show resolved Hide resolved
text/0000-opentelemetry-protocol.md Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

Admins, can you please assign an RFC number to this? Also, based on the RFC process shouldn't this be merged in proposed status and then we continue the discussion?

@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/addunary branch 2 times, most recently from bc936e0 to 824f9a4 Compare September 17, 2019 19:08
@tigrannajaryan
Copy link
Member Author

Reviewers, please have another look. I changed wording around Unary to remove the restriction that only Unary request may be in flight.

@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/addunary branch 2 times, most recently from cf36971 to 9fe913b Compare September 25, 2019 22:59
@tigrannajaryan
Copy link
Member Author

UPDATE:

Further experiments have shown that Streaming implementation
results in significantly more complex code. The performance
benefits of Streaming mode were only visible in very
small batches and are not applicable to real life scenarios
when under high load we aim to have large batches.

In addition, one of the potential benefits that we could
get with Streaming - the ability to optimize and avoid
repetition of Resource does not seem to have much real
life usage.

As a result Streaming mode was removed from the RFC and
Unary mode is the only mode that is now part of the spec.

Please have another look.

@tedsuo tedsuo added this to the Alpha v0.3 milestone Oct 8, 2019
tigrannajaryan pushed a commit to tigrannajaryan/rfcs that referenced this pull request Oct 11, 2019
This is a continuation of OTLP RFC proposal open-telemetry#35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use `make benchmark-encoding` target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

- Attribute key/value pairs are represented as a list rather than as a map.
  This results in significant performance gains and at the same time changes
  the semantic of attributes because now it is possible to have multiple attributes
  with the same key. This is also in-line with Jaeger's tags representation.

- Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
  significant performance improvements for certain payload compositions (e.g. lots of
  TimedEvents).

- Resource labels use the same data type as Span attributes which now allows
  to have labels with other data types (OpenCensus only allowed strings).
tigrannajaryan pushed a commit to tigrannajaryan/rfcs that referenced this pull request Oct 11, 2019
This is a continuation of OTLP RFC proposal open-telemetry#35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use `make benchmark-encoding` target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

- Attribute key/value pairs are represented as a list rather than as a map.
  This results in significant performance gains and at the same time changes
  the semantic of attributes because now it is possible to have multiple attributes
  with the same key. This is also in-line with Jaeger's tags representation.

- Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
  significant performance improvements for certain payload compositions (e.g. lots of
  TimedEvents).

- Resource labels use the same data type as Span attributes which now allows
  to have labels with other data types (OpenCensus only allowed strings).
tigrannajaryan pushed a commit to tigrannajaryan/rfcs that referenced this pull request Oct 11, 2019
This is a continuation of OTLP RFC proposal open-telemetry#35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use `make benchmark-encoding` target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

- Attribute key/value pairs are represented as a list rather than as a map.
  This results in significant performance gains and at the same time changes
  the semantic of attributes because now it is possible to have multiple attributes
  with the same key. This is also in-line with Jaeger's tags representation.

- Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
  significant performance improvements for certain payload compositions (e.g. lots of
  TimedEvents).

- Resource labels use the same data type as Span attributes which now allows
  to have labels with other data types (OpenCensus only allowed strings).
@tigrannajaryan
Copy link
Member Author

Reviewers, all questions and comments on this PR have been answered/address. Please have another look. We need to move forward with this, the protocol planned to be part of Alpha 0.3 milestone and the earlier we have an approved spec the better chances to have an implementation ready by 0.3 as well.

@akehlenbeck
Copy link

I'm not an official reviewer, but I'm glad to see that the concurrent-unary-model was sufficient and that the streaming model has been removed. LGTM. Thanks for the prototyping you did around that issue Tigran.

text/0035-opentelemetry-protocol.md Show resolved Hide resolved
text/0035-opentelemetry-protocol.md Show resolved Hide resolved
text/0035-opentelemetry-protocol.md Show resolved Hide resolved
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left few comments, nothing critical

tigrannajaryan and others added 4 commits October 18, 2019 14:15
OpenTelemetry Protocol (OTLP) specification describes the encoding, transport and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors and telemetry backends.
Experiments have shown that Streaming implementation
results in significantly more complex code. The performance
benefits of Streaming mode were only visible in very
small batches and are not applicable to real life scenarios
when under high load we aim to have large batches.

In addition, one of the potential benefits that we could
get with Streaming - the ability to optimize and avoid
repetition of Resource does not seem to have much real
life usage.

As a result Streaming mode was removed from the RFC and
Unary mode is the only mode that is now part of the spec.
@tigrannajaryan
Copy link
Member Author

Please have one more look.

I believe the only remaining open question is whether to include Author and Acknowledgements sections. I am happy to go either way (my preference would be to keep Acknowledgements section since this document is a result of several people's work, not only my own, which won't be visible if we only look at the PR or commit history).

@tigrannajaryan
Copy link
Member Author

I changed the status to approved, so that this gets merged in approved status (when it gets merged). If I understand correctly this is the right flow for RFCs.

@tigrannajaryan
Copy link
Member Author

Admins, I think we have exhausted questions on this PR and have a principal agreement about the primary content - the protocol itself.

The only outstanding issue is a meta regarding including or not including an Author/Acknowledgements section that lists people who contributed to the RFC. Can you please make a call on this? I believe such a decision will be applicable to other RFCs (and I am happy with either approach).

text/0035-opentelemetry-protocol.md Outdated Show resolved Hide resolved
text/0035-opentelemetry-protocol.md Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

Admins, I need one more approval or more comments on this.

@yurishkuro yurishkuro merged commit fc5c860 into open-telemetry:master Oct 29, 2019
@tigrannajaryan tigrannajaryan deleted the feature/tigran/addunary branch October 29, 2019 20:45
@tigrannajaryan
Copy link
Member Author

@yurishkuro thank you.

tigrannajaryan pushed a commit to tigrannajaryan/rfcs that referenced this pull request Oct 29, 2019
This is a continuation of OTLP RFC proposal open-telemetry#35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use `make benchmark-encoding` target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

- Attribute key/value pairs are represented as a list rather than as a map.
  This results in significant performance gains and at the same time changes
  the semantic of attributes because now it is possible to have multiple attributes
  with the same key. This is also in-line with Jaeger's tags representation.

- Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
  significant performance improvements for certain payload compositions (e.g. lots of
  TimedEvents).

- Resource labels use the same data type as Span attributes which now allows
  to have labels with other data types (OpenCensus only allowed strings).
bogdandrutu pushed a commit that referenced this pull request Nov 7, 2019
* Add OTLP Trace Data Format specification

This is a continuation of OTLP RFC proposal #35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use `make benchmark-encoding` target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

- Attribute key/value pairs are represented as a list rather than as a map.
  This results in significant performance gains and at the same time changes
  the semantic of attributes because now it is possible to have multiple attributes
  with the same key. This is also in-line with Jaeger's tags representation.

- Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
  significant performance improvements for certain payload compositions (e.g. lots of
  TimedEvents).

- Resource labels use the same data type as Span attributes which now allows
  to have labels with other data types (OpenCensus only allowed strings).

* Address review comments

* Add protobuf type qualifier to pre-formatted blocks

* Add a note about future goals for the protocol

* Address review comments

* Make sure all field comments start with field name

* Change status to approved

* Clarify start and end time expectations.

* Address Sergey's comments

* Remove string length requirement

* Add StatusCode enum
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants