Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic conventions for HTTP metrics #739

Merged
merged 22 commits into from
Aug 30, 2020

Conversation

gfuller1
Copy link
Contributor

@gfuller1 gfuller1 commented Jul 24, 2020

This spec is written to describe WHAT http metric events should look like and not HOW they are generated. This spec DOES NOT describe the pipeline of how a metric event is created. This is to prevent confusion around coupling metrics and spans together in cases where there might not exist a span. By focusing on the expected outcome of metric generation and not the details of HOW they are generated, we can keep the spec simple and concise.

This spec PR is written to be a foundation for metric semantics and not the end all of this document. Thoughts on how this PR should change belong in comments below. If you have thoughts on what to add to this document I will kindly suggest that they go onto a follow up PR.

Context: This PR adds a spec to describe what HTTP metric events should look like when they are created. This will provide consistency and guidance for anyone looking for how to report HTTP metric data.

Thoughts:

  • All of the data needed to create the HTTP metric events can be found on an HTTP span.
  • Although all the data can be found on HTTP spans, the way the spec is written avoids coupling the two specs so people aren't confused if there isn't a span for them to get the data from.
  • The loose coupling leads to a bit of duplication in the HTTP metric and span spec, which could get out of sync down the road, but it should lead to less confusion for people interpreting the spec.

I am open to any and all feedback!

Related issues #738

@gfuller1 gfuller1 requested review from a team July 24, 2020 23:46
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 24, 2020

CLA Check
The committers are authorized under a signed CLA.

@gfuller1 gfuller1 changed the title Http metrics spec Add semantic conventions for Http metrics Jul 27, 2020
@gfuller1 gfuller1 changed the title Add semantic conventions for Http metrics Add semantic conventions for HTTP metrics Jul 27, 2020
@ebullient
Copy link

Micrometer has a summary notion of outcome, which is a grouping/categorization of status codes. Just a snip of code because I'm lazy:

        if (status >= 100 && status < 200) {
            return INFORMATIONAL;
        } else if (status >= 200 && status < 300) {
            return SUCCESS;
        } else if (status >= 300 && status < 400) {
            return REDIRECTION;
        } else if (status >= 400 && status < 500) {
            return CLIENT_ERROR;
        } else if (status >= 500 && status < 600) {
            return SERVER_ERROR;
        }

This is really nice from a query perspective (to get all redirects, rather than 302 or 304, or pick a 400 response).

Should something like this be included?

@gfuller1
Copy link
Contributor Author

gfuller1 commented Aug 6, 2020

Micrometer has a summary notion of outcome, which is a grouping/categorization of status codes. Just a snip of code because I'm lazy:

        if (status >= 100 && status < 200) {
            return INFORMATIONAL;
        } else if (status >= 200 && status < 300) {
            return SUCCESS;
        } else if (status >= 300 && status < 400) {
            return REDIRECTION;
        } else if (status >= 400 && status < 500) {
            return CLIENT_ERROR;
        } else if (status >= 500 && status < 600) {
            return SERVER_ERROR;
        }

This is really nice from a query perspective (to get all redirects, rather than 302 or 304, or pick a 400 response).

Should something like this be included?

This sounds very helpful and I think could stir up some good conversation. In order to prevent from scope creep on this PR I'd prefer to keep it as simple and small as possible, so I think this would be a great follow up PR for this spec!

@jmacd
Copy link
Contributor

jmacd commented Aug 20, 2020

@tigrannajaryan and @bogdandrutu
We discussed this PR at length in the Metrics SIG call today. We feel that this should be merged as it is blocking progress.

We reached agreement that there is no intentional "subtle differences from equally named span attributes" here; the semantics should not change when an attribute is used on a span or on a metric. @grahamfuller1 will resolve any overt differences that are introduced here, aiming to get this merged quickly.

We reached agreement that there is no conceptual problem with converting a numeric value to a string value. I challenge the notion that the semantics of 404 and "404" are at all different; the semantics do not change when the data representation changes from integer to string.

The common difference between attributes on spans and labels on metrics is that we generally wish to avoid high-cardinality such as results from the use of request-size and response-size as span attributes. To address this, we recommend general guidance on avoiding labels with high cardinality, with examples given for the ones we know about, so that we don't have to update the specification for every new potentially high-cardinality HTTP semantic convention.

@justinfoote will follow this PR (which again, we hope to merge quickly) with a proposal that we refactor the OTel specification so that all semantic conventions for attributes and labels move into a general location, independent of spans/metrics/resources. Then where necessary the span and metrics specifications can refer to the general-purpose specification, but I think that should not be necessary very often.

@jmacd
Copy link
Contributor

jmacd commented Aug 20, 2020

@ebullient re: #739 (comment)

Yes, I've also seen HTTP Status codes reduced to "2xx", "3xx", "4xx", and "5xx", but that is something that could be done in a processor and could probably be considered as a separate specification.

@tigrannajaryan
Copy link
Member

@tigrannajaryan and @bogdandrutu
We discussed this PR at length in the Metrics SIG call today. We feel that this should be merged as it is blocking progress.

We reached agreement that there is no intentional "subtle differences from equally named span attributes" here; the semantics should not change when an attribute is used on a span or on a metric. @grahamfuller1 will resolve any overt differences that are introduced here, aiming to get this merged quickly.

We reached agreement that there is no conceptual problem with converting a numeric value to a string value. I challenge the notion that the semantics of 404 and "404" are at all different; the semantics do not change when the data representation changes from integer to string.

I agree. I came to the same realization here: #815 (comment)

We can explicitly legalize that using string representation in metric labels is valid and does not imply any semantic differences.

@justinfoote will follow this PR (which again, we hope to merge quickly) with a proposal that we refactor the OTel specification so that all semantic conventions for attributes and labels move into a general location, independent of spans/metrics/resources. Then where necessary the span and metrics specifications can refer to the general-purpose specification, but I think that should not be necessary very often.

Sounds good.

specification/metrics/semantic_conventions/http-metrics.md Outdated Show resolved Hide resolved
| `http.host` | `client` & `server` | see [label alternatives](#label-alternatives) | The value of the [HTTP host header][]. When the header is empty or not present, this label should be the same. |
| `http.scheme` | `client` & `server` | see [label alternatives](#label-alternatives) | The URI scheme identifying the used protocol: `"http"` or `"https"` |
| `http.status_code` | `client` & `server` | Optional | [HTTP response status code][]. E.g. `200` (integer) |
| `http.status_text` | `client` & `server` | Optional | [HTTP reason phrase][]. E.g. `"OK"` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://tools.ietf.org/html/rfc7231#section-6.1 says that

reason phrases listed here are only recommendations -- they can be replaced by local equivalents

Not that I have seen this actually happening but it is a possibility. If source start sending different reason phrase for the same status code that would arguably make the status text less useful since aggregations would be difficult/impossible to do.

Since we already have http.status_code perhaps just drop this label?

| `http.scheme` | `client` & `server` | see [label alternatives](#label-alternatives) | The URI scheme identifying the used protocol: `"http"` or `"https"` |
| `http.status_code` | `client` & `server` | Optional | [HTTP response status code][]. E.g. `200` (String) |
| `http.status_text` | `client` & `server` | Optional | [HTTP reason phrase][]. E.g. `"OK"` |
| `http.flavor` | `client` & `server` | Optional | Kind of HTTP protocol used: `"1.0"`, `"1.1"`, `"2"`, `"SPDY"` or `"QUIC"`. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "flavor" the right term or "version"?

Also, I am not sure "SPDY" or "QUIC" belong to the same set.

@tigrannajaryan tigrannajaryan dismissed their stale review August 20, 2020 19:23

Issue with status_code is resolved.

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, especially given that there is an intent to follow up with an improving PR.

@gfuller1
Copy link
Contributor Author

@tigrannajaryan since http.status_code changing seems like a very rare occurrence I'll address that in the follow up PR. flavor is what's used for the span event attribute. If we want to change the name of the attribute and this label that sounds like a great candidate for a small separate PR as to not add more conversation to this already long PR.

@gfuller1
Copy link
Contributor Author

@bogdandrutu see #739 (comment). Would love to hear back on this today.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to merge this as long as we create an issue to followup for the default list of labels that will be used, and how this impacts the cardinality of the metric that we produce for http.

@gfuller1
Copy link
Contributor Author

Here is the issue with the followup work once the common attribute/label list is created #897

@bogdandrutu bogdandrutu merged commit a6db328 into open-telemetry:master Aug 30, 2020
@gfuller1 gfuller1 deleted the http-metrics-spec branch August 31, 2020 01:30
@lubingfeng
Copy link

@bogdandrutu @jmacd Just want to check if we have finalized stats specifications. If not, when.

We are trying to complete statsreceiver (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/receiver/statsdreceiver) - by middle September.

@lubingfeng
Copy link

@bogdandrutu @jmacd Just want to check if we have finalized stats specifications. If not, when.

We are trying to complete statsreceiver (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/receiver/statsdreceiver) - by middle September.

@bogdandrutu @jmacd any comments on this?

@jmacd
Copy link
Contributor

jmacd commented Sep 9, 2020

@lubingfeng we have released OTLP v0.5 that has support for delta aggregation temporality last week, and support is expected in this week's Collector release. Sorry for the delays! Let me know how else I can assist.

@lubingfeng
Copy link

Thank you for the info, @jmacd. Good to hear OTLP v0.5 had delta aggregation included and support in this week's Collector release. Do we know when OTLP for metrics can be finalized? Our implementation of statsd receiver
(https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/receiver/statsdreceiver) is based on open census based on Nick's discussion with you. We want to get it based on OTLP by the end of the month. Just want to check if its feasibility.

@jmacd
Copy link
Contributor

jmacd commented Sep 9, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.