Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

Merged
merged 26 commits into from
Sep 12, 2019

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Aug 27, 2019

Updates to 0003 following work session 8/21/2019

Some content has been moved to a new RFC on metric handles.

This captures most of the discussion from the 8/21 meeting and mostly eliminates the need for RFC 0004.

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved
text/0003-measure-metric-type.md Outdated Show resolved Hide resolved
@lzchen
Copy link
Contributor

lzchen commented Aug 27, 2019

In summary, these are the agreements that this RFC addresses:

  1. Agreement on context association across the metrics surface.
  2. Agreement: Two diff interfaces - 1) For defining local labels 2) Metric operations
    getHandle(labels) -> return handle object
    handle.set() or handle.add()
  3. Agreement: Call site variables can be supported by context tags
  4. Agreement: All defined labels are captured in handles
    Unspecified values will be accepted but will not be treated as empty
  5. Agreement: Context tags do not override handles tags
    Do not list remote tags for pre-aggregation (when you declare the metric)
  6. Agreement: No support delta metric type. All type should support increments and decrements (AI: come up with names for a-c)
    Set can have the option to restrict values to increase
    Add can have the option to restrict to +ive values
    Record can have the option to be restricted to only +ive values
  7. Agreement: Metric type will have default aggregation and will have the option of enabling them.
  8. Agreement: Support Context-less Set (aka callbacks) for observing current value metrics (snapshots) (bullet 6,7 applies to 8)

Agreements that still need to be addressed:

  1. Agreement: Supporting order based and non order based getHandle()
    Order based handles can throw exceptions: as determined by language
    Non-order based handles can fill-in unspecified-value automatically (point 4 in addressed)
  2. Agreement: Provide transformer for external metrics provider like drop-wizzard (rather than callback observer). (details on this?)
  3. Agreement: Resources are associated with SDK Manager, Component is associated with Tracer/Meter, metric name is unique per component (in prometheus exporter we would add component name as prefix).
  4. Vector metrics? (prometheus)

@jmacd Feel free to edit if I'm missing anything!

@lzchen
Copy link
Contributor

lzchen commented Aug 27, 2019

Looks good! I think we can address the other agreements maybe as edits to this RFC in the future once they are decided or simply push them up into the spec.

@tedsuo tedsuo self-requested a review September 3, 2019 03:39
Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good


`GetHandle` specifies two calling patterns that may be supported: one with ordered label values and one without. The facility for ordered label values is provided as a potential optimization that facilitates a simple lookup for the SDK; in this form, the API is permitted to thrown an exception or return an error when there is a mismatch in the arguments to `GetHandle`. When label values are accepted in any order, some SDKs may perform an expensive lookup to find an existing metrics handle, but they must not throw exceptions.

`GetHandle` and `GetDefaultHandle` support additional label values not required in the definition of the metric instrument. These optional labels act the same as pre-defined labels in the low-level metrics data representation, only that they are not required. Some SDKs may elect to use additional label values as the "attachment" data on metrics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand "additional label values", do you mean these labels may be extracted from the "Context"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed two cases:
(1) how a user sometimes wants to add additional labels at the individual call site, that we might treat these using context propagation instead
(2) how a user sometimes wants to include additional labels in the handle, as a matter of shortening the program code for metrics that are used more than once in code. I don't think we pinned this down. We agreed that what I'm now calling "required keys", which are declared in the metric, that these are always defined in a handle (including potentially unspecified). Here, "Additional Labels" are associated with the handle but cannot be used for pre-aggregation because they are not required. For a metrics data exporter, they can be ignored. For a tracing system or diagnostic-event exporter, these would be additional data.

I've posted an open question at the bottom. Can we use additional labels to capture the Attachment concept?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we start without the support of these extra labels for the moment? Adding a new API is always possible and backwards compatible. Trying to start a bit small :)

text/0003-measure-metric-type.md Show resolved Hide resolved

Likely to be the most common kind of metric, cumulative metric events express the computation of a sum. Choose this kind of metric when the value is a quantity, the sum is of primary interest, and the event count and distribution are not of primary interest. To raise (or lower) a cumulative metric, call the `Add()` method.

If the quantity in question is always non-negative, it implies that the sum is strictly ascending. When this is the case, the cumulative metric also serves to define a rate. For this reason, cumulative metrics have a `NonNegative` option to be declared as non-negative. The API will reject negative updates to non-negative cumulative metrics, instead submitting an SDK error event, which helps ensure meaningful rate calculations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krnowak a rate can be calculated for any ascending sum (not necessary strictly ascending). @jmacd please fix :)


Like cumulative metrics, non-negative measures are an important case because they support rate calculations. As an option, measure metrics may be declared as `NonNegative`. The API will reject negative metric events for non-negative measures, instead submitting an SDK error event.

Because measure metrics have such wide application, implementations are likely to provide configurable behavior. OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure the min/max are relevant. can you please point me to how this can be useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are P0 and P100 of the distribution, and they're the only two quantiles that can be computed in O(1) space. I realize there are problems in aggregating these, but I've used this sort of summary to get an idea of the worst-case latency in a high-throughput section of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way how cloudwatch makes use of this is by computing what we called "interval stats". They require you to have a period and always send the metric that represents the statistic for this period of time (kind of already rated) so that's how they can do some aggregation :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can see this as one of the aggregation "Statistic over time", then min/max are useful.


Because measure metrics have such wide application, implementations are likely to provide configurable behavior. OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.

### Disable selected metrics by default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this. Do we offer an "enable" option when a metric is created? If so, please mention that clearly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this section to say that enabled is the default and Disabled is an option.


Applications sometimes want to act upon multiple metric handles in a single API call, either because the values are inter-related to each other, or because it lowers overhead. We agree that recording batch measurements will be restricted to measure metrics, although this support could be extended to all kinds of metric in the future.

A single measurement is defined as:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand how this will be implemented:
Does the library developer need to provide Handles OR Metrics + a list of label values that will be applied to get the Handles (if an implementation decide to do that). Some other implementations may decide to record this event as it is so GetHandle can be avoided.

The second option is what we have in OpenCensus and works well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that I can see both cases being useful. I recognize that "raw statistics" in OC had a 3-tuple of (metric, value, pre-defined labels) as the measurement. I wanted to be able to use a 2-tuple for RecordBatch because I thought the purpose was to provide atomicity.

We discussed whether it was possible to have per-call-site label values in the metrics API, and you requested treating those as labels derived from the context. This was agreed--and now a related point of confusion has come up about additional label values in the handle, which I favor.

This topic appears to be whether we can avoid the cost of allocating handles through the SDK for cases that are batch measurements that will only be used once. I see the benefit, but it seems to prevent recording a batch of metrics using a handle.

I could see supporting both options. That's somewhat unsatisfying.

For the gRPC configuration prescribed in OC, I take it that Handles are used for the cumulative and gauge metrics, with pre-defined labels, but the raw measurements did not use pre-defined labels. But gRPC knows the pre-defined labels here. I presume this use of pre-defined labels for cumulative and gauges is because these are pre-aggregated. Therefore, measure metrics do not benefit, the argument goes, from pre-defined labels as much.

Can we avoid this by:
(1) requiring labels supplied in a RecordBatch to be passed via context
(2) allowing RecordBatch to take either a metric instrument or a handle
(3) if a handle, the additional labels are passed by context; if a metric, both required and additional labels are passed by context.

At some point, thus blurs the distinction between call-site and per-handle labels...


The argument against this is that metric instruments are meant to be pure API objects, they are not constructed through an SDK. Therefore, the default Meter (SDK) will have to be located from the context, meaning there is a question about whether this is as efficient as storing a re-usable handle for the default case. For metric instruments with no required keys, this will be a real question: what is the benefit of a handle of it specifies no information other than the SDK?

If we eliminate `GetDefaultHandle()`, the SDK may keep a map of metric instrument to default handle on its own.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultHandle in my understanding is a GetHandle(List /null for all the values). So an argument can be made to remove this. Also for metrics without predefined labels will need to call GetHandle(empty_list) which may require an unnecessary allocation of the empty_list object.

I want mostly to make sure that by DefaultHandle we understand the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my understanding.


The same discussion can be had for the `MeasurementBatch` type described here. It can be declared with an ordered list of metrics, then the `Record` API takes only an ordered list of numbers. Alternatively, and less prone to misuse, the `MeasurementBatch.Record` API could be declared with a list of metric:number pairs.
- The `Record` in `RecordBatch` suggests it is to be applied to measure metrics. This is due to measure metrics being the most general-purpose of metric instruments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the measure metric is the most expensive to calculate so an implementation may defer that aggregation (especially quantiles require locking), in the sum case that is just an atomic fetch_add.

@songy23
Copy link
Member

songy23 commented Sep 4, 2019

https://circleci.com/gh/open-telemetry/oteps/115?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

./text/0003-measure-metric-type.md:5:2: "Foreward" is a misspelling of "Foreword"
./text/0003-measure-metric-type.md:113:294: "whic" is a misspelling of "which"

Please fix. (make misspell-correction should fix the typos.)

text/0000-metric-handles.md Outdated Show resolved Hide resolved

## Motivation

The specification currently names this concept `TimeSeries`, the object returned by `GetOrCreateTimeseries`, which supports binding a metric to a pre-defined set of labels for repeated use. This proposal renames these `Handle` and `GetHandle`, respectively, and adds further detail to the API specification for handles.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having gauge.GetHandle(...) seems to be okay.
Having a class/type named Handle could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Go prototype, there is indeed a Handle type but it's for internal use. The types returned by the GetHandle() methods in Go PR#100 are like Float64GaugeHandle, Int64CumulativeHandle. Is this sufficiently clear?

@jmacd
Copy link
Contributor Author

jmacd commented Sep 11, 2019

I will revise this RFC again tomorrow with some of the outcome of today's metrics meeting. For now, I've allocated number 0007-metric-handle, making it precede 0008-metric-observer in PR#39.

@jmacd jmacd changed the title Metrics measure type [update] and metrics handle specification WIP: RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted Sep 11, 2019
@jmacd
Copy link
Contributor Author

jmacd commented Sep 11, 2019

Update: It's been suggested that this PR is too long to be useful, and I would like to propose merging it in the current "proposed" state. I will immediately open another PR for further comments as I begin another round of updates regarding the LabelSet discussion yesterday.

@bogdandrutu

@songy23 songy23 changed the title WIP: RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted Sep 11, 2019
@yurishkuro yurishkuro merged commit 3e9d81d into open-telemetry:master Sep 12, 2019
@jmacd jmacd deleted the jmacd/update_0003 branch September 13, 2019 18:15
Oberon00 pushed a commit to dynatrace-oss-contrib/oteps that referenced this pull request Sep 16, 2019
* span operations API

* fixed TBD on span

* fixed a link

* addressed Bogdan's feedback

* fixed optionality of attributes

* Update tracing-api.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.