RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

jmacd · 2019-08-27T08:31:52Z

Updates to 0003 following work session 8/21/2019

Some content has been moved to a new RFC on metric handles.

This captures most of the discussion from the 8/21 meeting and mostly eliminates the need for RFC 0004.

text/0003-measure-metric-type.md

lzchen · 2019-08-27T22:24:19Z

In summary, these are the agreements that this RFC addresses:

Agreement on context association across the metrics surface.
Agreement: Two diff interfaces - 1) For defining local labels 2) Metric operations
getHandle(labels) -> return handle object
handle.set() or handle.add()
Agreement: Call site variables can be supported by context tags
Agreement: All defined labels are captured in handles
Unspecified values will be accepted but will not be treated as empty
Agreement: Context tags do not override handles tags
Do not list remote tags for pre-aggregation (when you declare the metric)
Agreement: No support delta metric type. All type should support increments and decrements (AI: come up with names for a-c)
Set can have the option to restrict values to increase
Add can have the option to restrict to +ive values
Record can have the option to be restricted to only +ive values
Agreement: Metric type will have default aggregation and will have the option of enabling them.
Agreement: Support Context-less Set (aka callbacks) for observing current value metrics (snapshots) (bullet 6,7 applies to 8)

Agreements that still need to be addressed:

Agreement: Supporting order based and non order based getHandle()
Order based handles can throw exceptions: as determined by language
Non-order based handles can fill-in unspecified-value automatically (point 4 in addressed)
Agreement: Provide transformer for external metrics provider like drop-wizzard (rather than callback observer). (details on this?)
Agreement: Resources are associated with SDK Manager, Component is associated with Tracer/Meter, metric name is unique per component (in prometheus exporter we would add component name as prefix).
Vector metrics? (prometheus)

@jmacd Feel free to edit if I'm missing anything!

lzchen · 2019-08-27T22:27:25Z

Looks good! I think we can address the other agreements maybe as edits to this RFC in the future once they are decided or simply push them up into the spec.

text/0003-measure-metric-type.md

bogdandrutu

Overall looks good

bogdandrutu · 2019-09-04T20:54:37Z

text/0000-metric-handles.md

+
+`GetHandle` specifies two calling patterns that may be supported: one with ordered label values and one without.  The facility for ordered label values is provided as a potential optimization that facilitates a simple lookup for the SDK; in this form, the API is permitted to thrown an exception or return an error when there is a mismatch in the arguments to `GetHandle`.  When label values are accepted in any order, some SDKs may perform an expensive lookup to find an existing metrics handle, but they must not throw exceptions.
+
+`GetHandle` and `GetDefaultHandle` support additional label values not required in the definition of the metric instrument.  These optional labels act the same as pre-defined labels in the low-level metrics data representation, only that they are not required.  Some SDKs may elect to use additional label values as the "attachment" data on metrics.


Not sure I understand "additional label values", do you mean these labels may be extracted from the "Context"?

We discussed two cases:
(1) how a user sometimes wants to add additional labels at the individual call site, that we might treat these using context propagation instead
(2) how a user sometimes wants to include additional labels in the handle, as a matter of shortening the program code for metrics that are used more than once in code. I don't think we pinned this down. We agreed that what I'm now calling "required keys", which are declared in the metric, that these are always defined in a handle (including potentially unspecified). Here, "Additional Labels" are associated with the handle but cannot be used for pre-aggregation because they are not required. For a metrics data exporter, they can be ignored. For a tracing system or diagnostic-event exporter, these would be additional data.

I've posted an open question at the bottom. Can we use additional labels to capture the Attachment concept?

Can we start without the support of these extra labels for the moment? Adding a new API is always possible and backwards compatible. Trying to start a bit small :)

text/0003-measure-metric-type.md

bogdandrutu · 2019-09-04T21:10:38Z

text/0003-measure-metric-type.md

+
+Likely to be the most common kind of metric, cumulative metric events express the computation of a sum.  Choose this kind of metric when the value is a quantity, the sum is of primary interest, and the event count and distribution are not of primary interest.  To raise (or lower) a cumulative metric, call the `Add()` method.
+
+If the quantity in question is always non-negative, it implies that the sum is strictly ascending.  When this is the case, the cumulative metric also serves to define a rate.  For this reason, cumulative metrics have a `NonNegative` option to be declared as non-negative.  The API will reject negative updates to non-negative cumulative metrics, instead submitting an SDK error event, which helps ensure meaningful rate calculations.


@krnowak a rate can be calculated for any ascending sum (not necessary strictly ascending). @jmacd please fix :)

bogdandrutu · 2019-09-04T21:13:07Z

text/0003-measure-metric-type.md

+
+Like cumulative metrics, non-negative measures are an important case because they support rate calculations. As an option, measure metrics may be declared as `NonNegative`.  The API will reject negative metric events for non-negative measures, instead submitting an SDK error event.
+
+Because measure metrics have such wide application, implementations are likely to provide configurable behavior.  OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.


not sure the min/max are relevant. can you please point me to how this can be useful?

These are P0 and P100 of the distribution, and they're the only two quantiles that can be computed in O(1) space. I realize there are problems in aggregating these, but I've used this sort of summary to get an idea of the worst-case latency in a high-throughput section of code.

It's also part of the CloudWatch metrics data model.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Statistic

The way how cloudwatch makes use of this is by computing what we called "interval stats". They require you to have a period and always send the metric that represents the statistic for this period of time (kind of already rated) so that's how they can do some aggregation :)

So I can see this as one of the aggregation "Statistic over time", then min/max are useful.

bogdandrutu · 2019-09-04T21:14:36Z

text/0003-measure-metric-type.md

+
+Because measure metrics have such wide application, implementations are likely to provide configurable behavior.  OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.
+
+### Disable selected metrics by default


Not sure I understand this. Do we offer an "enable" option when a metric is created? If so, please mention that clearly.

I rewrote this section to say that enabled is the default and Disabled is an option.

bogdandrutu · 2019-09-04T21:17:33Z

text/0003-measure-metric-type.md

+
+Applications sometimes want to act upon multiple metric handles in a single API call, either because the values are inter-related to each other, or because it lowers overhead.  We agree that recording batch measurements will be restricted to measure metrics, although this support could be extended to all kinds of metric in the future.
+
+A single measurement is defined as:


Not sure I understand how this will be implemented:
Does the library developer need to provide Handles OR Metrics + a list of label values that will be applied to get the Handles (if an implementation decide to do that). Some other implementations may decide to record this event as it is so GetHandle can be avoided.

The second option is what we have in OpenCensus and works well.

The problem is that I can see both cases being useful. I recognize that "raw statistics" in OC had a 3-tuple of (metric, value, pre-defined labels) as the measurement. I wanted to be able to use a 2-tuple for RecordBatch because I thought the purpose was to provide atomicity.

We discussed whether it was possible to have per-call-site label values in the metrics API, and you requested treating those as labels derived from the context. This was agreed--and now a related point of confusion has come up about additional label values in the handle, which I favor.

This topic appears to be whether we can avoid the cost of allocating handles through the SDK for cases that are batch measurements that will only be used once. I see the benefit, but it seems to prevent recording a batch of metrics using a handle.

I could see supporting both options. That's somewhat unsatisfying.

For the gRPC configuration prescribed in OC, I take it that Handles are used for the cumulative and gauge metrics, with pre-defined labels, but the raw measurements did not use pre-defined labels. But gRPC knows the pre-defined labels here. I presume this use of pre-defined labels for cumulative and gauges is because these are pre-aggregated. Therefore, measure metrics do not benefit, the argument goes, from pre-defined labels as much.

Can we avoid this by:
(1) requiring labels supplied in a RecordBatch to be passed via context
(2) allowing RecordBatch to take either a metric instrument or a handle
(3) if a handle, the additional labels are passed by context; if a metric, both required and additional labels are passed by context.

At some point, thus blurs the distinction between call-site and per-handle labels...

bogdandrutu · 2019-09-04T21:23:22Z

text/0003-measure-metric-type.md

+
+The argument against this is that metric instruments are meant to be pure API objects, they are not constructed through an SDK. Therefore, the default Meter (SDK) will have to be located from the context, meaning there is a question about whether this is as efficient as storing a re-usable handle for the default case.  For metric instruments with no required keys, this will be a real question: what is the benefit of a handle of it specifies no information other than the SDK?
+
+If we eliminate `GetDefaultHandle()`, the SDK may keep a map of metric instrument to default handle on its own.


DefaultHandle in my understanding is a GetHandle(List /null for all the values). So an argument can be made to remove this. Also for metrics without predefined labels will need to call GetHandle(empty_list) which may require an unnecessary allocation of the empty_list object.

I want mostly to make sure that by DefaultHandle we understand the same thing.

Yes, that's my understanding.

bogdandrutu · 2019-09-04T21:26:14Z

text/0003-measure-metric-type.md


-The same discussion can be had for the `MeasurementBatch` type described here.  It can be declared with an ordered list of metrics, then the `Record` API takes only an ordered list of numbers.  Alternatively, and less prone to misuse, the `MeasurementBatch.Record` API could be declared with a list of metric:number pairs.
+- The `Record` in `RecordBatch` suggests it is to be applied to measure metrics.  This is due to measure metrics being the most general-purpose of metric instruments.


Also the measure metric is the most expensive to calculate so an implementation may defer that aggregation (especially quantiles require locking), in the sum case that is just an atomic fetch_add.

songy23 · 2019-09-04T22:05:55Z

https://circleci.com/gh/open-telemetry/oteps/115?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

./text/0003-measure-metric-type.md:5:2: "Foreward" is a misspelling of "Foreword"
./text/0003-measure-metric-type.md:113:294: "whic" is a misspelling of "which"

Please fix. (make misspell-correction should fix the typos.)

text/0000-metric-handles.md

reyang · 2019-09-10T16:27:44Z

text/0000-metric-handles.md

+
+## Motivation
+
+The specification currently names this concept `TimeSeries`, the object returned by `GetOrCreateTimeseries`, which supports binding a metric to a pre-defined set of labels for repeated use.  This proposal renames these `Handle` and `GetHandle`, respectively, and adds further detail to the API specification for handles.  


Having gauge.GetHandle(...) seems to be okay.
Having a class/type named Handle could be confusing.

In the Go prototype, there is indeed a Handle type but it's for internal use. The types returned by the GetHandle() methods in Go PR#100 are like Float64GaugeHandle, Int64CumulativeHandle. Is this sufficiently clear?

jmacd · 2019-09-11T05:09:09Z

I will revise this RFC again tomorrow with some of the outcome of today's metrics meeting. For now, I've allocated number 0007-metric-handle, making it precede 0008-metric-observer in PR#39.

jmacd · 2019-09-11T18:22:37Z

Update: It's been suggested that this PR is too long to be useful, and I would like to propose merging it in the current "proposed" state. I will immediately open another PR for further comments as I begin another round of updates regarding the LabelSet discussion yesterday.

@bogdandrutu

* span operations API * fixed TBD on span * fixed a link * addressed Bogdan's feedback * fixed optionality of attributes * Update tracing-api.md

Updates to 0003 following work session 8/21/2019

e401916

jmacd requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, reyang, SergeyKanzhelev, songy23, tedsuo and yurishkuro as code owners August 27, 2019 08:31

Update date

c4a7836

bogdandrutu reviewed Aug 27, 2019

View reviewed changes

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

bogdandrutu reviewed Aug 27, 2019

View reviewed changes

text/0003-measure-metric-type.md Show resolved Hide resolved

lizthegrey reviewed Aug 27, 2019

View reviewed changes

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

lzchen reviewed Aug 27, 2019

View reviewed changes

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

jmacd added 2 commits August 27, 2019 15:02

Feedback applied

beff7cf

Feedback applied

0aa9665

lzchen reviewed Aug 27, 2019

View reviewed changes

text/0003-measure-metric-type.md Show resolved Hide resolved

jmacd mentioned this pull request Aug 30, 2019

Golang metrics prototype open-telemetry/opentelemetry-go#100

Merged

krnowak reviewed Aug 30, 2019

View reviewed changes

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

Upstream

2c749d8

tedsuo reviewed Sep 1, 2019

View reviewed changes

text/0003-measure-metric-type.md Outdated Show resolved Hide resolved

jmacd added 2 commits September 1, 2019 20:52

Remove handle specification, will create another RFC

7d5bee5

More typing

2e541db

tedsuo self-requested a review September 3, 2019 03:39

tedsuo approved these changes Sep 3, 2019

View reviewed changes

Add metrics handles RFC

8e9fcc3

jmacd added 2 commits September 4, 2019 10:51

Clarify the Measurement unit for RecordBatch

d97868f

Upstream

e997cce

bogdandrutu reviewed Sep 4, 2019

View reviewed changes

jmacd added 4 commits September 4, 2019 14:46

Add issues addressed

54a778f

Linkify

fd1c027

Linkify

deb6fac

Format

a1e526d

jmacd added 3 commits September 4, 2019 16:11

Address option names and default settings

9aee892

Answer questions

7ae4bd8

Spelling

18587f6

lizthegrey approved these changes Sep 10, 2019

View reviewed changes

reyang reviewed Sep 10, 2019

View reviewed changes

text/0000-metric-handles.md Outdated Show resolved Hide resolved

reyang reviewed Sep 10, 2019

View reviewed changes

jmacd added 2 commits September 10, 2019 22:04

Use 0007

c3225a2

Refer to 0008

f1e5972

Take suggestion

ec53885

jmacd changed the title ~~Metrics measure type [update] and metrics handle specification~~ WIP: RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted Sep 11, 2019

jmacd mentioned this pull request Sep 11, 2019

Acceptance PRs for proposed OTEPs #44

Closed

lzchen approved these changes Sep 11, 2019

View reviewed changes

bogdandrutu approved these changes Sep 11, 2019

View reviewed changes

carlosalberto approved these changes Sep 11, 2019

View reviewed changes

songy23 approved these changes Sep 11, 2019

View reviewed changes

songy23 changed the title ~~WIP: RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted~~ RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted Sep 11, 2019

yurishkuro merged commit 3e9d81d into open-telemetry:master Sep 12, 2019

mayurkale22 mentioned this pull request Sep 12, 2019

feat: Metrics API revision based on Specs and RFCs open-telemetry/opentelemetry-js#259

Merged

jmacd deleted the jmacd/update_0003 branch September 13, 2019 18:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

jmacd commented Aug 27, 2019 •

edited

Loading

lzchen commented Aug 27, 2019 •

edited

Loading

lzchen commented Aug 27, 2019

bogdandrutu left a comment

bogdandrutu Sep 4, 2019

jmacd Sep 4, 2019

bogdandrutu Sep 6, 2019

bogdandrutu Sep 4, 2019

bogdandrutu Sep 4, 2019

jmacd Sep 4, 2019

jmacd Sep 4, 2019

bogdandrutu Sep 6, 2019

bogdandrutu Sep 6, 2019

bogdandrutu Sep 4, 2019

jmacd Sep 4, 2019

bogdandrutu Sep 4, 2019

jmacd Sep 4, 2019

bogdandrutu Sep 4, 2019

jmacd Sep 4, 2019

bogdandrutu Sep 4, 2019

songy23 commented Sep 4, 2019 •

edited

Loading

reyang Sep 10, 2019

jmacd Sep 11, 2019

jmacd commented Sep 11, 2019

jmacd commented Sep 11, 2019


		`GetHandle` specifies two calling patterns that may be supported: one with ordered label values and one without. The facility for ordered label values is provided as a potential optimization that facilitates a simple lookup for the SDK; in this form, the API is permitted to thrown an exception or return an error when there is a mismatch in the arguments to `GetHandle`. When label values are accepted in any order, some SDKs may perform an expensive lookup to find an existing metrics handle, but they must not throw exceptions.

		`GetHandle` and `GetDefaultHandle` support additional label values not required in the definition of the metric instrument. These optional labels act the same as pre-defined labels in the low-level metrics data representation, only that they are not required. Some SDKs may elect to use additional label values as the "attachment" data on metrics.


		Likely to be the most common kind of metric, cumulative metric events express the computation of a sum. Choose this kind of metric when the value is a quantity, the sum is of primary interest, and the event count and distribution are not of primary interest. To raise (or lower) a cumulative metric, call the `Add()` method.

		If the quantity in question is always non-negative, it implies that the sum is strictly ascending. When this is the case, the cumulative metric also serves to define a rate. For this reason, cumulative metrics have a `NonNegative` option to be declared as non-negative. The API will reject negative updates to non-negative cumulative metrics, instead submitting an SDK error event, which helps ensure meaningful rate calculations.


		Like cumulative metrics, non-negative measures are an important case because they support rate calculations. As an option, measure metrics may be declared as `NonNegative`. The API will reject negative metric events for non-negative measures, instead submitting an SDK error event.

		Because measure metrics have such wide application, implementations are likely to provide configurable behavior. OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.


		Because measure metrics have such wide application, implementations are likely to provide configurable behavior. OpenTelemetry may provide such a facility in its standard SDK, but in case no configuration is provided by the application, a low-cost policy is specified as the default behavior, whic is to export the sum, the count (rate), the minimum value, and the maximum value.

		### Disable selected metrics by default


		Applications sometimes want to act upon multiple metric handles in a single API call, either because the values are inter-related to each other, or because it lowers overhead. We agree that recording batch measurements will be restricted to measure metrics, although this support could be extended to all kinds of metric in the future.

		A single measurement is defined as:


		The argument against this is that metric instruments are meant to be pure API objects, they are not constructed through an SDK. Therefore, the default Meter (SDK) will have to be located from the context, meaning there is a question about whether this is as efficient as storing a re-usable handle for the default case. For metric instruments with no required keys, this will be a real question: what is the benefit of a handle of it specifies no information other than the SDK?

		If we eliminate `GetDefaultHandle()`, the SDK may keep a map of metric instrument to default handle on its own.


		The same discussion can be had for the `MeasurementBatch` type described here. It can be declared with an ordered list of metrics, then the `Record` API takes only an ordered list of numbers. Alternatively, and less prone to misuse, the `MeasurementBatch.Record` API could be declared with a list of metric:number pairs.
		- The `Record` in `RecordBatch` suggests it is to be applied to measure metrics. This is due to measure metrics being the most general-purpose of metric instruments.


		## Motivation

		The specification currently names this concept `TimeSeries`, the object returned by `GetOrCreateTimeseries`, which supports binding a metric to a pre-defined set of labels for repeated use. This proposal renames these `Handle` and `GetHandle`, respectively, and adds further detail to the API specification for handles.

RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

RFC 0003, 0007 updates (Metrics measure type, metrics handle specification), RFC 0004 (configurable aggregation) deleted #29

Conversation

jmacd commented Aug 27, 2019 • edited Loading

lzchen commented Aug 27, 2019 • edited Loading

lzchen commented Aug 27, 2019

bogdandrutu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

songy23 commented Sep 4, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jmacd commented Sep 11, 2019

jmacd commented Sep 11, 2019

jmacd commented Aug 27, 2019 •

edited

Loading

lzchen commented Aug 27, 2019 •

edited

Loading

songy23 commented Sep 4, 2019 •

edited

Loading