Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics API RFCs to eliminate Raw statistics #4

Merged
merged 15 commits into from
Aug 13, 2019
37 changes: 37 additions & 0 deletions 0001-metric-pre-defined-labels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Pre-defined label support for all metric operations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I remember during the meeting about this RFC is how for example gRPC will set the grpc_method. In OpenCensus we decided to set the method as a Tag and the measure will read the value from the tags not from the extra labels provided by gRPC. The main reason is that we want other metrics recorded for this request to be able to use the grpc_method tag as a label as well.

For the status we used the extra labels (the measure labels proposed in this RFC) because this is known at the end of the request and cannot be used in other metrics.

So the main label that a system like gRPC will add which is the request (method) name is read from the Tags (a.k.a. DistributedContext).


Let all Metric objects (Cumulative, Gauge, ...) and Raw statistics support pre-defined label values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "raw statistics" we don't know in advance all the label keys that will be used to break down these metrics. In OpenCensus what we did was to allow users to call record with what we called extra labels. We cannot do the same optimization that we do for the pre-defined metrics (aggregation, labels are pre-defined) because we don't know at compilation time the set of labels that we need to use. Also for "raw statistics" the final set of label values will be a subset of Tags (a.k.a DistributedContext) and the extraLabels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least from my understanding, the idea isn't that label values are known at compile time, but that label keys are known. Does that clarify things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is opposite to the idea of the "raw statistics" ("delayed metrics"). The whole point to support this is to allow deferred/delayed label keys and aggregation. See for example https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md we do record all these measurements and recommend some views, but application owners (who configured the Views) can define any label keys they want. One example is when an A/B test is performed in ServiceA (ServiceA calls ServiceB), you would want to add the test_label as a label key for all rpc stats (maybe just some of them) to monitor the effects. But you cannot change the gRPC instrumented code so then we need to allow this mechanism.

Copy link
Member

@iredelmeier iredelmeier Jul 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like an issue with the gRPC instrumentation that can be solved with increased configurability, e.g., allowing end users to override the default label names upfront.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about "not allowing" to override the default label names, is about adding extra labels that are propagated from different services with the context (tags/distributedContext). This is the whole idea. Also I mentioned the aggregation function (for example do I want to build a histogram with these buckets or that buckets, or just build a Sum).

If you try to come up with that "flexible configuration" you will see that you get very close to what we propose in this framework "raw statistics" + views.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000 to being able to pre-specify defaults for some tags, rather than having all the tags need to be re-checked at runtime. To be all technical... this feels like the metrics version of performing partial application.


## Motivation

In the current `Metric.GetOrCreateTimeSeries` API for Gauges and Cumulatives, the caller obtains a `TimeSeries` handle for repeatedly recording metrics with certain pre-defined label values set. This is an important optimization, especially for exporting aggregated metrics.

The use of pre-defined labels improves usability too, for working with metrics in code. Application programs with long-lived objects and associated Metrics can compute predefined label values once (e.g., in a constructor), rather than once per call site.

The current API for recording Raw statistics does not support the same optimization or usability advantage. This RFC proposes to add support for pre-defined labels on all metrics.

## Explanation

In the current proposal, Metrics are used for pre-aggregated metric types, whereas Raw statistics are used for uncommon and vendor-specific aggregations. The optimization and the usability advantages gained with pre-defined labels should be extended to Raw statistics because they are equally important and equally applicable. This is a new requirement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Raw statistics" which probably is not a good name (I made it up, and probably confused more than helped) is not for "uncommon and vendor-specific aggregations". We use them for example in gRPC and HTTP metrics where we don't want the framework developers (gRPC devs) to define the aggregation and labels for the end user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu HTTP and gRPC metrics seem to me like they should be regular metrics/not "raw statistics". Using them in this way seems inline with, for instance, Prometheus practices - see, e.g., the metric and label naming guide, the label examples in their querying examples, and [suggestions around label usage](https://prometheus.io/docs/practices/instrumentation/#things-to-watch-out-for.

Could you clarify why you think that HTTP and gRPC are better served by raw statistics?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See one of the example I gave to you in the other comment. I think this is what I mentioned in the SIG call that we do have different understanding of the "raw statistics" (delayed metrics). We do not want the gRPC/HTTP devs to decide how the metrics are aggregated, we want them to only record all these measurements and let the application owner decide how these will be aggregated.

One important thing is that the result of the "raw statistics" (delayed metrics) after the "Views" are applied are regular metrics, the main difference being who decides what are the label keys and aggregation function to be used.


For example, where the application wants to compute a histogram of some value (e.g., latency), there's good reason to pre-aggregate such information. In this example, it allows an implementation to effienctly export the histogram of latencies "grouped" into individual results by label value(s).

## Internal details

This RFC is accompanied by RFC 0002-metric-measure which proposes to create a new Metric type to replace Raw statistics. The metric type, named "Measure", would replace the existing concept and type named "Measure" in the metrics API. The new MeasureMetric object would support a `Record` method to record measurements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and also I talked with @SergeyKanzhelev about this. One important optimization for these metrics is "batch" recording. Because we don't know the labels and aggregation in advance what we do is we put every record entry into a producer-consumer queue and the consumer thread "applies" all defined views (views define the aggregation and labels for these metrics so that we can produce a Metric data). Alternative consideration is to do the work on the critical path and extract the necessary labels and apply the aggregation there, based on my experience with this system (used extensively in Google) this has more overhead than the previous implementation.

An option that I had in mind is to have Measure and MeasureCollection where MeasureCollection is a set of Measures that allow batch record for all the Measures in this collection.


## Trade-offs and mitigations

This is a refactoring of the existing proposal to cover more use-cases and arguably reduces API complexity.

## Prior art and alternatives

Prometheus supports the notion of vector metrics, which are those with declared dimensions. The vector-metric API supports a variety of methods like `WithLabelValues` to associate labels with a metric handle, similar to `GetOrCreateTimeSeries` in the existing proposal. As in this proposal, Prometheus supports vector metrics for all metric types.

## Open questions

This RFC is co-dependent on several others; it's an open question how to address this concern if the other RFCs are not accepted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared the same concern in my previous question on this PR. I think it makes more sense to have one or two independent RFC than a chain. I don't feel that strong right now during the review but probably when we merge this we can also merge the RFCs into one or two independent once.


## Future possibilities

This change will potentially help clarify the relationship between Metric types and Aggregation types. In a future RFC, we will propose that MeasureMetrics can be used to support arbitrary "advanced" aggregations including histograms and distribution summaries.
41 changes: 41 additions & 0 deletions 0002-metric-measure.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Replace Raw statistics with Measure-type Metric

Define a new Metric type named "Measure" to cover existing "Raw" statistics uses.

## Motivation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the motivation see open-telemetry/opentelemetry-specification#145 which proposes to remove the Measurement class and move the Record method on the Measure.


The primary motivation is that Raw statistics should support the optimization and usability improvements associated with pre-defined label values (0001-metric-pre-defined-labels). By elevating non-Cumulative, non-Gauge statistics to the same conceptual level as Metrics in the API, we effectively make the type of a metric independent from whether it supports pre-defined labels.

This also makes it possible to eliminate the low-level `stats.Record` interface from the API specification entirely (0003-eliminate-stats-record).

## Explanation

This proposal suggests we think about which aggregations apply to a metric independently from its type. A MeasureMetric could be used to aggregate a Histogram, or a Summary, or _both_ of these aggregations simultaneously. This proposal makes metric type independent of aggregation type, whereas there is a precedent for combining these types into one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I confused you a bit because we did not define the Histograms/Summaries see open-telemetry/opentelemetry-specification#146.

The idea is the following, we offer support for users to record metrics where the developer that instruments the code can choose between exporting a pre-defined aggregation/label metrics (type A) or a "raw statistics" (type B) (no aggregation, labels pre-defined). The way how this decision is made tries to follow this rules:

  1. If the metric is pre-aggregated (e.g. CPU usage is a counter in /proc) and no labels (here I mean that there are no labels used to break down this metric in /proc, but constant labels can be added to this metric if necessary. Then a type A metric should be used because we cannot apply any other aggregation (if you want to calculate rate which is possible for any counter (monotonic increasing metric) that can be done in all modern backends Prometheus, Stackdriver, etc.).
  2. If the metric is recorded with every request (rpc latency, bytes received, bytes send) and the developer of the instrumentation is not the application owner (third-party library like OpenTelemetry itself) then type B should be used. This way the application owner can decide how to aggregate these metrics.
  3. If the metric is a simple metric that does not make sense to aggregate in a different way (e.g. queue length - this is a gauge and probably does not require any labels to be used to breakdown) then a type A metric should be used.


The proposal here suggests that we think of the metric type in terms of the _action performed_ (i.e., which _verb_ applies?). Gauges support the `Set` action. Cumulatives support an `Inc` action. Measures support a `Record` action.

This extends the `GetOrCreateTimeSeries` (pre-defined labels) functionality supported by Metrics to what has been known as Raw statistics, satisfying the change in capability requested in RFC 0001-metric-pre-defined-labels. This allows programmers to predefine labels for all metrics. This is not only an important potential optimization for the programmer, it is a usability improvement in the code.

There are no new requirements stated in this RFC.

## Internal details

The type known as `MeasureMetric` is a direct replacement for Raw statistics. The `MeasureMetric.Record` method records a single observation of the metric. The `MeasureMetric.GetOrCreateTimeSeries` supports pre-defined keys as discussed in 0001-metric-pre-defined-labels.

## Trade-offs and mitigations

This change, while it eliminates the need for a Raw statistics concept, potentially introduces new required concepts. Whereas Raw statistics have no directly-declared aggregations, introducing MeasureMetric raises the question of which aggregations apply. We will propose how a programmer can declare recommended aggregations (and good defaults) in RFC 0004-configurable-aggregation.

## Prior art and alternatives

This Measure Metric API is conceptually close to the Prometheus [Histogram, Summary, and Untyped metric types](https://prometheus.io/docs/concepts/metric_types/).

## Open questions

With this proposal accepted, there would be three Metric types: Gauge, Cumulative, and Measure. This proposal does not directly address what to do over the existing, conflicting uses of "Measure".

## Future possibilities

This change enables metrics to support configurable aggregation types, which allows the programmer to provide recommended aggregations at the point where Metrics are defined. This will allow support for good out-of-the-box behavior for metrics defined by third-party libraries, for example.

Without Raw statistics in the API, it becomes possible to elimiante the low-level `stats.Record` API, which may also be desireable.
36 changes: 36 additions & 0 deletions 0003-eliminate-stats-record.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Eliminate stats.Record functionality

Remove `stats.Record` from the specification, following the MeasureMetric type (RFC 0002-metric-measure).

## Motivation

`stats.Record` is no longer a necessary interface. There are conceivable reasons to support it, but they are outweighed by the cost of implementing and supporting two interfaces for recording metrics and statistics.

## Explanation

In RFC 0002-metric-measure, a new MeasureMetric type is introduced to replace raw statistics, with support for pre-defined label values. With the new type introduced, it's now possible to record formerly-raw statistics through a higher-level Metric interface.

## Internal details

This simply involves removing the low-level `stats.Record` API from the specification, as it is no longer required.

## Trade-offs and mitigations

There are two reasons to maintain a low-level API that we know of:

1. For _generality_. An application that forwards metrics from another source may need to handle metrics in generic code. For these applications, having type-specific Metric handles could actually require more code to be written, whereas the low-level `stats.Record` API is more amenable to generic use.
1. For _atomicity_. An application that wishes to record multiple statistics in a single operation can feel confident computing formulas based on multiple metrics, not worry about inconsistent views of the data.

## Prior art and alternatives

Raw statistics were a solution to confusion found in existing metrics APIs over Metric types vs. Aggregation types. This proposal accompanies RFC 0001-metric-pre-defined-labels and RFC 0002-metric-measure.md in proposing that we think about Metric _type_ as independent of which aggregations apply. Once we have a Metric to support histogram and summary aggregations, we no longer need raw statistics, and we no longer need `stats.Record`. This avoids introducing new concepts (Raw statistics), at the same time departs from prior art in letting one Metric type support both Histogram and Summary aggregations.

## Open questions

Are either of the trade-offs described above important enough to keep the low-level `stats.Record` API?

## Future possibilities

This restricts future possibilities for the benefit of a smaller, simpler specification.

This leaves open the possibility of adding `stats.Record` functionality later, when the need is more clearly recognized.
73 changes: 73 additions & 0 deletions 0004-metric-configurable-aggregation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# Let Metrics support configrable, recommended aggregations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to allow users to configure the aggregation for the "raw statistics" as well as the set of labels. In the OpenCensus case the full set of labels supported are Tags (available in the context when the record method is called) + extraLables (specifically recorded with every request), and users should define a subset from there.


Let the user configure recommended Metric aggregations (SUM, COUNT, MIN, MAX, LAST_VALUE, HISTOGRAM, SUMMARY).

## Motivation

In the current API proposal, Metric types like Gauge and Cumulative are mapped into specific aggregations: Gauge:LAST_VALUE and Cumulative:SUM. Depending on RFC 0002-metric-measure, which creates a new MeasureMetric type, this proposal introduces the ability to configure alternative, potentially multiple aggregations for Metrics. This allows the MeasureMetric type to support HISTOGRAM and SUMMARY aggregations, as an alternative to raw statistics.

## Explanation

This proposal completes the elimination of Raw statistics by recognizing that aggregations should be independent of metric type. This recognizes that _sometimes_ we have a cumulative but want to compute a histogram of increment values, and _sometimes_ we have a measure that has multiple interesting aggregations.

Following this change, we should think of the _Metric type_ as:

1. Indicating something about what kind of numbers are being recorded (i.e., the input domain, e.g., restricted to values >= 0?)
1. For Gauges: Something pre-computed where rate or count is not relevant
1. For Cumulatives: Something where rate or count is relevant
1. For Measures: Something where individual values are relevant
1. Indicating something about the default interpretation, based on the action verb (Set, Inc, Record, etc.)
1. For Gauges: the action is Set()
1. For Cumulatives: the action is Inc()
1. For Measures: the action is Record()
1. Unless the programmer declares otherwise, suggesting a default aggregation
1. For Gauges: LAST_VALUE is interesting, SUM is likely not interesting
1. For Cumulatives: SUM is interesting, LAST_VALUE is likely not interesting
1. For Measures: all aggregations apply, default is MIN, MAX, SUM, COUNT.

## Internal details

Metric constructors should take an optional list of aggregations, to override the default behavior. When constructed with an explicit list of aggregations, the implementation may use this as a hint about which aggregations should be exported by default. However, the implementation is not bound by these recommendations in any way and is free to control which aggregations that are applied.

The standard defined aggregations are broken into two groups, those which are "decomposable" (i.e., inexpensive) and those which are not.

The decomposable aggregations are simple to define:

1. SUM: The sum of observed values.
1. COUNT: The number of observations.
1. MIN: The smallest value.
1. MAX: The largest value.
1. LAST_VALUE: The latest value.

The non-decomposable aggregations do not have standard definitions, they are purely advisory. The intention behind these are:

1. HISTOGRAM: The intended output is a distribution summary, specifically summarizing counts into non-overlapping ranges.
1. SUMMARY: This is a more generic way to request information about a distribution, perhaps represented in some vendor-specific way / not a histogram.

## Example

To declare a MeasureMetric,

```
myMetric := metric.NewMeasureMetric(
"ex.com/mymetric",
metric.WithAggregations(metric.SUM, metric.COUNT),
metric.WithLabelKeys(aKey, bKey))
)
```

Here, we have declared a Measure-type metric with recommended SUM and COUNT aggregations (allowing to compute the average) with `aKey` and `bKey` as recommended aggregation dimensions. While the SDK has full control over which aggregations are actually performed, the programmer has specified a good default behavior for the implementation to use.

## Trade-offs and mitigations

This avoids requiring programmers to use the `view` API, which is an SDK API, not a user-facing instrumentation API. Letting the application programmer recommend aggregations directly gives the implementation more information about the raw statistics. Letting programmers declare their intent has few downsides, since there is a well-defined default behavior.

## Prior art and alternatives

Existing systems generaly declare separate Metric types according to the desired aggregation. Raw statistics were invented to overcome this, and the present proposal brings back the ability to specify an Aggregation at the point where a Metric is defined.

## Open questions

There are questions about the value of the MIN and MAX aggregations. While they are simple to compute, they are difficult to use in practice.

There are questions about the interpretation of HISTOGRAM and SUMMARY. The point of Raw statistics was that we shouldn't specify these aggregations because they are expensive and many implementations are possible. This is still true. What is the value in specifying HISTOGRAM as opposed to SUMMARY? How is SUMMARY different from MIN/MAX/COUNT/SUM, does it imply implementation-defined quantiles?