Metrics stdout export pipeline #265

jmacd · 2019-10-31T15:36:33Z

This completes an end-to-end metrics export pipeline.

This introduces new interfaces in the sdk/export/metric package and new package-level documentation in sdk/metric/doc.go.

Two batcher implementations are provided, sdk/metric/batcher/defaultkeys and sdk/metric/batcher/ungrouped, both of which may be stateful (e.g., report sums) or stateless (e.g., report deltas). The defaultkeys batcher groups metrics by their recommended keys. The ungrouped batcher groups metrics at full dimensionality.

Aggregator interfaces are provided, sdk/metric/aggregator to abstract the concrete aggregator types, with interfaces like Sum, LastValue, Distribution, MaxSumCount. Currently we have one implementation for counter and gauge, and three choices for measure-kind metrics. Three standard "simple" selectors are provided in sdk/metric/selector/simple, which implement three policies for measure metrics (maxsumcount, ddsketch, array aggregators).

A new sdk/metric/controller/push package provides the glue to assemble a push-oriented metrics export pipeline. Controllers are responsible for deciding when to initiate collection, calling the exporter, and flushing at exit. The controller implements metric.Provider.

A new stdout exporter, exporter/metric/stdout formats metric updates (either stateful or stateless, depending on the configured batcher).

The example/basic main function has been modified to demonstrate how this all comes together. Note that we are squarely looking at open-telemetry/oteps#45, note that you can't get the global meter provider until it's been set or else you get no-op metric instruments.

exporter/metric/stdout/stdout.go

jmacd

Thanks @robskillington

exporter/metric/test/test.go

jmacd · 2019-11-14T21:30:31Z

sdk/export/metric/metric.go

-	Descriptor() *Descriptor
+// Exporter handles presentation of the checkpoint of aggregate
+// metrics.  This is the final stage of a metrics export pipeline,
+// where metric data are formatted for a specific system.


I think you're right, in general. There are several points where an alternate implementation could swap in. For the example you gave ("last N minutes"), the best solution would be to provide an alternate Aggregator implementation. The Aggregators are called synchronously from the user code, would would allow this kind of customization without needing to replace the Meter or Batcher implementations.

The current set of two Batcher implementations will probably be extended with a more configurable version -- currently we're using fixed policies to decide which kind of aggregator to use, and there are choices that ought to be configurable (especially for measures). The "AggregationSelector" interface is provided to allow a configuration where you can re-use the Batcher but assign different kinds of aggregator. Ultimately, these choices are all inter-related-- certain exporters support certain aggregators and dimensionality by their nature, all of which is controlled by the Batcher but dictated by the Exporter.

Possibly the current Batcher/Exporter model is only useful for relatively simple exporters, like stdout and statsd. For example, I wrote into the SDK specification "Metric exporters that wish to pull metric updates are likely to integrate a controller directly into the exporter itself."

sdk/metric/aggregator/api.go

jmacd · 2019-11-14T22:38:26Z

@rghetia @lizthegrey @paivagustavo I am still working through feedback FYI. I will continue later today.

jmacd

Addresses feedback from @rghetia and @lizthegrey; I still have a few more things and some feedback from @paivagustavo to go through, but this is getting close.

api/metric/sdkhelpers.go

sdk/metric/aggregator/counter/counter.go

sdk/metric/aggregator/ddsketch/ddsketch.go

sdk/export/metric/metric.go

sdk/metric/batcher/defaultkeys/defaultkeys.go

sdk/metric/batcher/ungrouped/ungrouped.go

ccaraman · 2019-11-15T06:36:23Z

If you ignore the test related code and minor fixes related to error propagation through the SDK, this PR isn't very large.

Tests should be reviewed just as critically as any other changes in a PR.

I agree with @bogdandrutu This PR is extremely difficult to review. By creating smaller and modular PR's, this reduces time between reviews, enables more community members to take part in the code review process(working toward reducing the burden on the few approvers that there are at the moment) and allows for scoped changes (easier to roll back and confirm correct implementation).

jmacd

Addresses the remaining feedback by @paivagustavo. Thanks, these were really good suggestions.

sdk/export/metric/metric.go

exporter/metric/stdout/stdout.go

sdk/metric/aggregator/array/array.go

exporter/metric/stdout/stdout.go

jmacd · 2019-11-15T07:48:20Z

Reviewers, I made one more test-related change. I was unsatisfied to introduce a metric SDK dependency on "github.com/benbjohnson/clock". It's a good library for testing with a mock clock (see sdk/metric/controller/push/push_test.go), but I've created tiny interfaces for Clock and Ticker directly in push.go to avoid importing the clock package. These could be moved into another location if any other code in this repo decides to test with a mock clock.

jmacd · 2019-11-15T08:16:13Z

@ccaraman I am sympathetic to your and Bogdan's complaint. I did not mean to imply that the tests should not be reviewed -- my impression was that Bogdan (who removed himself as CODEOWNER) was interested in reviewing the design and architecture of this code, not the tests. The reviewers have definitely been poking around in the tests.

I also stand by my earlier statements. I've been working on this metrics projects for months now, and I've consistently had trouble getting reviewers to review my incremental progress. First there were a number of OTEPs, and I had trouble getting timely reviews. The picture wasn't clear, so I charged ahead with a specification. The specification made the big picture a lot clearer, and the specification was accepted. There were still unaccepted OTEPs--and I'm still having trouble getting those reviewed--so I proceeded with an implementation, and the implementation has helped get those OTEPs reviewed, because code can be a lot clearer than text.

I continued to have trouble getting reviews for incremental progress, such as #172, #253, and #282, and my impression was that reviewers were not comfortable reviewing an incomplete SDK. The big picture was not clear. So again, I charged ahead, and this PR was the result. My current understanding of the problem is that metrics export is extremely complicated, the Meter, the Batcher, the Aggregators, and the Exporter are all related to each other, in various ways, depending on how they are going to be used. A Prometheus exporter is wildly different from a statsd exporter, and both are significantly different from what OpenCensus did or what OpenTelemetry wants.

This also feels like a double standard. When we began working on the tracing SDK we merged a bunch of code, without tests, because it helped kickstart development, and we revised and added tests incrementally. The metrics SDK is significantly more complicated than the trace SDK, and a lot changed as I wrote these tests, but I still see this as a prototype. We need to continue working on the specification and gaining experience with more sophisticated Batcher implementations. I believe delaying this PR in order to split it up and review incrementally will delay us until January, and I don't think it's worth the trouble.

Still, I agree that this PR is unfortunately large. Some of the comments by @paivagustavo made me realize that it's too big for me at this point (e.g., pointing out two places where an export.Record could be used to clean up the code).

Thanks to the reviewers @paivagustavo @rghetia @lizthegrey @krnowak @robskillington! I left unresolved conversations where I think a good question has been raised that can't be directly addressed in this PR--they will continue to be good discussion topics.

I think we should merge this and continue reviewing and revising this code (and its tests) as we begin to review the pending Prometheus and Dogstatsd exporter PRs.

jmacd added 19 commits October 30, 2019 05:07

Add MetricAggregator.Merge() implementations

19346a4

Upstream

cb9557a

Update from feedback

03f7854

Type

09d88a1

Upstream

865b7ab

Upstream

5719abe

Ckpt

c6ca4fd

Ckpt

acc2450

Upstream

5bebed5

Add push controller

5b23af4

Ckpt

80e0df8

Upstream

f533814

Add aggregator interfaces, stdout encoder

3423242

Modify basic main.go

a788631

Main is working

fbc50a1

Batch stdout output

49aa969

Sum udpate

c2c73be

Rename stdout

5ea9128

Add stateless/stateful Batcher options

f0d986c

jmacd requested review from bogdandrutu, pjanotti, rghetia, sjkaris and tedsuo as code owners October 31, 2019 15:36

jmacd added do not merge area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package labels Oct 31, 2019

Undo a for-loop in the example, remove a done TODO

63d79a8

jmacd mentioned this pull request Oct 31, 2019

Metric standard SDK behavior specification open-telemetry/opentelemetry-specification#259

Closed

lizthegrey reviewed Nov 1, 2019

View reviewed changes

exporter/metric/stdout/stdout.go Show resolved Hide resolved

jmacd commented Nov 14, 2019

View reviewed changes

Handle ErrNoLastValue case in stdout exporter

efb75ed

jmacd added 7 commits November 14, 2019 16:50

Move aggregator API into sdk/export/metric/aggregator

1153503

Update all aggregator exported-method comments

623a63a

Document the aggregator APIs

e298c94

More aggregator comments

d75bc6e

Add multiple updates to the ungrouped test

723d084

Fixes for feedback from Gustavo and Liz

8b5a4d6

Producer->CheckpointSet; add FinishedCollection

13e0580

jmacd commented Nov 15, 2019

View reviewed changes

jmacd added 3 commits November 14, 2019 22:17

Process takes an export.Record

b75059e

ReadCheckpoint->CheckpointSet

d03709a

EncodeLabels->Encode

93fe58f

Format a better inconsistent type error; add more aggregator API tests

782cabf

jmacd commented Nov 15, 2019

View reviewed changes

jmacd added 3 commits November 14, 2019 23:21

More RangeTest test coverage

eedaaab

Make benbjohnson/clock a test-only dependency

f67b47c

Merge

2e7007d

Handle ErrNoLastValue in stress_test

046db4a

Upstream

a294720

lizthegrey approved these changes Nov 15, 2019

View reviewed changes

rghetia approved these changes Nov 15, 2019

View reviewed changes

rghetia merged commit 9878f3b into open-telemetry:master Nov 15, 2019

jmacd deleted the jmacd/stdout_metrics branch November 15, 2019 21:04

lzchen mentioned this pull request Dec 19, 2019

Metrics export pipeline + metrics stdout exporter open-telemetry/opentelemetry-python#341

Merged

mayurkale22 mentioned this pull request Jan 28, 2020

feat: Metrics SDK - aggregator, batcher, controller open-telemetry/opentelemetry-js#738

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metrics stdout export pipeline #265

Metrics stdout export pipeline #265

jmacd commented Oct 31, 2019 •

edited

Loading

jmacd left a comment

jmacd Nov 14, 2019

jmacd commented Nov 14, 2019

jmacd left a comment

ccaraman commented Nov 15, 2019

jmacd left a comment

jmacd commented Nov 15, 2019

jmacd commented Nov 15, 2019

Metrics stdout export pipeline #265

Metrics stdout export pipeline #265

Conversation

jmacd commented Oct 31, 2019 • edited Loading

jmacd left a comment

Choose a reason for hiding this comment

jmacd Nov 14, 2019

Choose a reason for hiding this comment

jmacd commented Nov 14, 2019

jmacd left a comment

Choose a reason for hiding this comment

ccaraman commented Nov 15, 2019

jmacd left a comment

Choose a reason for hiding this comment

jmacd commented Nov 15, 2019

jmacd commented Nov 15, 2019

jmacd commented Oct 31, 2019 •

edited

Loading