Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics stdout export pipeline #265

Merged
merged 86 commits into from
Nov 15, 2019

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Oct 31, 2019

This completes an end-to-end metrics export pipeline.

This introduces new interfaces in the sdk/export/metric package and new package-level documentation in sdk/metric/doc.go.

Two batcher implementations are provided, sdk/metric/batcher/defaultkeys and sdk/metric/batcher/ungrouped, both of which may be stateful (e.g., report sums) or stateless (e.g., report deltas). The defaultkeys batcher groups metrics by their recommended keys. The ungrouped batcher groups metrics at full dimensionality.

Aggregator interfaces are provided, sdk/metric/aggregator to abstract the concrete aggregator types, with interfaces like Sum, LastValue, Distribution, MaxSumCount. Currently we have one implementation for counter and gauge, and three choices for measure-kind metrics. Three standard "simple" selectors are provided in sdk/metric/selector/simple, which implement three policies for measure metrics (maxsumcount, ddsketch, array aggregators).

A new sdk/metric/controller/push package provides the glue to assemble a push-oriented metrics export pipeline. Controllers are responsible for deciding when to initiate collection, calling the exporter, and flushing at exit. The controller implements metric.Provider.

A new stdout exporter, exporter/metric/stdout formats metric updates (either stateful or stateless, depending on the configured batcher).

The example/basic main function has been modified to demonstrate how this all comes together. Note that we are squarely looking at open-telemetry/oteps#45, note that you can't get the global meter provider until it's been set or else you get no-op metric instruments.

@jmacd jmacd added do not merge area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package labels Oct 31, 2019
Copy link
Contributor Author

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exporter/metric/test/test.go Outdated Show resolved Hide resolved
Descriptor() *Descriptor
// Exporter handles presentation of the checkpoint of aggregate
// metrics. This is the final stage of a metrics export pipeline,
// where metric data are formatted for a specific system.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, in general. There are several points where an alternate implementation could swap in. For the example you gave ("last N minutes"), the best solution would be to provide an alternate Aggregator implementation. The Aggregators are called synchronously from the user code, would would allow this kind of customization without needing to replace the Meter or Batcher implementations.

The current set of two Batcher implementations will probably be extended with a more configurable version -- currently we're using fixed policies to decide which kind of aggregator to use, and there are choices that ought to be configurable (especially for measures). The "AggregationSelector" interface is provided to allow a configuration where you can re-use the Batcher but assign different kinds of aggregator. Ultimately, these choices are all inter-related-- certain exporters support certain aggregators and dimensionality by their nature, all of which is controlled by the Batcher but dictated by the Exporter.

Possibly the current Batcher/Exporter model is only useful for relatively simple exporters, like stdout and statsd. For example, I wrote into the SDK specification "Metric exporters that wish to pull metric updates are likely to integrate a controller directly into the exporter itself."

sdk/metric/aggregator/api.go Outdated Show resolved Hide resolved
sdk/metric/aggregator/api.go Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor Author

jmacd commented Nov 14, 2019

@rghetia @lizthegrey @paivagustavo I am still working through feedback FYI. I will continue later today.

Copy link
Contributor Author

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresses feedback from @rghetia and @lizthegrey; I still have a few more things and some feedback from @paivagustavo to go through, but this is getting close.

api/metric/sdkhelpers.go Outdated Show resolved Hide resolved
sdk/metric/aggregator/counter/counter.go Outdated Show resolved Hide resolved
sdk/metric/aggregator/counter/counter.go Outdated Show resolved Hide resolved
sdk/metric/aggregator/counter/counter.go Show resolved Hide resolved
sdk/metric/aggregator/ddsketch/ddsketch.go Outdated Show resolved Hide resolved
sdk/export/metric/metric.go Outdated Show resolved Hide resolved
sdk/metric/batcher/defaultkeys/defaultkeys.go Outdated Show resolved Hide resolved
sdk/metric/batcher/defaultkeys/defaultkeys.go Outdated Show resolved Hide resolved
sdk/metric/batcher/defaultkeys/defaultkeys.go Outdated Show resolved Hide resolved
sdk/metric/batcher/ungrouped/ungrouped.go Outdated Show resolved Hide resolved
@ccaraman
Copy link

If you ignore the test related code and minor fixes related to error propagation through the SDK, this PR isn't very large.

Tests should be reviewed just as critically as any other changes in a PR.

I agree with @bogdandrutu This PR is extremely difficult to review. By creating smaller and modular PR's, this reduces time between reviews, enables more community members to take part in the code review process(working toward reducing the burden on the few approvers that there are at the moment) and allows for scoped changes (easier to roll back and confirm correct implementation).

Copy link
Contributor Author

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresses the remaining feedback by @paivagustavo. Thanks, these were really good suggestions.

sdk/export/metric/metric.go Outdated Show resolved Hide resolved
sdk/export/metric/metric.go Outdated Show resolved Hide resolved
sdk/export/metric/metric.go Outdated Show resolved Hide resolved
exporter/metric/stdout/stdout.go Outdated Show resolved Hide resolved
sdk/metric/aggregator/array/array.go Outdated Show resolved Hide resolved
exporter/metric/stdout/stdout.go Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor Author

jmacd commented Nov 15, 2019

Reviewers, I made one more test-related change. I was unsatisfied to introduce a metric SDK dependency on "github.com/benbjohnson/clock". It's a good library for testing with a mock clock (see sdk/metric/controller/push/push_test.go), but I've created tiny interfaces for Clock and Ticker directly in push.go to avoid importing the clock package. These could be moved into another location if any other code in this repo decides to test with a mock clock.

@jmacd
Copy link
Contributor Author

jmacd commented Nov 15, 2019

@ccaraman I am sympathetic to your and Bogdan's complaint. I did not mean to imply that the tests should not be reviewed -- my impression was that Bogdan (who removed himself as CODEOWNER) was interested in reviewing the design and architecture of this code, not the tests. The reviewers have definitely been poking around in the tests.

I also stand by my earlier statements. I've been working on this metrics projects for months now, and I've consistently had trouble getting reviewers to review my incremental progress. First there were a number of OTEPs, and I had trouble getting timely reviews. The picture wasn't clear, so I charged ahead with a specification. The specification made the big picture a lot clearer, and the specification was accepted. There were still unaccepted OTEPs--and I'm still having trouble getting those reviewed--so I proceeded with an implementation, and the implementation has helped get those OTEPs reviewed, because code can be a lot clearer than text.

I continued to have trouble getting reviews for incremental progress, such as #172, #253, and #282, and my impression was that reviewers were not comfortable reviewing an incomplete SDK. The big picture was not clear. So again, I charged ahead, and this PR was the result. My current understanding of the problem is that metrics export is extremely complicated, the Meter, the Batcher, the Aggregators, and the Exporter are all related to each other, in various ways, depending on how they are going to be used. A Prometheus exporter is wildly different from a statsd exporter, and both are significantly different from what OpenCensus did or what OpenTelemetry wants.

This also feels like a double standard. When we began working on the tracing SDK we merged a bunch of code, without tests, because it helped kickstart development, and we revised and added tests incrementally. The metrics SDK is significantly more complicated than the trace SDK, and a lot changed as I wrote these tests, but I still see this as a prototype. We need to continue working on the specification and gaining experience with more sophisticated Batcher implementations. I believe delaying this PR in order to split it up and review incrementally will delay us until January, and I don't think it's worth the trouble.

Still, I agree that this PR is unfortunately large. Some of the comments by @paivagustavo made me realize that it's too big for me at this point (e.g., pointing out two places where an export.Record could be used to clean up the code).

Thanks to the reviewers @paivagustavo @rghetia @lizthegrey @krnowak @robskillington! I left unresolved conversations where I think a good question has been raised that can't be directly addressed in this PR--they will continue to be good discussion topics.

I think we should merge this and continue reviewing and revising this code (and its tests) as we begin to review the pending Prometheus and Dogstatsd exporter PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants