
Add spanmetricsprocessor readme, config, factory, tests #1917

Merged 1 commit on Jan 8, 2021

Conversation

@albertteoh (Contributor) commented Dec 28, 2020

Signed-off-by: albertteoh <albert.teoh@logz.io>

Description:

Adds a new processor that aggregates span data to produce Request, Error and Duration (R.E.D) metrics.

This is the first PR, with a few more to follow; it contains the structure of the processor, including:

  • README
  • Config struct
  • Factory
  • Processor skeleton code
  • Tests

Note: as advised by maintainers during the Agent/Collector SIG, a workaround configuration that adds a dummy receiver and pipeline is required due to current limitations and constraints of the pipeline design; a sketch of this workaround is shown below.
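
For illustration only, here is a rough sketch of what that workaround configuration could look like. The metrics_exporter option and the metrics/spanmetrics pipeline name are taken from the sample config quoted later in this thread; the receiver/exporter choices, the dummy receiver name otlp/spanmetrics, and the endpoints are placeholder assumptions.

receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14268"   # placeholder endpoint

  # Dummy receiver (assumed name). It is never used; it exists only because
  # pipeline validation requires at least one receiver per pipeline.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"

exporters:
  jaeger:
    endpoint: "localhost:14250"     # placeholder endpoint

  prometheus:
    endpoint: "0.0.0.0:8889"

processors:
  spanmetrics:
    # Must match the name of the exporter declared in the metrics/spanmetrics pipeline below.
    metrics_exporter: prometheus

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [spanmetrics]
      exporters: [jaeger]

    # Dummy pipeline; its only purpose is to have the collector instantiate the
    # prometheus exporter that the spanmetrics processor writes its metrics to.
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]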

Processor vs Exporter:
This component can be implemented both as a processor or exporter. The processor path was taken because:

  • From the OpenTelemetry Collector Architecture document, "processors" transform data and "they can also generate new data". A processor therefore seems a better fit than an exporter for something that aggregates span data into metrics.
  • Processors allow the flexibility of writing to a metrics exporter without the overhead of an additional pipeline when no further processing of the metrics is required; an example is exporting span metrics directly to Prometheus.
  • No host:port configuration is required, since that is encapsulated in the exporter configured to receive the span metrics.
  • If there is a need to process span metrics further, an additional pipeline can be introduced.

The advantages of an exporter approach are:

  • It is more explicit, and does not need dummy receivers. It's very clear what the pipeline is doing just from reading the configuration.
  • There is a discussion on Proposal: Span Stats processor #403 about introducing translators, which are essentially both exporters and receivers. If the spanmetricsprocessor were implemented as an exporter, the transition to a translator should be easier and more natural.

Link to tracking Issue: #403

Testing:

  • Unit tests.

Documentation:

  • README file.
  • Sample config in testdata directory.
  • Comments in Config struct.

@albertteoh requested a review from a team December 28, 2020 21:46
codecov bot commented Dec 28, 2020

Codecov Report

Merging #1917 (1a8a2c6) into master (c7d7acd) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1917      +/-   ##
==========================================
+ Coverage   89.95%   89.97%   +0.02%     
==========================================
  Files         381      383       +2     
  Lines       18519    18570      +51     
==========================================
+ Hits        16659    16709      +50     
- Misses       1390     1391       +1     
  Partials      470      470              
Flag Coverage Δ
integration 69.82% <ø> (ø)
unit 88.70% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
processor/spanmetricsprocessor/factory.go 100.00% <100.00%> (ø)
processor/spanmetricsprocessor/processor.go 100.00% <100.00%> (ø)
processor/groupbytraceprocessor/event.go 95.96% <0.00%> (-0.81%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Commit: Add spanmetricsprocessor readme, config, factory, tests

Signed-off-by: albertteoh <albert.teoh@logz.io>

@anuraaga (Contributor) commented on the following config lines:

# The exporter name must match the metrics_exporter name.
# The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
metrics/spanmetrics:

> Processors allow the flexibility of writing to a metrics exporter without the overhead of an additional pipeline, if no processing of metrics is required. An example is exporting span metrics directly to prometheus.

Just wondering, how much overhead are we expecting for another pipeline? Isn't it just running consumeTraces twice instead of once? Or, I've never tried, but is it not allowed to have the same receiver in multiple pipelines? That seems like it would make the config really understandable.

pipelines:
  traces:
    receivers: [jaeger]
    processors: [batch, queued_retry]
    exporters: [jaeger]

  tracemetrics:
    receivers: [jaeger]
    processors: [spanmetrics]
    exporters: [prometheus]

This is similar to the translator approach, but seems more intuitive: just let receivers take part in multiple pipelines without any other type of component. The collector should be able to manage wiring a receiver to multiple sinks without a big problem, and this seems like something that should be supported anyway, since users will probably try this sort of config (I haven't tried it yet, does it really not work?).

I think this is another argument to start with this current implementation though :) But just wondering if this idea has come up.

@albertteoh (Contributor Author) replied:

Thanks for your thoughts, @anuraaga.

> Or, I've never tried, but is it not allowed to have the same receiver in multiple pipelines?

Yup :)

> how much overhead are we expecting for another pipeline? Isn't it just running consumeTraces twice instead of once?

There's some potential performance overhead if an earlier pipeline's first processor is blocked or slow. From the OTEL Design doc:

> The data propagation from receiver to FanOutConnector and then to processors is via synchronous function call. This means that if one processor blocks the call the other pipelines that are attached to this receiver will be blocked from receiving the same data and the receiver itself will stop processing and forwarding newly received data.

I like your proposed config, it does look very intuitive.

However, pipelines don't support mixed telemetry types right now; if that were possible, I think your suggestion would make the most sense. Mixed telemetry type pipelines are something I'd like to tackle once the spanmetricsprocessor is done, because it sounds like a fairly significant change.

This is the error message I get if I use a jaeger receiver with prometheus exporter:

application run finished with error: cannot setup pipelines: cannot build builtExporters: pipeline "traces" of data type "traces" has an exporter "prometheus", which does not support that data type
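
For reference, the kind of configuration that triggers this error looks roughly like the following (receiver and processor names are placeholders); the prometheus exporter only accepts metrics, so it cannot be attached to a traces pipeline:

service:
  pipelines:
    # Invalid: the prometheus exporter is metrics-only, so placing it in a
    # traces pipeline fails validation with the error quoted above.
    traces:
      receivers: [jaeger]
      processors: [spanmetrics]
      exporters: [prometheus]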

@tigrannajaryan (Member) left a comment:

LGTM. I assume the actual implementation comes next.
Note: if the new proposal that allows pipelines of different types to be connected is accepted, this processor may need to be reworked (or become an exporter).
Please mark this processor as experimental in the README and clearly specify that it may be changed in an incompatible way in the future.

@tigrannajaryan merged commit b2df904 into open-telemetry:master Jan 8, 2021
@albertteoh deleted the atm-part-0 branch January 9, 2021 00:15
@albertteoh (Contributor Author) commented:

Thank you for the review @anuraaga & @tigrannajaryan 🙏

> this processor may need to be reworked (or become an exporter).

Yes, understood; I'm definitely onboard with making this as user-friendly as possible.

> Please mark this processor as experimental in the README and clearly specify that it may be changed in an incompatible way in the future.

Okay, I'll do that.

tigrannajaryan pushed a commit that referenced this pull request Jan 28, 2021
**Description:** Adds spanmetricsprocessor logic as a follow-up to #1917.

**Link to tracking Issue:** #403

**Testing:**
- Unit and integration tested locally (by adding to `cmd/otelcontribcol/components.go`).
- Performance tested (added metrics key cache to improve performance) and added benchmark function.

**Documentation:** Add relevant documentation to exported structs and functions.
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
- Fix links
- Minor cleanup and consistency changes
pmatyjasek-sumo referenced this pull request in pmatyjasek-sumo/opentelemetry-collector-contrib Apr 28, 2021