Add spanmetricsprocessor readme, config, factory, tests #1917
Merged
@@ -0,0 +1 @@
include ../../Makefile.Common
@@ -0,0 +1,106 @@
# Span Metrics Processor

Aggregates Request, Error and Duration (R.E.D) metrics from span data.

**Request** counts are computed as the number of spans seen per unique set of dimensions, including error spans.
For example, the following metric shows 142 calls:
```
promexample_calls{http_method="GET",http_status_code="200",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 142
```
Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on `service_name` and `operation`.
**Error** counts are computed from the Request counts that carry an error status, i.e. a `status_code` dimension of `STATUS_CODE_ERROR`.
For example, the following metric indicates 220 errors:
```
promexample_calls{http_method="GET",http_status_code="503",operation="/checkout",service_name="frontend",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_ERROR"} 220
```

**Duration** is computed from the difference between the span start and end times and inserted into the
relevant latency histogram time bucket for each unique set of dimensions.
For example, the following latency buckets indicate that the vast majority of spans (roughly 9,000) completed in between 10ms and 100ms:
```
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="2"} 327
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="6"} 751
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 1195
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="100"} 10180
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="250"} 10180
...
```
Each metric will have _at least_ the following dimensions, since they are common across all spans:
- Service name
- Operation
- Span kind
- Status code

This processor lets traces continue through the pipeline unmodified.

The following settings are required:

- `metrics_exporter`: the name of the exporter that this processor will write metrics to. This exporter **must** be present in a pipeline.

The following settings can be optionally configured:

- `latency_histogram_buckets`: the list of durations defining the latency histogram buckets.
  - Default: `[2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]`
- `dimensions`: the list of dimensions to add on top of the default dimensions defined above. Each additional dimension is defined by a `name`, which is looked up in the span's collection of attributes. If the named attribute is missing from the span, the optional `default` value is used instead. If no `default` is provided, the dimension is **omitted** from the metric.
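The lookup-with-default behaviour can be sketched as follows (a hypothetical helper using a plain string map for span attributes; the real processor reads the collector's attribute types):

```go
package main

import "fmt"

// Dimension mirrors a config entry: an attribute name plus an optional default.
type Dimension struct {
	Name    string
	Default *string
}

// dimensionValue resolves one dimension against a span's attributes:
// the attribute value if present, else the configured default, else
// ok=false, meaning the dimension is omitted from the metric.
func dimensionValue(d Dimension, attrs map[string]string) (value string, ok bool) {
	if v, found := attrs[d.Name]; found {
		return v, true
	}
	if d.Default != nil {
		return *d.Default, true
	}
	return "", false
}

func main() {
	def := "GET"
	attrs := map[string]string{"http.status_code": "200"}

	v, ok := dimensionValue(Dimension{"http.method", &def}, attrs)
	fmt.Println(v, ok) // attribute missing: falls back to the default

	_, ok = dimensionValue(Dimension{"peer.service", nil}, attrs)
	fmt.Println(ok) // missing and no default: omitted
}
```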
Example:

```yaml
receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14278"

  # Dummy receiver that's never used, because a pipeline is required to have one.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"

  otlp:
    protocols:
      grpc:
        endpoint: "localhost:55677"

processors:
  spanmetrics:
    metrics_exporter: otlp/spanmetrics
    latency_histogram_buckets: [2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code

exporters:
  jaeger:
    endpoint: localhost:14250

  otlp/spanmetrics:
    endpoint: "localhost:55677"
    insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: promexample

pipelines:
  traces:
    receivers: [jaeger]
    processors: [spanmetrics, batch, queued_retry]
    exporters: [jaeger]

  # The exporter name must match the metrics_exporter name.
  # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
  metrics/spanmetrics:
    receivers: [otlp/spanmetrics]
    exporters: [otlp/spanmetrics]

  metrics:
    receivers: [otlp]
    exporters: [prometheus]
```

The full list of settings exposed for this processor is documented [here](./config.go), with detailed sample configuration [here](./testdata).
@@ -0,0 +1,46 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"time"

	"go.opentelemetry.io/collector/config/configmodels"
)

// Dimension defines an additional metric dimension: an attribute name to
// look up in the span, plus an optional default used when it is missing.
type Dimension struct {
	Name    string  `mapstructure:"name"`
	Default *string `mapstructure:"default"`
}

// Config defines the configuration options for the spanmetrics processor.
type Config struct {
	configmodels.ProcessorSettings `mapstructure:",squash"`

	// MetricsExporter is the name of the metrics exporter to use to ship metrics.
	MetricsExporter string `mapstructure:"metrics_exporter"`

	// LatencyHistogramBuckets is the list of durations representing latency histogram buckets.
	// See defaultLatencyHistogramBucketsMs in processor.go for the default value.
	LatencyHistogramBuckets []time.Duration `mapstructure:"latency_histogram_buckets"`

	// Dimensions defines the list of additional dimensions on top of the provided:
	// - service.name
	// - operation
	// - span.kind
	// - status.code
	// The dimensions will be fetched from the span's attributes. Examples of some conventionally used attributes:
	// https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/conventions/opentelemetry.go.
	Dimensions []Dimension `mapstructure:"dimensions"`
}
@@ -0,0 +1,99 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"path"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/component/componenttest"
	"go.opentelemetry.io/collector/config/configmodels"
	"go.opentelemetry.io/collector/config/configtest"
	"go.opentelemetry.io/collector/exporter/jaegerexporter"
	"go.opentelemetry.io/collector/exporter/otlpexporter"
	"go.opentelemetry.io/collector/exporter/prometheusexporter"
	"go.opentelemetry.io/collector/processor/batchprocessor"
	"go.opentelemetry.io/collector/processor/queuedprocessor"
	"go.opentelemetry.io/collector/receiver/jaegerreceiver"
	"go.opentelemetry.io/collector/receiver/otlpreceiver"
)

func TestLoadConfig(t *testing.T) {
	defaultMethod := "GET"
	testcases := []struct {
		configFile                  string
		wantMetricsExporter         string
		wantLatencyHistogramBuckets []time.Duration
		wantDimensions              []Dimension
	}{
		{configFile: "config-2-pipelines.yaml", wantMetricsExporter: "prometheus"},
		{configFile: "config-3-pipelines.yaml", wantMetricsExporter: "otlp/spanmetrics"},
		{
			configFile:          "config-full.yaml",
			wantMetricsExporter: "otlp/spanmetrics",
			wantLatencyHistogramBuckets: []time.Duration{
				2 * time.Millisecond,
				6 * time.Millisecond,
				10 * time.Millisecond,
				100 * time.Millisecond,
				250 * time.Millisecond,
			},
			wantDimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
		},
	}
	for _, tc := range testcases {
		t.Run(tc.configFile, func(t *testing.T) {
			// Prepare
			factories, err := componenttest.ExampleComponents()
			require.NoError(t, err)

			factories.Receivers["otlp"] = otlpreceiver.NewFactory()
			factories.Receivers["jaeger"] = jaegerreceiver.NewFactory()

			factories.Processors[typeStr] = NewFactory()
			factories.Processors["batch"] = batchprocessor.NewFactory()
			factories.Processors["queued_retry"] = queuedprocessor.NewFactory()

			factories.Exporters["otlp"] = otlpexporter.NewFactory()
			factories.Exporters["prometheus"] = prometheusexporter.NewFactory()
			factories.Exporters["jaeger"] = jaegerexporter.NewFactory()

			// Test
			cfg, err := configtest.LoadConfigFile(t, path.Join(".", "testdata", tc.configFile), factories)

			// Verify
			require.NoError(t, err)
			require.NotNil(t, cfg)
			assert.Equal(t,
				&Config{
					ProcessorSettings: configmodels.ProcessorSettings{
						NameVal: "spanmetrics",
						TypeVal: "spanmetrics",
					},
					MetricsExporter:         tc.wantMetricsExporter,
					LatencyHistogramBuckets: tc.wantLatencyHistogramBuckets,
					Dimensions:              tc.wantDimensions,
				},
				cfg.Processors["spanmetrics"],
			)
		})
	}
}
@@ -0,0 +1,51 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/config/configmodels"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/processor/processorhelper"
)

const (
	// The value of "type" key in configuration.
	typeStr = "spanmetrics"
)

// NewFactory creates a factory for the spanmetrics processor.
func NewFactory() component.ProcessorFactory {
	return processorhelper.NewFactory(
		typeStr,
		createDefaultConfig,
		processorhelper.WithTraces(createTraceProcessor),
	)
}

func createDefaultConfig() configmodels.Processor {
	return &Config{
		ProcessorSettings: configmodels.ProcessorSettings{
			TypeVal: typeStr,
			NameVal: typeStr,
		},
	}
}

func createTraceProcessor(_ context.Context, params component.ProcessorCreateParams, cfg configmodels.Processor, nextConsumer consumer.TracesConsumer) (component.TracesProcessor, error) {
	return newProcessor(params.Logger, cfg, nextConsumer)
}
@@ -0,0 +1,82 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer/consumertest"
	"go.uber.org/zap"
)

func TestNewProcessor(t *testing.T) {
	defaultMethod := "GET"
	for _, tc := range []struct {
		name                        string
		metricsExporter             string
		latencyHistogramBuckets     []time.Duration
		dimensions                  []Dimension
		wantLatencyHistogramBuckets []float64
		wantDimensions              []Dimension
	}{
		{
			name:                        "simplest config (use defaults)",
			wantLatencyHistogramBuckets: defaultLatencyHistogramBucketsMs,
		},
		{
			name:                        "latency histogram configured with catch-all bucket to check no additional catch-all bucket inserted",
			latencyHistogramBuckets:     []time.Duration{2 * time.Millisecond, maxDuration},
			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
		},
		{
			name:                    "full config with no catch-all bucket and check the catch-all bucket is inserted",
			latencyHistogramBuckets: []time.Duration{2 * time.Millisecond},
			dimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
			wantDimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
		},
	} {
		t.Run(tc.name, func(t *testing.T) {
			// Prepare
			factory := NewFactory()

			creationParams := component.ProcessorCreateParams{Logger: zap.NewNop()}
			cfg := factory.CreateDefaultConfig().(*Config)
			cfg.LatencyHistogramBuckets = tc.latencyHistogramBuckets
			cfg.Dimensions = tc.dimensions

			// Test
			traceProcessor, err := factory.CreateTracesProcessor(context.Background(), creationParams, cfg, consumertest.NewTracesNop())
			smp := traceProcessor.(*processorImp)

			// Verify
			assert.Nil(t, err)
			assert.NotNil(t, smp)

			assert.Equal(t, tc.wantLatencyHistogramBuckets, smp.latencyBounds)
			assert.Equal(t, tc.wantDimensions, smp.dimensions)
		})
	}
}
Just wondering, how much overhead are we expecting for another pipeline? Isn't it just running `consumeTraces` twice instead of once? Or, I've never tried, but is it not allowed to have the same receiver in multiple pipelines? That seems like it would make the config really understandable.

This is similar to the translator approach, but seems more intuitive: just let receivers take part in multiple pipelines without any other type of component. The collector should be able to manage wiring a receiver to multiple sinks without a big problem, and it seems like something that should be supported anyway, since users will probably try this sort of config (I haven't tried it yet, does it really not work?).

I think this is another argument to start with this current implementation though :) But just wondering if this idea has come up.
Thanks for your thoughts, @anuraaga.
Yup :)
There's some potential performance overhead if an earlier pipeline's first processor is blocked or slow. From the OTEL Design doc:
I like your proposed config, it does look very intuitive.
However, pipelines don't support mixed telemetry types right now; if that were possible, I think your suggestion would make the most sense. Mixed telemetry type pipelines are something I'd like to tackle once the `spanmetricsprocessor` is done, because it sounds like a fairly significant change.

This is the error message I get if I use a `jaeger` receiver with a `prometheus` exporter: