Add spanmetricsprocessor readme, config, sample config, factory and t…

…ests (#1917) Signed-off-by: albertteoh <albert.teoh@logz.io> Adds a new processor for aggregating span data to produce Requests Error and Duration (R.E.D) metrics. This is the first PR with a few to follow; containing the structure of the processor including: - README - Config struct - Factory - Processor skeleton code - Tests Note: as advised by maintainers during the Agent/Collector SIG, a workaround configuration of adding a dummy receiver and pipeline is required due to current limitations of the pipeline design and constraints. **Processor vs Exporter:** This component can be implemented both as a _processor_ or _exporter_. The _processor_ path was taken because: - From the [Opentelemetry Collector Architecture document](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/design.md#processors), "processors" **transform** data and "they can also generate **new data**". Thus, a processor seems to be more fitting for something that aggregates span data into metrics than an exporter. - Processors allow the flexibility of writing to a metrics exporter without the overhead of an additional pipeline, if no processing of metrics is required. An example is exporting span metrics directly to prometheus. - `host:port` configuration is not required as it is encapsulated in the configured exporter to write span metrics to. - If there is a need to process span metrics further, an additional pipeline can be introduced. The advantages with an _exporter_ approach are: - It is more explicit, and does not need dummy receivers. It's very clear what the pipeline is doing just from reading the configuration. - There are discussions on #403 on a proposal to introduce translators, which are essentially both exporters and receivers. If the spanmetricsprocessor adopts this approach, it should be an easier and more natural transition from exporters to translators. **Link to tracking Issue:** #403 **Testing:** - Unit tests. **Documentation:** - README file. - Sample config in `testdata` directory. - Comments in Config struct.
open-telemetry · Jan 8, 2021 · b2df904 · b2df904
1 parent 821bdc5
commit b2df904
Show file tree

Hide file tree

Showing 17 changed files with 2,410 additions and 0 deletions.
diff --git a/processor/spanmetricsprocessor/Makefile b/processor/spanmetricsprocessor/Makefile
@@ -0,0 +1 @@
+include ../../Makefile.Common
diff --git a/processor/spanmetricsprocessor/README.md b/processor/spanmetricsprocessor/README.md
@@ -0,0 +1,106 @@
+# Span Metrics Processor
+
+Aggregates Request, Error and Duration (R.E.D) metrics from span data.
+
+**Request** counts are computed as the number of spans seen per unique set of dimensions, including Errors.
+For example, the following metric shows 142 calls:
+```
+promexample_calls{http_method="GET",http_status_code="200",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 142
+```
+Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on `service_name` and `operation`.
+
+**Error** counts are computed from the Request counts which have an "Error" Status Code metric dimension.
+For example, the following metric indicates 220 errors:
+```
+promexample_calls{http_method="GET",http_status_code="503",operation="/checkout",service_name="frontend",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_ERROR"} 220
+```
+
+**Duration** is computed from the difference between the span start and end times and inserted into the
+relevant latency histogram time bucket for each unique set dimensions.
+For example, the following latency buckets indicate the vast majority of spans (9K) have a 100ms latency:
+```
+promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="2"} 327
+promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="6"} 751
+promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 1195
+promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="100"} 10180
+promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="250"} 10180
+...
+```
+
+Each metric will have _at least_ the following dimensions because they are common across all spans:
+- Service name
+- Operation
+- Span kind
+- Status code
+
+This processor lets traces to continue through the pipeline unmodified.
+
+The following settings are required:
+
+- `metrics_exporter`: the name of the exporter that this processor will write metrics to. This exporter **must** be present in a pipeline.
+
+The following settings can be optionally configured:
+
+- `latency_histogram_buckets`: the list of durations defining the latency histogram buckets.
+  - Default: `[2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]`
+- `dimensions`: the list of dimensions to add together with the default dimensions defined above. Each additional dimension is defined with a `name` which is looked up in the span's collection of attributes. If the `name`d attribute is missing in the span, the optional provided `default` is used. If no `default` is provided, this dimension will be **omitted** from the metric.
+
+Example:
+
+```yaml
+receivers:
+  jaeger:
+    protocols:
+      thrift_http:
+        endpoint: "0.0.0.0:14278"
+
+  # Dummy receiver that's never used, because a pipeline is required to have one.
+  otlp/spanmetrics:
+    protocols:
+      grpc:
+        endpoint: "localhost:12345"
+
+  otlp:
+    protocols:
+      grpc:
+        endpoint: "localhost:55677"
+
+processors:
+  spanmetrics:
+    metrics_exporter: otlp/spanmetrics
+    latency_histogram_buckets: [2ms, 6ms, 10ms, 100ms, 250ms]
+    dimensions:
+      - name: http.method
+        default: GET
+      - name: http.status_code
+
+exporters:
+  jaeger:
+    endpoint: localhost:14250
+
+  otlp/spanmetrics:
+    endpoint: "localhost:55677"
+    insecure: true
+
+  prometheus:
+    endpoint: "0.0.0.0:8889"
+    namespace: promexample
+
+pipelines:
+  traces:
+    receivers: [jaeger]
+    processors: [spanmetrics, batch, queued_retry]
+    exporters: [jaeger]
+
+  # The exporter name must match the metrics_exporter name.
+  # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
+  metrics/spanmetrics:
+    receivers: [otlp/spanmetrics]
+    exporters: [otlp/spanmetrics]
+
+  metrics:
+    receivers: [otlp]
+    exporters: [prometheus]
+```
+
+The full list of settings exposed for this processor are documented [here](./config.go) with detailed sample configuration [here](./testdata).
diff --git a/processor/spanmetricsprocessor/config.go b/processor/spanmetricsprocessor/config.go
@@ -0,0 +1,46 @@
+// Copyright The OpenTelemetry Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//       http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package spanmetricsprocessor
+
+import (
+	"time"
+
+	"go.opentelemetry.io/collector/config/configmodels"
+)
+
+type Dimension struct {
+	Name    string  `mapstructure:"name"`
+	Default *string `mapstructure:"default"`
+}
+
+type Config struct {
+	configmodels.ProcessorSettings `mapstructure:",squash"`
+
+	// MetricsExporter is the name of the metrics exporter to use to ship metrics.
+	MetricsExporter string `mapstructure:"metrics_exporter"`
+
+	// LatencyHistogramBuckets is the list of durations representing latency histogram buckets.
+	// See defaultLatencyHistogramBucketsMs in processor.go for the default value.
+	LatencyHistogramBuckets []time.Duration `mapstructure:"latency_histogram_buckets"`
+
+	// Dimensions defines the list of additional dimensions on top of the provided:
+	// - service.name
+	// - operation
+	// - span.kind
+	// - status.code
+	// The dimensions will be fetched from the span's attributes. Examples of some conventionally used attributes:
+	// https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/conventions/opentelemetry.go.
+	Dimensions []Dimension `mapstructure:"dimensions"`
+}
diff --git a/processor/spanmetricsprocessor/config_test.go b/processor/spanmetricsprocessor/config_test.go
@@ -0,0 +1,99 @@
+// Copyright The OpenTelemetry Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//       http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package spanmetricsprocessor
+
+import (
+	"path"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.opentelemetry.io/collector/component/componenttest"
+	"go.opentelemetry.io/collector/config/configmodels"
+	"go.opentelemetry.io/collector/config/configtest"
+	"go.opentelemetry.io/collector/exporter/jaegerexporter"
+	"go.opentelemetry.io/collector/exporter/otlpexporter"
+	"go.opentelemetry.io/collector/exporter/prometheusexporter"
+	"go.opentelemetry.io/collector/processor/batchprocessor"
+	"go.opentelemetry.io/collector/processor/queuedprocessor"
+	"go.opentelemetry.io/collector/receiver/jaegerreceiver"
+	"go.opentelemetry.io/collector/receiver/otlpreceiver"
+)
+
+func TestLoadConfig(t *testing.T) {
+	defaultMethod := "GET"
+	testcases := []struct {
+		configFile                  string
+		wantMetricsExporter         string
+		wantLatencyHistogramBuckets []time.Duration
+		wantDimensions              []Dimension
+	}{
+		{configFile: "config-2-pipelines.yaml", wantMetricsExporter: "prometheus"},
+		{configFile: "config-3-pipelines.yaml", wantMetricsExporter: "otlp/spanmetrics"},
+		{
+			configFile:          "config-full.yaml",
+			wantMetricsExporter: "otlp/spanmetrics",
+			wantLatencyHistogramBuckets: []time.Duration{
+				2 * time.Millisecond,
+				6 * time.Millisecond,
+				10 * time.Millisecond,
+				100 * time.Millisecond,
+				250 * time.Millisecond,
+			},
+			wantDimensions: []Dimension{
+				{"http.method", &defaultMethod},
+				{"http.status_code", nil},
+			},
+		},
+	}
+	for _, tc := range testcases {
+		t.Run(tc.configFile, func(t *testing.T) {
+			// Prepare
+			factories, err := componenttest.ExampleComponents()
+			require.NoError(t, err)
+
+			factories.Receivers["otlp"] = otlpreceiver.NewFactory()
+			factories.Receivers["jaeger"] = jaegerreceiver.NewFactory()
+
+			factories.Processors[typeStr] = NewFactory()
+			factories.Processors["batch"] = batchprocessor.NewFactory()
+			factories.Processors["queued_retry"] = queuedprocessor.NewFactory()
+
+			factories.Exporters["otlp"] = otlpexporter.NewFactory()
+			factories.Exporters["prometheus"] = prometheusexporter.NewFactory()
+			factories.Exporters["jaeger"] = jaegerexporter.NewFactory()
+
+			// Test
+			cfg, err := configtest.LoadConfigFile(t, path.Join(".", "testdata", tc.configFile), factories)
+
+			// Verify
+			require.NoError(t, err)
+			require.NotNil(t, cfg)
+			assert.Equal(t,
+				&Config{
+					ProcessorSettings: configmodels.ProcessorSettings{
+						NameVal: "spanmetrics",
+						TypeVal: "spanmetrics",
+					},
+					MetricsExporter:         tc.wantMetricsExporter,
+					LatencyHistogramBuckets: tc.wantLatencyHistogramBuckets,
+					Dimensions:              tc.wantDimensions,
+				},
+				cfg.Processors["spanmetrics"],
+			)
+		})
+	}
+}
diff --git a/processor/spanmetricsprocessor/factory.go b/processor/spanmetricsprocessor/factory.go
@@ -0,0 +1,51 @@
+// Copyright The OpenTelemetry Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//       http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package spanmetricsprocessor
+
+import (
+	"context"
+
+	"go.opentelemetry.io/collector/component"
+	"go.opentelemetry.io/collector/config/configmodels"
+	"go.opentelemetry.io/collector/consumer"
+	"go.opentelemetry.io/collector/processor/processorhelper"
+)
+
+const (
+	// The value of "type" key in configuration.
+	typeStr = "spanmetrics"
+)
+
+// NewFactory creates a factory for the spanmetrics processor.
+func NewFactory() component.ProcessorFactory {
+	return processorhelper.NewFactory(
+		typeStr,
+		createDefaultConfig,
+		processorhelper.WithTraces(createTraceProcessor),
+	)
+}
+
+func createDefaultConfig() configmodels.Processor {
+	return &Config{
+		ProcessorSettings: configmodels.ProcessorSettings{
+			TypeVal: typeStr,
+			NameVal: typeStr,
+		},
+	}
+}
+
+func createTraceProcessor(_ context.Context, params component.ProcessorCreateParams, cfg configmodels.Processor, nextConsumer consumer.TracesConsumer) (component.TracesProcessor, error) {
+	return newProcessor(params.Logger, cfg, nextConsumer)
+}
diff --git a/processor/spanmetricsprocessor/factory_test.go b/processor/spanmetricsprocessor/factory_test.go
@@ -0,0 +1,82 @@
+// Copyright The OpenTelemetry Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//       http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package spanmetricsprocessor
+
+import (
+	"context"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"go.opentelemetry.io/collector/component"
+	"go.opentelemetry.io/collector/consumer/consumertest"
+	"go.uber.org/zap"
+)
+
+func TestNewProcessor(t *testing.T) {
+	defaultMethod := "GET"
+	for _, tc := range []struct {
+		name                        string
+		metricsExporter             string
+		latencyHistogramBuckets     []time.Duration
+		dimensions                  []Dimension
+		wantLatencyHistogramBuckets []float64
+		wantDimensions              []Dimension
+	}{
+		{
+			name:                        "simplest config (use defaults)",
+			wantLatencyHistogramBuckets: defaultLatencyHistogramBucketsMs,
+		},
+		{
+			name:                        "latency histogram configured with catch-all bucket to check no additional catch-all bucket inserted",
+			latencyHistogramBuckets:     []time.Duration{2 * time.Millisecond, maxDuration},
+			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
+		},
+		{
+			name:                    "full config with no catch-all bucket and check the catch-all bucket is inserted",
+			latencyHistogramBuckets: []time.Duration{2 * time.Millisecond},
+			dimensions: []Dimension{
+				{"http.method", &defaultMethod},
+				{"http.status_code", nil},
+			},
+			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
+			wantDimensions: []Dimension{
+				{"http.method", &defaultMethod},
+				{"http.status_code", nil},
+			},
+		},
+	} {
+		t.Run(tc.name, func(t *testing.T) {
+			// Prepare
+			factory := NewFactory()
+
+			creationParams := component.ProcessorCreateParams{Logger: zap.NewNop()}
+			cfg := factory.CreateDefaultConfig().(*Config)
+			cfg.LatencyHistogramBuckets = tc.latencyHistogramBuckets
+			cfg.Dimensions = tc.dimensions
+
+			// Test
+			traceProcessor, err := factory.CreateTracesProcessor(context.Background(), creationParams, cfg, consumertest.NewTracesNop())
+			smp := traceProcessor.(*processorImp)
+
+			// Verify
+			assert.Nil(t, err)
+			assert.NotNil(t, smp)
+
+			assert.Equal(t, tc.wantLatencyHistogramBuckets, smp.latencyBounds)
+			assert.Equal(t, tc.wantDimensions, smp.dimensions)
+		})
+	}
+}