Add spanmetricsprocessor readme, config, sample config, factory and tests (#1917)

Signed-off-by: albertteoh <albert.teoh@logz.io>

Adds a new processor for aggregating span data to produce Request, Error and Duration (R.E.D) metrics.

This is the first PR, with a few to follow. It contains the structure of the processor, including:
- README
- Config struct
- Factory
- Processor skeleton code
- Tests

Note: as advised by maintainers during the Agent/Collector SIG, a workaround configuration that adds a dummy receiver and pipeline is required due to current limitations and constraints of the pipeline design.

**Processor vs Exporter:**
This component can be implemented either as a _processor_ or as an _exporter_. The _processor_ path was taken because:
- From the [OpenTelemetry Collector Architecture document](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/design.md#processors), "processors" **transform** data and "they can also generate **new data**". Thus, a processor seems to be more fitting for something that aggregates span data into metrics than an exporter.
- Processors allow the flexibility of writing to a metrics exporter without the overhead of an additional pipeline, if no processing of metrics is required. An example is exporting span metrics directly to Prometheus.
- `host:port` configuration is not required as it is encapsulated in the configured exporter to write span metrics to.
- If there is a need to process span metrics further, an additional pipeline can be introduced.

The advantages of an _exporter_ approach are:
- It is more explicit, and does not need dummy receivers. It's very clear what the pipeline is doing just from reading the configuration.
- There are discussions in #403 about a proposal to introduce translators, which are essentially both exporters and receivers. If the spanmetricsprocessor adopts this approach, it should be an easier and more natural transition from exporters to translators.

**Link to tracking Issue:** #403

**Testing:** 
- Unit tests.

**Documentation:**
- README file.
- Sample config in `testdata` directory.
- Comments in Config struct.
albertteoh authored Jan 8, 2021
1 parent 821bdc5 commit b2df904
Showing 17 changed files with 2,410 additions and 0 deletions.
1 change: 1 addition & 0 deletions processor/spanmetricsprocessor/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
106 changes: 106 additions & 0 deletions processor/spanmetricsprocessor/README.md
@@ -0,0 +1,106 @@
# Span Metrics Processor

Aggregates Request, Error and Duration (R.E.D) metrics from span data.

**Request** counts are computed as the number of spans seen per unique set of dimensions, including Errors.
For example, the following metric shows 142 calls:
```
promexample_calls{http_method="GET",http_status_code="200",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 142
```
Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on `service_name` and `operation`.

**Error** counts are computed from the Request counts which have an "Error" Status Code metric dimension.
For example, the following metric indicates 220 errors:
```
promexample_calls{http_method="GET",http_status_code="503",operation="/checkout",service_name="frontend",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_ERROR"} 220
```
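
The counting described above can be sketched in Go as follows. This is a minimal illustration, not the processor's actual code: the `dimensions` type and the `countCalls` and `countErrors` helpers are hypothetical names, and only the four default dimensions are modelled.

```go
package main

import "fmt"

// dimensions is a hypothetical, simplified stand-in for the set of
// dimensions attached to one span (default dimensions only).
type dimensions struct {
	serviceName string
	operation   string
	spanKind    string
	statusCode  string
}

// countCalls returns the number of spans seen per unique set of dimensions.
// A struct of strings is comparable, so it can be used directly as a map key.
func countCalls(spans []dimensions) map[dimensions]int {
	calls := make(map[dimensions]int)
	for _, s := range spans {
		calls[s]++
	}
	return calls
}

// countErrors sums the call counts whose status code dimension is an error.
func countErrors(calls map[dimensions]int) int {
	total := 0
	for dims, n := range calls {
		if dims.statusCode == "STATUS_CODE_ERROR" {
			total += n
		}
	}
	return total
}

func main() {
	spans := []dimensions{
		{"frontend", "/checkout", "SPAN_KIND_CLIENT", "STATUS_CODE_ERROR"},
		{"frontend", "/checkout", "SPAN_KIND_CLIENT", "STATUS_CODE_ERROR"},
		{"shippingservice", "/Address", "SPAN_KIND_SERVER", "STATUS_CODE_UNSET"},
	}
	calls := countCalls(spans)
	fmt.Println("unique dimension sets:", len(calls))
	fmt.Println("errors:", countErrors(calls))
}
```

Error counts need no separate bookkeeping in this sketch: they are the subset of call counts that carry the error status code dimension.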

**Duration** is computed from the difference between the span start and end times and inserted into the
relevant latency histogram time bucket for each unique set of dimensions.
For example, the following latency buckets indicate that the vast majority of spans (about 9K of 10,180) have a latency between 10ms and 100ms:
```
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="2"} 327
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="6"} 751
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 1195
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="100"} 10180
promexample_latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="250"} 10180
...
```

Each metric will have _at least_ the following dimensions because they are common across all spans:
- Service name
- Operation
- Span kind
- Status code

This processor lets traces continue through the pipeline unmodified.

The following settings are required:

- `metrics_exporter`: the name of the exporter that this processor will write metrics to. This exporter **must** be present in a pipeline.

The following settings can be optionally configured:

- `latency_histogram_buckets`: the list of durations defining the latency histogram buckets.
  - Default: `[2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]`
- `dimensions`: the list of dimensions to add together with the default dimensions defined above. Each additional dimension is defined with a `name` which is looked up in the span's collection of attributes. If the `name`d attribute is missing in the span, the optional provided `default` is used. If no `default` is provided, this dimension will be **omitted** from the metric.
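
The lookup rule for additional dimensions can be sketched in Go as below. The `Dimension` struct mirrors the one in `config.go`; `resolveDimensions` is a hypothetical helper, and representing span attributes as a `map[string]string` is a simplifying assumption made for this sketch.

```go
package main

import "fmt"

// Dimension mirrors the struct in config.go: a name to look up in the
// span's attributes and an optional default value.
type Dimension struct {
	Name    string
	Default *string
}

// resolveDimensions applies the rule described above: use the span attribute
// when present, otherwise the default when one is configured, otherwise omit
// the dimension entirely.
func resolveDimensions(attrs map[string]string, dims []Dimension) map[string]string {
	out := make(map[string]string)
	for _, d := range dims {
		if v, ok := attrs[d.Name]; ok {
			out[d.Name] = v
			continue
		}
		if d.Default != nil {
			out[d.Name] = *d.Default
		}
		// No attribute and no default: the dimension is left out.
	}
	return out
}

func main() {
	defaultMethod := "GET"
	dims := []Dimension{
		{Name: "http.method", Default: &defaultMethod},
		{Name: "http.status_code"},
	}

	// A span with neither attribute: only http.method (defaulted) is kept.
	fmt.Println(resolveDimensions(map[string]string{}, dims))

	// A span with both attributes: the span's own values win.
	fmt.Println(resolveDimensions(map[string]string{
		"http.method":      "POST",
		"http.status_code": "503",
	}, dims))
}
```

A pointer is used for `Default` so that "no default configured" (`nil`) can be told apart from an empty-string default.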

Example:

```yaml
receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14278"

  # Dummy receiver that's never used, because a pipeline is required to have one.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"

  otlp:
    protocols:
      grpc:
        endpoint: "localhost:55677"

processors:
  spanmetrics:
    metrics_exporter: otlp/spanmetrics
    latency_histogram_buckets: [2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code

exporters:
  jaeger:
    endpoint: localhost:14250

  otlp/spanmetrics:
    endpoint: "localhost:55677"
    insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: promexample

pipelines:
  traces:
    receivers: [jaeger]
    processors: [spanmetrics, batch, queued_retry]
    exporters: [jaeger]

  # The exporter name must match the metrics_exporter name.
  # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
  metrics/spanmetrics:
    receivers: [otlp/spanmetrics]
    exporters: [otlp/spanmetrics]

  metrics:
    receivers: [otlp]
    exporters: [prometheus]
```
The full list of settings exposed for this processor is documented [here](./config.go) with detailed sample configuration [here](./testdata).
46 changes: 46 additions & 0 deletions processor/spanmetricsprocessor/config.go
@@ -0,0 +1,46 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"time"

	"go.opentelemetry.io/collector/config/configmodels"
)

type Dimension struct {
	Name    string  `mapstructure:"name"`
	Default *string `mapstructure:"default"`
}

type Config struct {
	configmodels.ProcessorSettings `mapstructure:",squash"`

	// MetricsExporter is the name of the metrics exporter to use to ship metrics.
	MetricsExporter string `mapstructure:"metrics_exporter"`

	// LatencyHistogramBuckets is the list of durations representing latency histogram buckets.
	// See defaultLatencyHistogramBucketsMs in processor.go for the default value.
	LatencyHistogramBuckets []time.Duration `mapstructure:"latency_histogram_buckets"`

	// Dimensions defines the list of additional dimensions on top of the provided:
	// - service.name
	// - operation
	// - span.kind
	// - status.code
	// The dimensions will be fetched from the span's attributes. Examples of some conventionally used attributes:
	// https://github.com/open-telemetry/opentelemetry-collector/blob/master/translator/conventions/opentelemetry.go.
	Dimensions []Dimension `mapstructure:"dimensions"`
}
99 changes: 99 additions & 0 deletions processor/spanmetricsprocessor/config_test.go
@@ -0,0 +1,99 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"path"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/component/componenttest"
	"go.opentelemetry.io/collector/config/configmodels"
	"go.opentelemetry.io/collector/config/configtest"
	"go.opentelemetry.io/collector/exporter/jaegerexporter"
	"go.opentelemetry.io/collector/exporter/otlpexporter"
	"go.opentelemetry.io/collector/exporter/prometheusexporter"
	"go.opentelemetry.io/collector/processor/batchprocessor"
	"go.opentelemetry.io/collector/processor/queuedprocessor"
	"go.opentelemetry.io/collector/receiver/jaegerreceiver"
	"go.opentelemetry.io/collector/receiver/otlpreceiver"
)

func TestLoadConfig(t *testing.T) {
	defaultMethod := "GET"
	testcases := []struct {
		configFile                  string
		wantMetricsExporter         string
		wantLatencyHistogramBuckets []time.Duration
		wantDimensions              []Dimension
	}{
		{configFile: "config-2-pipelines.yaml", wantMetricsExporter: "prometheus"},
		{configFile: "config-3-pipelines.yaml", wantMetricsExporter: "otlp/spanmetrics"},
		{
			configFile:          "config-full.yaml",
			wantMetricsExporter: "otlp/spanmetrics",
			wantLatencyHistogramBuckets: []time.Duration{
				2 * time.Millisecond,
				6 * time.Millisecond,
				10 * time.Millisecond,
				100 * time.Millisecond,
				250 * time.Millisecond,
			},
			wantDimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
		},
	}
	for _, tc := range testcases {
		t.Run(tc.configFile, func(t *testing.T) {
			// Prepare
			factories, err := componenttest.ExampleComponents()
			require.NoError(t, err)

			factories.Receivers["otlp"] = otlpreceiver.NewFactory()
			factories.Receivers["jaeger"] = jaegerreceiver.NewFactory()

			factories.Processors[typeStr] = NewFactory()
			factories.Processors["batch"] = batchprocessor.NewFactory()
			factories.Processors["queued_retry"] = queuedprocessor.NewFactory()

			factories.Exporters["otlp"] = otlpexporter.NewFactory()
			factories.Exporters["prometheus"] = prometheusexporter.NewFactory()
			factories.Exporters["jaeger"] = jaegerexporter.NewFactory()

			// Test
			cfg, err := configtest.LoadConfigFile(t, path.Join(".", "testdata", tc.configFile), factories)

			// Verify
			require.NoError(t, err)
			require.NotNil(t, cfg)
			assert.Equal(t,
				&Config{
					ProcessorSettings: configmodels.ProcessorSettings{
						NameVal: "spanmetrics",
						TypeVal: "spanmetrics",
					},
					MetricsExporter:         tc.wantMetricsExporter,
					LatencyHistogramBuckets: tc.wantLatencyHistogramBuckets,
					Dimensions:              tc.wantDimensions,
				},
				cfg.Processors["spanmetrics"],
			)
		})
	}
}
51 changes: 51 additions & 0 deletions processor/spanmetricsprocessor/factory.go
@@ -0,0 +1,51 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/config/configmodels"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/processor/processorhelper"
)

const (
	// The value of "type" key in configuration.
	typeStr = "spanmetrics"
)

// NewFactory creates a factory for the spanmetrics processor.
func NewFactory() component.ProcessorFactory {
	return processorhelper.NewFactory(
		typeStr,
		createDefaultConfig,
		processorhelper.WithTraces(createTraceProcessor),
	)
}

func createDefaultConfig() configmodels.Processor {
	return &Config{
		ProcessorSettings: configmodels.ProcessorSettings{
			TypeVal: typeStr,
			NameVal: typeStr,
		},
	}
}

func createTraceProcessor(_ context.Context, params component.ProcessorCreateParams, cfg configmodels.Processor, nextConsumer consumer.TracesConsumer) (component.TracesProcessor, error) {
	return newProcessor(params.Logger, cfg, nextConsumer)
}
82 changes: 82 additions & 0 deletions processor/spanmetricsprocessor/factory_test.go
@@ -0,0 +1,82 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package spanmetricsprocessor

import (
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer/consumertest"
	"go.uber.org/zap"
)

func TestNewProcessor(t *testing.T) {
	defaultMethod := "GET"
	for _, tc := range []struct {
		name                        string
		metricsExporter             string
		latencyHistogramBuckets     []time.Duration
		dimensions                  []Dimension
		wantLatencyHistogramBuckets []float64
		wantDimensions              []Dimension
	}{
		{
			name:                        "simplest config (use defaults)",
			wantLatencyHistogramBuckets: defaultLatencyHistogramBucketsMs,
		},
		{
			name:                        "latency histogram configured with catch-all bucket to check no additional catch-all bucket inserted",
			latencyHistogramBuckets:     []time.Duration{2 * time.Millisecond, maxDuration},
			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
		},
		{
			name:                    "full config with no catch-all bucket and check the catch-all bucket is inserted",
			latencyHistogramBuckets: []time.Duration{2 * time.Millisecond},
			dimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
			wantLatencyHistogramBuckets: []float64{2, maxDurationMs},
			wantDimensions: []Dimension{
				{"http.method", &defaultMethod},
				{"http.status_code", nil},
			},
		},
	} {
		t.Run(tc.name, func(t *testing.T) {
			// Prepare
			factory := NewFactory()

			creationParams := component.ProcessorCreateParams{Logger: zap.NewNop()}
			cfg := factory.CreateDefaultConfig().(*Config)
			cfg.LatencyHistogramBuckets = tc.latencyHistogramBuckets
			cfg.Dimensions = tc.dimensions

			// Test
			traceProcessor, err := factory.CreateTracesProcessor(context.Background(), creationParams, cfg, consumertest.NewTracesNop())
			smp := traceProcessor.(*processorImp)

			// Verify
			assert.Nil(t, err)
			assert.NotNil(t, smp)

			assert.Equal(t, tc.wantLatencyHistogramBuckets, smp.latencyBounds)
			assert.Equal(t, tc.wantDimensions, smp.dimensions)
		})
	}
}