feat(telemetry): selectively transmit/redact subgraph ftv1 error messages to studio #3011

Merged (9 commits) on May 3, 2023
36 changes: 36 additions & 0 deletions .changesets/feat_bnjjj_add_apollo_redact_errors.md
@@ -0,0 +1,36 @@
### Add ability to transmit un-redacted errors from federated traces to Apollo Studio

When subgraphs have [Apollo Federated Tracing](https://www.apollographql.com/docs/router/configuration/apollo-telemetry/#enabling-field-level-instrumentation) enabled, the error messages within those traces are **redacted by default**.

The new `telemetry.apollo.errors.subgraph.all.redact` option (default: `true`) enables or disables redaction. A companion option, `telemetry.apollo.errors.subgraph.all.send` (default: `true`), controls whether the error is transmitted to Studio at all.

Error messages returned to clients are **not** affected: these settings neither change nor redact them.

To send subgraphs' federated trace error messages to Studio **without redaction**, set the following configuration:

```yaml title="router.yaml"
telemetry:
  apollo:
    errors:
      subgraph:
        all:
          send: true # (true = Send to Studio, false = Do not send; default: true)
          redact: false # (true = Redact full error message, false = Do not redact; default: true)
```
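
The interaction of the two flags can be sketched in a few lines of Rust. This is a minimal illustration, not the router's implementation: `TraceError`, `prepare_for_studio`, and the `<redacted>` placeholder text are assumptions made for the example; only the `send` and `redact` fields come from this PR.

```rust
/// Hypothetical stand-in for one error attached to a federated trace.
#[derive(Debug, Clone, PartialEq)]
struct TraceError {
    message: String,
}

/// Mirrors the two flags introduced by this PR.
#[derive(Debug, Clone, Copy)]
struct ErrorConfiguration {
    send: bool,
    redact: bool,
}

/// Placeholder text used when redacting (an assumption for this sketch).
const REDACTED: &str = "<redacted>";

/// Returns the error as it would be reported, or `None` when nothing is sent.
fn prepare_for_studio(error: &TraceError, conf: ErrorConfiguration) -> Option<TraceError> {
    // `send: false` drops the error entirely, whatever `redact` says.
    if !conf.send {
        return None;
    }
    // `redact: true` keeps the error but replaces its message.
    let message = if conf.redact {
        REDACTED.to_string()
    } else {
        error.message.clone()
    };
    Some(TraceError { message })
}
```

With the defaults (`send: true`, `redact: true`) the error is reported with a placeholder message; the configuration above (`redact: false`) would report the original message instead.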

It is also possible to configure this **per-subgraph** using a `subgraphs` map at the same level as `all` in the configuration, much like other sections of the configuration which have subgraph-specific capabilities:

```yaml title="router.yaml"
telemetry:
  apollo:
    errors:
      subgraph:
        all:
          send: true
          redact: false # Disable redaction as a default. The `accounts` service enables it below.
        subgraphs:
          accounts: # Applies to the `accounts` subgraph, overriding the `all` global setting.
            redact: true # Redact messages from the `accounts` service.
```
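
One detail worth illustrating: as the diff reads, `get_error_config` returns a per-subgraph entry wholesale rather than merging it with `all`, so a field omitted from a subgraph entry takes the type default (`true`), not the value set under `all`. The sketch below reproduces that lookup with the standard library only. `example_config` is a hypothetical helper, and the struct-update syntax stands in for what `#[serde(default)]` would do during deserialization.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct ErrorConfiguration {
    send: bool,
    redact: bool,
}

impl Default for ErrorConfiguration {
    fn default() -> Self {
        // Both flags default to `true`, as in the PR.
        Self { send: true, redact: true }
    }
}

#[derive(Debug, Clone, Default)]
struct SubgraphErrorConfig {
    all: ErrorConfiguration,
    subgraphs: Option<HashMap<String, ErrorConfiguration>>,
}

impl SubgraphErrorConfig {
    /// Same lookup as the PR: a per-subgraph entry wins outright,
    /// otherwise the `all` settings apply.
    fn get_error_config(&self, subgraph: &str) -> &ErrorConfiguration {
        self.subgraphs
            .as_ref()
            .and_then(|s| s.get(subgraph))
            .unwrap_or(&self.all)
    }
}

/// Hypothetical helper, equivalent to:
///   all: { send: false, redact: false }
///   subgraphs: { accounts: { redact: true } }
fn example_config() -> SubgraphErrorConfig {
    let mut subgraphs = HashMap::new();
    subgraphs.insert(
        "accounts".to_string(),
        // `send` is omitted, so it falls back to the type default (`true`),
        // not to `all.send` (`false`).
        ErrorConfiguration { redact: true, ..ErrorConfiguration::default() },
    );
    SubgraphErrorConfig {
        all: ErrorConfiguration { send: false, redact: false },
        subgraphs: Some(subgraphs),
    }
}
```

Under these assumptions, `example_config().get_error_config("accounts")` yields `send: true, redact: true`, while any other subgraph name yields the `all` settings. Setting both fields explicitly in each per-subgraph entry avoids relying on the defaults.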

By [@bnjjj](https://github.com/bnjjj) in https://github.com/apollographql/router/pull/3011
@@ -1354,8 +1354,60 @@ expression: "&schema"
"default": "https://usage-reporting.api.apollographql.com/api/ingress/traces",
"type": "string"
},
"errors": {
  "description": "Configure the way errors are transmitted to Apollo Studio",
  "type": "object",
  "properties": {
    "subgraph": {
      "description": "Handling of errors coming from subgraph",
      "type": "object",
      "properties": {
        "all": {
          "description": "Handling of errors coming from all subgraphs",
          "type": "object",
          "properties": {
            "redact": {
              "description": "Redact subgraph errors to Apollo Studio",
              "default": true,
              "type": "boolean"
            },
            "send": {
              "description": "Send subgraph errors to Apollo Studio",
              "default": true,
              "type": "boolean"
            }
          },
          "additionalProperties": false
        },
        "subgraphs": {
          "description": "Handling of errors coming from specified subgraphs",
          "type": "object",
          "additionalProperties": {
            "type": "object",
            "properties": {
              "redact": {
                "description": "Redact subgraph errors to Apollo Studio",
                "default": true,
                "type": "boolean"
              },
              "send": {
                "description": "Send subgraph errors to Apollo Studio",
                "default": true,
                "type": "boolean"
              }
            },
            "additionalProperties": false
          },
          "nullable": true
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
},
"field_level_instrumentation_sampler": {
- "description": "Enable field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses. 0.0 will result in no field level instrumentation. 1.0 will result in always instrumentation. Value MUST be less than global sampling rate",
+ "description": "Field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses.",
  "anyOf": [
    {
      "description": "Sample a given fraction. Fractions >= 1 will always sample.",
@@ -0,0 +1,10 @@
---
source: apollo-router/src/configuration/tests.rs
expression: new_config
---
---
telemetry:
  apollo:
    field_level_instrumentation:
      sampler: always_off

62 changes: 58 additions & 4 deletions apollo-router/src/plugins/telemetry/apollo.rs
@@ -66,9 +66,7 @@ pub(crate) struct Config {
    /// The buffer size for sending traces to Apollo. Increase this if you are experiencing lost traces.
    pub(crate) buffer_size: NonZeroUsize,

-   /// Enable field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses.
-   /// 0.0 will result in no field level instrumentation. 1.0 will result in always instrumentation.
-   /// Value MUST be less than global sampling rate
+   /// Field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses.
Review comment (Member):

What was the motivation for this change in relation to this PR?

    pub(crate) field_level_instrumentation_sampler: SamplerOption,

    /// To configure which request header names and values are included in trace data that's sent to Apollo Studio.
@@ -83,9 +81,64 @@ pub(crate) struct Config {

    /// Configuration for batch processing.
    pub(crate) batch_processor: BatchProcessorConfig,

    /// Configure the way errors are transmitted to Apollo Studio
    pub(crate) errors: ErrorsConfiguration,
}

#[derive(Debug, Clone, Deserialize, JsonSchema, Default)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct ErrorsConfiguration {
    /// Handling of errors coming from subgraph
    pub(crate) subgraph: SubgraphErrorConfig,
}

#[derive(Debug, Clone, Deserialize, JsonSchema, Default)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct SubgraphErrorConfig {
    /// Handling of errors coming from all subgraphs
    pub(crate) all: ErrorConfiguration,
    /// Handling of errors coming from specified subgraphs
    pub(crate) subgraphs: Option<HashMap<String, ErrorConfiguration>>,
}

#[derive(Debug, Clone, Deserialize, JsonSchema)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct ErrorConfiguration {
    /// Send subgraph errors to Apollo Studio
    pub(crate) send: bool,
    /// Redact subgraph errors to Apollo Studio
    pub(crate) redact: bool,
}

impl Default for ErrorConfiguration {
    fn default() -> Self {
        Self {
            send: default_send_errors(),
            redact: default_redact_errors(),
        }
    }
}

impl SubgraphErrorConfig {
    pub(crate) fn get_error_config(&self, subgraph: &str) -> &ErrorConfiguration {
        if let Some(subgraph_conf) = self.subgraphs.as_ref().and_then(|s| s.get(subgraph)) {
            subgraph_conf
        } else {
            &self.all
        }
    }
}

pub(crate) const fn default_send_errors() -> bool {
    true
}

pub(crate) const fn default_redact_errors() -> bool {
    true
}

- fn default_field_level_instrumentation_sampler() -> SamplerOption {
+ const fn default_field_level_instrumentation_sampler() -> SamplerOption {
      SamplerOption::TraceIdRatioBased(0.01)
  }

@@ -127,6 +180,7 @@ impl Default for Config {
            send_headers: ForwardHeaders::None,
            send_variable_values: ForwardValues::None,
            batch_processor: BatchProcessorConfig::default(),
            errors: ErrorsConfiguration::default(),
        }
    }
}
2 changes: 2 additions & 0 deletions apollo-router/src/plugins/telemetry/tracing/apollo.rs
@@ -23,6 +23,7 @@ impl TracingConfigurator for Config {
    buffer_size,
    field_level_instrumentation_sampler,
    batch_processor,
    errors,
    ..
} => {
    tracing::debug!("configuring exporter to Studio");
@@ -35,6 +36,7 @@ impl TracingConfigurator for Config {
    .buffer_size(*buffer_size)
    .field_execution_sampler(field_level_instrumentation_sampler.clone())
    .batch_config(batch_processor.clone())
    .errors_configuration(errors.clone())
    .build()?;
builder.with_span_processor(
    BatchSpanProcessor::builder(exporter, opentelemetry::runtime::Tokio)