Strip number suffix from instance name to consolidate services that traces are spread over (#13729)

The problem with having many services is that it is hard to find which service holds the trace you want; see jaegertracing/jaeger-ui#985.

Previously, we split traces into services based on their instance name, like `matrix.org client_reader-1`, etc., but there are many worker instances of the same `client_reader` type, so there is a lot to click through.

With this PR, all of the traces are collected under the worker type, like `client_reader` or `event_persister` 😇

Note: A Synapse worker instance name is an opaque string; the numeric suffix is only a naming convention of our own `matrix.org` deployment. Still, it seems sensible to group things this way.
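The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not Synapse code: `service_name_for` and `group_instances` are hypothetical helpers and the worker list is made up. Only the regex and the `f"{server_name} {instance_name_by_type}"` service-name format are taken from this commit's diff.

```python
import re
from collections import defaultdict

# Same pattern as the one this commit adds to synapse/logging/opentracing.py.
STRIP_INSTANCE_NUMBER_SUFFIX_REGEX = re.compile(r"[_-]?\d+$")


def service_name_for(server_name: str, instance_name: str) -> str:
    """Hypothetical helper: build the Jaeger service name as init_tracer does after this change."""
    instance_name_by_type = re.sub(STRIP_INSTANCE_NUMBER_SUFFIX_REGEX, "", instance_name)
    return f"{server_name} {instance_name_by_type}"


def group_instances(server_name: str, instance_names: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: map each service name to the instances whose traces land in it."""
    services: dict[str, list[str]] = defaultdict(list)
    for name in instance_names:
        services[service_name_for(server_name, name)].append(name)
    return dict(services)


if __name__ == "__main__":
    # Made-up worker list for illustration only.
    workers = [
        "client_reader-1",
        "client_reader-2",
        "client_reader-3",
        "event_persister-1",
        "event_persister-2",
        "master",
    ]
    for service, members in group_instances("matrix.org", workers).items():
        print(f"{service}: {members}")
```

Here the six instances collapse into three Jaeger services (`matrix.org client_reader`, `matrix.org event_persister` and `matrix.org master`) instead of six.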
MadLittleMods authored and realtyem committed Sep 10, 2022
1 parent 9072eb0 commit b5c4a72
Showing 2 changed files with 13 additions and 1 deletion.
1 change: 1 addition & 0 deletions changelog.d/13729.misc
@@ -0,0 +1 @@
+Strip number suffix from instance name to consolidate services that traces are spread over.
13 changes: 12 additions & 1 deletion synapse/logging/opentracing.py
@@ -203,6 +203,9 @@ def set_fates(clotho, lachesis, atropos, father="Zues", mother="Themis"):
 
 # Helper class
 
+# Matches the number suffix in an instance name like "matrix.org client_reader-8"
+STRIP_INSTANCE_NUMBER_SUFFIX_REGEX = re.compile(r"[_-]?\d+$")
+
 
 class _DummyTagNames:
     """wrapper of opentracings tags. We need to have them if we
@@ -441,9 +444,17 @@ def init_tracer(hs: "HomeServer") -> None:
 
     from jaeger_client.metrics.prometheus import PrometheusMetricsFactory
 
+    # Instance names are opaque strings but by stripping off the number suffix,
+    # we can get something that looks like a "worker type", e.g.
+    # "client_reader-1" -> "client_reader" so we don't spread the traces across
+    # so many services.
+    instance_name_by_type = re.sub(
+        STRIP_INSTANCE_NUMBER_SUFFIX_REGEX, "", hs.get_instance_name()
+    )
+
     config = JaegerConfig(
         config=hs.config.tracing.jaeger_config,
-        service_name=f"{hs.config.server.server_name} {hs.get_instance_name()}",
+        service_name=f"{hs.config.server.server_name} {instance_name_by_type}",
         scope_manager=LogContextScopeManager(),
         metrics_factory=PrometheusMetricsFactory(),
     )
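Because instance names are opaque strings, it helps to see exactly what the new pattern strips. The sketch below reuses the regex from the diff; `strip_instance_number_suffix` is a hypothetical wrapper around the same `re.sub` call, and the names in `EXAMPLES` are invented to probe edge cases rather than taken from any real deployment.

```python
import re

# Pattern copied from this commit: an optional "_" or "-" followed by trailing digits.
STRIP_INSTANCE_NUMBER_SUFFIX_REGEX = re.compile(r"[_-]?\d+$")


def strip_instance_number_suffix(instance_name: str) -> str:
    # Hypothetical wrapper; same call as in init_tracer: drop the numeric suffix, if any.
    return re.sub(STRIP_INSTANCE_NUMBER_SUFFIX_REGEX, "", instance_name)


# Invented instance names mapped to what the pattern leaves behind.
EXAMPLES = {
    "client_reader-8": "client_reader",  # "-" separator is removed with the digits
    "event_persister_2": "event_persister",  # "_" separator is removed too
    "background_worker1": "background_worker",  # no separator: digits are still stripped
    "federation_sender": "federation_sender",  # no numeric suffix: unchanged
    "media-repo": "media-repo",  # an inner "-" is kept; only trailing digits matter
    "client_reader-1-2": "client_reader-1",  # only the final run of digits is stripped
    "1234": "",  # an all-digit name collapses to the empty string
}

if __name__ == "__main__":
    for name, expected in EXAMPLES.items():
        result = strip_instance_number_suffix(name)
        print(f"{name!r:22} -> {result!r}")
        assert result == expected
```

Two consequences follow from the `$` anchor and the optional separator: only the final run of digits is removed, so `client_reader-1-2` keeps its `-1`, and a name made up entirely of digits becomes empty, which would give a service name of `"matrix.org "` with a trailing space.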
