discovery: report containerID for containerised services #29719

Yumasi · 2024-10-02T10:03:01Z

What does this PR do?

This PR makes the Discovery agent check attach container IDs to the services it discovers. It relies on the workload metadata store to look up the container IDs.

Motivation

USMON-1204

Describe how to test/QA your changes

Possible Drawbacks / Trade-offs

Additional Notes

pr-commenter · 2024-10-02T10:33:33Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=46275972 --os-family=ubuntu

Note: This applies to commit eab643e

cit-pr-commenter · 2024-10-07T11:11:56Z

Go Package Import Differences

Baseline: 839b8bb
Comparison: eab643e

binary	os	arch	change
system-probe	linux	amd64	+15, -0

+github.com/DataDog/datadog-agent/internal/third_party/kubernetes/pkg/kubelet/cri/remote/util
+github.com/DataDog/datadog-agent/pkg/process/util/containers
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/containerd/cgroups/v3/cgroup2/stats
+k8s.io/cri-api/pkg/apis/runtime/v1
+k8s.io/cri-api/pkg/apis/runtime/v1alpha2

system-probe	linux	arm64	+15, -0

+github.com/DataDog/datadog-agent/internal/third_party/kubernetes/pkg/kubelet/cri/remote/util
+github.com/DataDog/datadog-agent/pkg/process/util/containers
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/containerd/cgroups/v3/cgroup2/stats
+k8s.io/cri-api/pkg/apis/runtime/v1
+k8s.io/cri-api/pkg/apis/runtime/v1alpha2

cit-pr-commenter · 2024-10-07T11:59:26Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 22282116-b77b-4014-aa26-a45a2eb36ab4

Baseline: 839b8bb
Comparison: eab643e

Regression Detector: ✅

Bounds Checks: ✅

Significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
✅	pycheck_lots_of_tags	% cpu utilization	-5.28	[-8.81, -1.74]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	basic_py_check	% cpu utilization	+1.57	[-2.23, +5.36]	1	Logs
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.79	[+0.06, +1.53]	1	Logs
➖	idle_all_features	memory utilization	+0.24	[+0.16, +0.32]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.01, +0.01]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.01	[-0.12, +0.09]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-0.03	[-0.08, +0.02]	1	Logs
➖	otel_to_otel_logs	ingress throughput	-0.18	[-0.99, +0.63]	1	Logs
➖	file_tree	memory utilization	-0.32	[-0.44, -0.21]	1	Logs
➖	idle	memory utilization	-0.55	[-0.60, -0.51]	1	Logs
✅	pycheck_lots_of_tags	% cpu utilization	-5.28	[-8.81, -1.74]	1	Logs

Bounds Checks Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	idle	memory_usage	10/10

Explanation

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
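
The three criteria above can be read as a single predicate. The following is a minimal sketch of that reading, not the Regression Detector's actual code: the `Experiment` type, its field names, and `IsSignificantChange` are hypothetical, and the 5.00% tolerance is the one quoted in this report. Whether a flagged change is better or worse depends on the optimization goal and is not modelled here.

```go
package main

import (
	"fmt"
	"math"
)

// Experiment is a hypothetical record of one row of the report tables.
type Experiment struct {
	Name      string
	DeltaMean float64 // estimated Δ mean %, comparison minus baseline
	CILow     float64 // lower bound of the 90% confidence interval
	CIHigh    float64 // upper bound of the 90% confidence interval
	Erratic   bool    // true when the experiment's configuration marks it erratic
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.00

// IsSignificantChange applies the three criteria: the effect is large enough,
// the confidence interval excludes zero, and the experiment is not erratic.
func IsSignificantChange(e Experiment) bool {
	bigEnough := math.Abs(e.DeltaMean) >= effectSizeTolerance
	ciExcludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && ciExcludesZero && !e.Erratic
}

func main() {
	// Values copied from the tables in this report.
	rows := []Experiment{
		{Name: "pycheck_lots_of_tags", DeltaMean: -5.28, CILow: -8.81, CIHigh: -1.74},
		{Name: "basic_py_check", DeltaMean: 1.57, CILow: -2.23, CIHigh: 5.36},
		{Name: "idle", DeltaMean: -0.55, CILow: -0.60, CIHigh: -0.51},
	}
	for _, r := range rows {
		fmt.Printf("%s: %v\n", r.Name, IsSignificantChange(r))
	}
}
```

Under this reading, `idle` is not flagged even though its interval excludes zero, because its effect size stays below the tolerance.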

vitkyrka

Do you think it would be possible to test this in the E2E test (can be done in a separate PR)?

pkg/collector/corechecks/servicediscovery/events.go

vitkyrka · 2024-10-09T15:51:37Z

pkg/collector/corechecks/servicediscovery/impl_linux.go

+	// validity.
+	// TODO: use/find a global constant for this delay, to keep in sync with
+	// the check delay if it were to change.
+	containers := li.containerProvider.GetPidToCid(1 * time.Minute)


Do we know what impact this has on the CPU load?
Also, can we avoid calling this if there are no new services?

vitkyrka · 2024-10-09T15:55:31Z

pkg/collector/corechecks/servicediscovery/impl_linux.go

@@ -100,6 +110,12 @@ func (li *linuxImpl) DiscoverServices() (*discoveredServices, error) {
 				li.ignoreProcs[pid] = true
 				continue
 			}
+
+			if id, ok := containers[pid]; ok {
+				svc.service.ContainerID = id


This only needs to be done when the service goes from potential -> alive in the loop which starts at line 80, right? If this is moved there and combined with the above suggestion to only call GetPidToCid when we need it, it should reduce some unnecessary work with short-lived processes.

Yumasi · 2024-10-10

Made a new method for handling the potential -> alive case, and moved the addition of the container ID into it.
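
The approach discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: only the `GetPidToCid(1 * time.Minute)` call and the `ContainerID` field appear in the diff; the `map[int]string` return type, the `pidToCidProvider` interface, the `discovery` and `service` types, and the `promote` method are all hypothetical simplifications.

```go
package main

import (
	"fmt"
	"time"
)

// pidToCidProvider is a hypothetical stand-in for the container provider.
// The return type is an assumption; only the call itself appears in the diff.
type pidToCidProvider interface {
	GetPidToCid(cacheValidity time.Duration) map[int]string
}

// service is a simplified, hypothetical version of a discovered service.
type service struct {
	Name        string
	ContainerID string
}

// discovery is a hypothetical, reduced version of the check state.
type discovery struct {
	provider  pidToCidProvider
	potential map[int]*service // seen once, not yet confirmed
	alive     map[int]*service // confirmed services
}

// promote handles the potential -> alive transition. The PID-to-container
// map is fetched only when there is something to confirm, and the container
// ID is attached only to services that are actually promoted.
func (d *discovery) promote(running map[int]bool) {
	if len(d.potential) == 0 {
		return // nothing to confirm, so skip the container lookup entirely
	}
	containers := d.provider.GetPidToCid(1 * time.Minute)
	for pid, svc := range d.potential {
		if running[pid] {
			if id, ok := containers[pid]; ok {
				svc.ContainerID = id
			}
			d.alive[pid] = svc
		}
		// Short-lived processes are dropped without being reported.
		delete(d.potential, pid)
	}
}

// fakeProvider counts lookups so the saving is observable.
type fakeProvider struct {
	calls int
	m     map[int]string
}

func (f *fakeProvider) GetPidToCid(time.Duration) map[int]string {
	f.calls++
	return f.m
}

func main() {
	p := &fakeProvider{m: map[int]string{42: "abc123"}}
	d := &discovery{provider: p, potential: map[int]*service{}, alive: map[int]*service{}}

	d.promote(map[int]bool{42: true})
	fmt.Println("lookups with no potential services:", p.calls)

	d.potential[42] = &service{Name: "web"}
	d.potential[43] = &service{Name: "short-lived"}
	d.promote(map[int]bool{42: true})
	fmt.Println("lookups after one promotion pass:", p.calls)
	fmt.Println("container ID of pid 42:", d.alive[42].ContainerID)
}
```

In this sketch the lookup cost is paid at most once per pass, and only on passes that have potential services to confirm.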

pkg/collector/corechecks/servicediscovery/impl_linux_test.go

pkg/collector/corechecks/servicediscovery/events.go


Yumasi added the team/usm and changelog/no-changelog labels on Oct 2, 2024

Yumasi force-pushed the guillaume.pagnoux/USMON-1176-discovery-containers-2 branch 2 times, most recently from a733785 to 126179c, on October 7, 2024 11:05

Yumasi added 3 commits October 8, 2024 17:56

discovery: add containerID field to SD structs

21e56f9

discovery: provide workloadmeta store to discovery check

3a2c349

discovery: add containerID information to discovered services

1f3f296

Yumasi force-pushed the guillaume.pagnoux/USMON-1176-discovery-containers-2 branch 3 times, most recently from ce8cf0d to ee4e280, on October 9, 2024 08:16

test: add containerID checks to test

57ebe91

Yumasi force-pushed the guillaume.pagnoux/USMON-1176-discovery-containers-2 branch from ee4e280 to 57ebe91 on October 9, 2024 08:47

Yumasi changed the title from "discovery: add containerID field to SD structs" to "discovery: report containerID for containerised services" on Oct 9, 2024

Yumasi marked this pull request as ready for review October 9, 2024 09:26

Yumasi requested review from a team as code owners October 9, 2024 09:26

vickenty approved these changes Oct 9, 2024


vitkyrka reviewed Oct 9, 2024


fix CR

eab643e

- Only get container ids if there are previous potential services
- Only add container id on confirmed new services
- Add test for propagation of container id values
- Revert unrelated formatting change made by auto-formatter
