Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new ebpf integration #1478

Merged
merged 30 commits into from
Apr 27, 2022
Merged

Conversation

tpaschalis
Copy link
Member

@tpaschalis tpaschalis commented Mar 10, 2022

PR Description

This PR adds a v2 integration based on cloudflare's ebpf_exporter.

One limitation of the specific exporter is that it doesn't support uprobes/uretprobes. While I was able to fork the exporter and patch in support for those, I thought we'd keep this simple for the time being and create a first version which worked as-is.

The user configures a number of ebpf "programs", which are then attached to the correct place in the kernel. The exporter reads and decodes values from an eBPF map, exposing them as Prometheus metrics.

integrations:
  ebpf:
    programs:
    # Count timers fired in the kernel
    - name: timers
      metrics:
        counters:
          - name: timer_start_total
            help: Timers fired in the kernel
            table: counts
            labels:
              - name: function
                size: 8
                decoders:
                  - name: ksym
      tracepoints:
        timer:timer_start: tracepoint__timer__timer_start
      code: |
        BPF_HASH(counts, u64);

        // Generates function tracepoint__timer__timer_start
        TRACEPOINT_PROBE(timer, timer_start) {
            counts.increment((u64) args->function);
            return 0;
        }

Which issue(s) this PR fixes

No issue, this came out of the Hackathon :)

Notes to the Reviewer

If you agree with the general premise and idea, I can start working on adding documentation, tests, etc.

Do you think there should be behind an additional ebpf_enabled build tag? The current PR fails failed because it couldn't build the gobpf dependency on the Linux image that we run golangci-lint on. I don't think we should add ebpf/bcc bindings into our linting step, just to stop it from complaining, so an extra build tag might be a solution here.

I thought we'd start with a v2 integration outright, as a way to encourage more people to start using them. How do you feel about this?

PR Checklist

  • CHANGELOG updated
  • Documentation added
  • Tests updated

@tpaschalis tpaschalis changed the title Ebpf integration 2 Add new ebpf integration Mar 10, 2022
@tpaschalis tpaschalis marked this pull request as ready for review March 15, 2022 12:14
Comment on lines 1 to 2
//go:build linux && ebpf_enabled
// +build linux,ebpf_enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO for now I'd ready have stuff like this be opt out (i.e., the default behavior is to build everything and you use tags to disable specific things).

Can the tag be !noebpf? We use nodocker and nonetwork for disabling specific types of tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue which prompted the opt-in tag is that the Linux agent builds would fail unless some specific system packages are installed (due to the Go->eBPF bindings dependency, which in turn requires specific kernel versions).

$ go build .          # on linux system without support for bcc
# github.com/iovisor/gobpf/bcc
../../vendor/github.com/iovisor/gobpf/bcc/module.go:32:10: fatal error: 'bcc/bcc_common.h' file not found
#include <bcc/bcc_common.h>
         ^~~~~~~~~~~~~~~~~~
1 error generated.

I'm not sure what's an elegant way to handle this. Bundling the required C headers, and somewhat complicating our build pipeline?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can update tools/seego/Dockerfile (it has apt, it's Debian-based) to install dependencies. We do this already in the main Dockerfile to install systemd headers, and extending the build image to include our specific changes would be the way to do this IMO.

I think we can assume the kernel version is new enough, the platforms we'll be mainly building on should have kernels newer than ones from 2017.

Copy link
Member Author

@tpaschalis tpaschalis Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We'll also need to add the dependencies when linting.

My main concern here is that people who build from source on Linux, might be confused when go build fails with that cryptic message. On one hand, they would already need the systemd headers as you mentioned, but I'm not sure that bcc is also something they'd expect to need if they're not using eBPF.

I'll try to find a way to make it work and report back :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair :) We can document it in our contribution guide (we should probably be mentioning systemd there too but we're not at the moment). Do you think that would be helpful enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping track of build dependencies in the contribution guide we should probably make sure we link to that from our README, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would make sense ^^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a small note about these two dependencies on our contribution guide 323fb00.

Do we still want to have a noebpf flag to make it easier for people to contribute?

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the integration seems fine % my nits and comment about finding a way to build this by default in our CI setup.

pkg/integrations/v2/ebpf/integration.go Outdated Show resolved Hide resolved
pkg/integrations/v2/ebpf/integration.go Outdated Show resolved Hide resolved
Comment on lines +70 to +78
exp, err := exporter.New(ebpf_config.Config{Programs: e.cfg.Programs})
if err != nil {
return nil, fmt.Errorf("failed to create ebpf exporter with input config: %s", err)
}

err = exp.Attach()
if err != nil {
return nil, fmt.Errorf("failed to attach ebpf exporter: %s", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it cause any problems if two of these are created? We should make sure this still works with /-/reload support.

(We guarantee that Run will only be called once per config, but NewIntegration may be called while the previous instance is still running)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think that there will be any issues.

I tested on a Linux VPS, changing various bits of configuration, enabling/disabling the integration and reloading each time; it seemed to work nicely.

@github-actions
Copy link
Contributor

This PR has been automatically marked as stale because it has not had any activity in the past 30 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity.
Thank you for your contributions!

@github-actions github-actions bot added the stale Issue/PR mark as stale due lack of activity label Apr 15, 2022
@tpaschalis
Copy link
Member Author

Commenting here to un-stale. I want to check if any new ebpf packaging has been contributed upstream, especially after we changed to a new Debian base image.

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
tpaschalis and others added 19 commits April 26, 2022 15:39
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
…helper

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Copy link
Collaborator

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@mattdurham mattdurham merged commit 6b5313a into grafana:main Apr 27, 2022
@tpaschalis tpaschalis deleted the ebpf-integration-2 branch April 28, 2022 09:45
tpaschalis added a commit to tpaschalis/agent that referenced this pull request Apr 28, 2022
tpaschalis added a commit to tpaschalis/agent that referenced this pull request Apr 28, 2022
This reverts commit 6b5313a.

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
tpaschalis added a commit to tpaschalis/agent that referenced this pull request Apr 29, 2022
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Mar 27, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants