[ADXT-199] Early start long e2e tests environment deployment on the CI (priority on EKS) #26891

Open. Wants to merge 60 commits into base: main. Changes shown from 24 commits.
Commits (60):
20586da
WIP
KevinFairise2 Jun 18, 2024
67c952a
First test in CI to use pre initialized envs
KevinFairise2 Jun 19, 2024
5f643fc
First test in CI to use pre initialized envs
KevinFairise2 Jun 19, 2024
8a66aba
Add init only mode for containers and npm eks tests
KevinFairise2 Jun 19, 2024
bfe3f20
Fix initOnly param
KevinFairise2 Jun 19, 2024
b0e0f34
Fix initOnly param
KevinFairise2 Jun 19, 2024
e10f67c
Compare with and without init
KevinFairise2 Jun 20, 2024
15e289c
Try to work with containers test [skip cancel]
KevinFairise2 Jun 20, 2024
acc4b06
Merge branch 'main' of github.com:DataDog/datadog-agent into kfairise…
KevinFairise2 Jul 5, 2024
496a632
Clean duplicated jobs [skip cancel]
KevinFairise2 Jul 5, 2024
c2d84df
Merge branch 'main' of github.com:DataDog/datadog-agent into kfairise…
KevinFairise2 Aug 2, 2024
9212a87
Test infra init only
KevinFairise2 Aug 2, 2024
8b6c0c9
Remove leftover
KevinFairise2 Aug 2, 2024
2510a13
Get go mod deps
KevinFairise2 Aug 2, 2024
abf6e3b
Fix deps for npm
KevinFairise2 Aug 12, 2024
e5f7e48
Merge branch 'main' of github.com:DataDog/datadog-agent into kfairise…
KevinFairise2 Sep 12, 2024
81dd4bd
Add sync job to allow manually running test job
KevinFairise2 Sep 12, 2024
e07192c
Add cleanup job
KevinFairise2 Sep 13, 2024
44275bd
Add tag to sync
KevinFairise2 Sep 13, 2024
c75df86
Add failure [skip cancel]
KevinFairise2 Sep 13, 2024
7795d0d
Fix deps
KevinFairise2 Sep 13, 2024
50e956e
Remove failure [skip cancel]
KevinFairise2 Sep 13, 2024
cbed7fb
Merge branch 'main' of github.com:DataDog/datadog-agent into kfairise…
KevinFairise2 Sep 13, 2024
72939be
Add failure [skip cancel]
KevinFairise2 Sep 13, 2024
cb68550
Test with correct deps
KevinFairise2 Sep 13, 2024
b7498b8
split stages and always run cleanup
KevinFairise2 Sep 16, 2024
4e37884
Fix stage and needs
KevinFairise2 Sep 16, 2024
6bc43c4
Empty
KevinFairise2 Sep 16, 2024
0c48ad8
Remove dependency
KevinFairise2 Sep 16, 2024
0e422fb
Fix cleanup
KevinFairise2 Sep 16, 2024
8442f12
Fix cleanup job
KevinFairise2 Sep 16, 2024
9e22c29
Fix cleanup job for real this time I hope
KevinFairise2 Sep 16, 2024
9ff5d07
Fix cleanup job for real this time I hope by fixing PULUMI passphrase
KevinFairise2 Sep 17, 2024
9963316
Set passphrase correctly
KevinFairise2 Sep 23, 2024
f44a986
Remove failure
KevinFairise2 Sep 24, 2024
2021813
Do not crash if job does not contain rules
KevinFairise2 Sep 24, 2024
3c5a472
Update path
KevinFairise2 Sep 24, 2024
cc0f9ce
Uncomment
KevinFairise2 Sep 24, 2024
7eeefb2
Remove sync job and use allow_failure_true instead
KevinFairise2 Sep 25, 2024
66e0610
Simplify initOnly
KevinFairise2 Sep 25, 2024
ecc96a3
Print error
KevinFairise2 Sep 25, 2024
661f531
Update .gitlab/e2e/e2e.yml
KevinFairise2 Sep 25, 2024
4706f26
Update .gitlab/e2e/e2e.yml
KevinFairise2 Sep 25, 2024
d77ed79
Update .gitlab/e2e/e2e.yml
KevinFairise2 Sep 25, 2024
412bda8
Update tasks/new_e2e_tests.py
KevinFairise2 Sep 25, 2024
565746b
Fix suggestion
KevinFairise2 Sep 25, 2024
aa43015
Apply suggestion
KevinFairise2 Sep 25, 2024
74be619
Update .gitlab/e2e/e2e.yml
KevinFairise2 Sep 25, 2024
69bb994
Remove print and use -e
KevinFairise2 Sep 25, 2024
6f12bd1
Merge branch 'kfairise/test-early-create-eks' of github.com:DataDog/d…
KevinFairise2 Sep 25, 2024
efb6955
Address comments
KevinFairise2 Sep 30, 2024
658fac5
Fix test and remove no longer existing params
KevinFairise2 Sep 30, 2024
cb50ebd
Remove unneeded line
KevinFairise2 Oct 1, 2024
0e9807b
Make cleanup_remote_stack more generic
KevinFairise2 Oct 3, 2024
7114f3b
Use pulumi stack ls --all
KevinFairise2 Oct 7, 2024
d99c2ef
Merge branch 'main' into kfairise/test-early-create-eks
KevinFairise2 Oct 7, 2024
75bdc8d
Merge branch 'main' of github.com:DataDog/datadog-agent into kfairise…
KevinFairise2 Oct 10, 2024
fee3d34
Merge branch 'kfairise/test-early-create-eks' of github.com:DataDog/d…
KevinFairise2 Oct 10, 2024
9243dd4
Authorize my cleanup job
KevinFairise2 Oct 10, 2024
4f173fc
Fix linter
KevinFairise2 Oct 10, 2024
101 changes: 99 additions & 2 deletions .gitlab/e2e/e2e.yml
@@ -103,11 +103,53 @@ new-e2e-containers:
- EXTRA_PARAMS: "--run TestKindSuite -c ddinfra:kubernetesVersion=1.29"
- EXTRA_PARAMS: "--run TestKindSuite -c ddinfra:osDescriptor=ubuntu:20.04"
- EXTRA_PARAMS: "--run TestKindSuite -c ddinfra:osDescriptor=ubuntu:22.04"
- EXTRA_PARAMS: --run TestEKSSuite
- EXTRA_PARAMS: --run TestECSSuite
- EXTRA_PARAMS: --run TestDockerSuite
- EXTRA_PARAMS: --skip "Test(Kind|EKS|ECS|Docker)Suite"

new-e2e-containers-eks-init:
  extends: .new_e2e_template
  needs:
    - !reference [.needs_new_e2e_template]
  rules:
    - !reference [.on_container_or_e2e_changes]
    - !reference [.manual]
  variables:
    TARGETS: ./tests/containers
    TEAM: container-integrations
    EXTRA_PARAMS: --run TestEKSSuite
    E2E_INIT_ONLY: "true"

# This job does not do anything. This is a workaround to be able to run new-e2e-containers-eks even if the init job fails. Any better solution would be appreciated
new-e2e-containers-eks-sync:
  image: 486234852809.dkr.ecr.us-east-1.amazonaws.com/ci/test-infra-definitions/runner$TEST_INFRA_DEFINITIONS_BUILDIMAGES_SUFFIX:$TEST_INFRA_DEFINITIONS_BUILDIMAGES
  stage: e2e
  tags: ["arch:amd64"]
  script:
    - exit 0
  needs:
    - new-e2e-containers-eks-init
  rules:
    - !reference [.on_container_or_e2e_changes]
    - !reference [.manual]
  when: always

new-e2e-containers-eks:
  extends: .new_e2e_template
  rules:
    - !reference [.on_container_or_e2e_changes]
    - !reference [.manual]
  needs:
    - !reference [.needs_new_e2e_template]
    - new-e2e-containers-eks-sync
    - qa_agent
    - qa_dca
  variables:
    TARGETS: ./tests/containers
    TEAM: container-integrations
    EXTRA_PARAMS: --run TestEKSSuite
    E2E_PRE_INITIALIZED: "true"

new-e2e-remote-config:
extends: .new_e2e_template_needs_deb_x64
rules:
@@ -210,7 +252,50 @@ new-e2e-npm-docker:
variables:
TARGETS: ./tests/npm
TEAM: network-performance-monitoring
-    EXTRA_PARAMS: --run "Test(ECSVM|EC2VMContainerized|EKSVM)Suite"
+    EXTRA_PARAMS: --run "Test(ECSVM|EC2VMContainerized)Suite"


new-e2e-npm-eks-init:
  extends: .new_e2e_template
  needs:
    - !reference [.needs_new_e2e_template]
  rules:
    - !reference [.on_npm_or_e2e_changes]
    - !reference [.manual]
  variables:
    TARGETS: ./tests/npm
    TEAM: network-performance-monitoring
    EXTRA_PARAMS: --run "TestEKSVMSuite"
    E2E_INIT_ONLY: "true"

new-e2e-npm-eks-sync:
  image: 486234852809.dkr.ecr.us-east-1.amazonaws.com/ci/test-infra-definitions/runner$TEST_INFRA_DEFINITIONS_BUILDIMAGES_SUFFIX:$TEST_INFRA_DEFINITIONS_BUILDIMAGES
  tags: ["arch:amd64"]
  stage: e2e
  script:
    - exit 0
  needs:
    - new-e2e-npm-eks-init
  rules:
    - !reference [.on_npm_or_e2e_changes]
    - !reference [.manual]
  when: always

new-e2e-npm-eks:
  extends: .new_e2e_template
  rules:
    - !reference [.on_npm_or_e2e_changes]
    - !reference [.manual]
  needs:
    - !reference [.needs_new_e2e_template]
    - new-e2e-npm-eks-sync
    - qa_agent
    - qa_dca
  variables:
    TARGETS: ./tests/npm
    TEAM: network-performance-monitoring
    EXTRA_PARAMS: --run "TestEKSVMSuite"
    E2E_PRE_INITIALIZED: "true"

new-e2e-aml:
extends: .new_e2e_template
@@ -501,3 +586,15 @@ trigger-flakes-finder:
- artifact: flake-finder-gitlab-ci.yml
job: generate-flakes-finder-pipeline
allow_failure: true

new-e2e-eks-cleanup-on-failure:
  extends:
    - .new_e2e_template_needs_container_deploy
  needs:
    - new-e2e-containers-eks-sync
    - new-e2e-npm-eks-sync
  script:
    - inv new-e2e-tests.cleanup-remote-stacks --pipeline-id=$E2E_PIPELINE_ID --pulumi-backend=dd-pulumi-state
  when: on_failure
  allow_failure: true

2 changes: 2 additions & 0 deletions .gitlab/package_build/linux.yml
@@ -88,6 +88,8 @@ datadog-agent-6-x64:
# build Agent 7 binaries for x86_64
datadog-agent-7-x64:
extends: [.agent_build_common, .agent_build_x86, .agent_7_build]
script:
- exit 1

# build Agent 6 binaries for arm64
datadog-agent-6-arm64:
45 changes: 45 additions & 0 deletions tasks/new_e2e_tests.py
@@ -5,6 +5,7 @@
from __future__ import annotations

import json
import multiprocessing
import os
import os.path
import shutil
@@ -22,6 +23,7 @@
from tasks.libs.common.go import download_go_dependencies
from tasks.libs.common.utils import REPO_PATH, color_message, running_in_ci
from tasks.modules import DEFAULT_MODULES
from tools.e2e_stacks import destroy_remote_stack


@task(
@@ -194,6 +196,49 @@ def clean(ctx, locks=True, stacks=False, output=False, skip_destroy=False):
_clean_output()


@task
def cleanup_remote_stacks(ctx, pipeline_id, pulumi_backend):
    """
    Clean up remote stacks created by the pipeline
    """
    # if not running_in_ci():
    #     raise Exit("This task should be run in CI only", 1)
    print(f"Running aws s3api list-objects --bucket {pulumi_backend} --prefix .pulumi/stacks/e2eci/ci-{pipeline_id}")
    res = ctx.run(
        f"aws s3api list-objects --bucket {pulumi_backend} --prefix .pulumi/stacks/e2eci/ci-{pipeline_id}",
        hide=True,
        warn=True,
    )
    if res.exited != 0:
        print(f"Failed to list stacks in bucket {pulumi_backend}:", res.stdout, res.stderr)
        return
    eks_stacks = set()
    stacks = json.loads(res.stdout)
    if "Contents" in stacks:
        for stack in stacks["Contents"]:
            if "Key" in stack:
                stack_id = stack["Key"].split("/")[-1].replace(".json.bak", "").replace(".json", "")
                if "eks" in stack_id:
                    eks_stacks.add(f"organization/e2eci/{stack_id}")

    pool = multiprocessing.Pool(len(eks_stacks))
    res = pool.map(destroy_remote_stack, eks_stacks)
    destroyed_stack = set()
    failed_stack = set()
    for r, stack in res:
        if r.returncode != 0:
            failed_stack.add(stack)
        else:
            destroyed_stack.add(stack)
        print(f"Stack {stack}: {r.stdout} {r.stderr}")
    pool.close()

    for stack in destroyed_stack:
        print(f"Stack {stack} destroyed successfully")
    for stack in failed_stack:
        print(f"Failed to destroy stack {stack}")
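The key parsing in `cleanup_remote_stacks` can be pulled out into a pure helper, which makes the edge cases easier to check. The following is a minimal sketch, not code from this PR: `eks_stack_names` is a hypothetical name, and the sample keys are made up to match the `Contents`/`Key` shape that the task reads from `aws s3api list-objects`. One case worth noting is an empty result: `multiprocessing.Pool(0)` raises `ValueError`, so a caller would likely want to return early when no stack matches.

```python
from __future__ import annotations


def eks_stack_names(listing: dict, org_prefix: str = "organization/e2eci") -> set[str]:
    """Return fully qualified names of the EKS stacks found in an S3 listing."""
    names = set()
    for obj in listing.get("Contents", []):
        key = obj.get("Key")
        if not key:
            continue
        # Keep the file name only and drop the ".json" / ".json.bak" suffix,
        # so a state file and its backup map to the same stack name.
        stack_id = key.split("/")[-1].replace(".json.bak", "").replace(".json", "")
        if "eks" in stack_id:
            names.add(f"{org_prefix}/{stack_id}")
    return names


# Hypothetical listing for illustration; real keys depend on the Pulumi backend.
listing = {
    "Contents": [
        {"Key": ".pulumi/stacks/e2eci/ci-123-456-eks-cluster.json"},
        {"Key": ".pulumi/stacks/e2eci/ci-123-456-eks-cluster.json.bak"},
        {"Key": ".pulumi/stacks/e2eci/ci-123-456-kind.json"},
    ]
}
stacks = eks_stack_names(listing)
no_stacks = eks_stack_names({})  # empty set: nothing to destroy, skip the pool
```

Using a set means the `.json` and `.json.bak` entries for one stack collapse into a single destroy call.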


@task
def deps(ctx, verbose=False):
"""
6 changes: 6 additions & 0 deletions tasks/tools/e2e_stacks.py
@@ -0,0 +1,6 @@
import subprocess


# This function cannot be defined in a file that imports invoke.tasks. Otherwise it fails when called with multiprocessing.
def destroy_remote_stack(stack: str):
    return subprocess.run(["pulumi", "destroy", "--yes", "--stack", stack], capture_output=True, text=True), stack
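Because `destroy_remote_stack` returns the `CompletedProcess` together with the stack name, the caller can split the results by return code without tracking which result belongs to which stack. A minimal sketch of that pattern follows. Both function names are hypothetical, and the child process here is the Python interpreter standing in for `pulumi`, so the example runs without any cloud access.

```python
import subprocess
import sys


def run_for_stack(stack: str, exit_code: int):
    """Stand-in for destroy_remote_stack: run a child process, return (result, stack)."""
    script = f"import sys; print('stack {stack}'); sys.exit({exit_code})"
    result = subprocess.run([sys.executable, "-c", script], capture_output=True, text=True)
    return result, stack


def partition_results(results):
    """Split (CompletedProcess, stack) pairs into destroyed and failed sets."""
    destroyed, failed = set(), set()
    for proc, stack in results:
        if proc.returncode == 0:
            destroyed.add(stack)
        else:
            failed.add(stack)
    return destroyed, failed


results = [run_for_stack("a-eks", 0), run_for_stack("b-eks", 1)]
destroyed, failed = partition_results(results)
```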
18 changes: 18 additions & 0 deletions test/new-e2e/pkg/e2e/suite.go
@@ -189,6 +189,7 @@ type BaseSuite[Env any] struct {
currentProvisioners ProvisionerMap

firstFailTest string
initOnly bool
}

//
@@ -306,6 +307,10 @@ func (bs *BaseSuite[Env]) reconcileEnv(targetProvisioners ProvisionerMap) error
resources.Merge(provisionerResources)
}

if bs.initOnly {
return nil
}

// Env is taken as parameter as some fields may have keys set by Env pulumi program.
err = bs.buildEnvFromResources(resources, newEnvFields, newEnvValues)
if err != nil {
@@ -328,6 +333,11 @@ func (bs *BaseSuite[Env]) reconcileEnv(targetProvisioners ProvisionerMap) error

func (bs *BaseSuite[Env]) createEnv() (*Env, []reflect.StructField, []reflect.Value, error) {
var env Env
initOnly, err := runner.GetProfile().ParamStore().GetBoolWithDefault(parameters.InitOnly, false)
if err == nil {
bs.initOnly = initOnly
}

envFields := reflect.VisibleFields(reflect.TypeOf(&env).Elem())
envValue := reflect.ValueOf(&env)

@@ -467,6 +477,9 @@ func (bs *BaseSuite[Env]) SetupSuite() {
// `panic()` is required to stop the execution of the test suite. Otherwise `testify.Suite` will keep on running suite tests.
panic(err)
}
if bs.initOnly {
bs.T().Skip("INIT_ONLY is set, skipping tests")
}
}

// BeforeTest is executed right before the test starts and receives the suite and test names as input.
@@ -513,6 +526,11 @@ func (bs *BaseSuite[Env]) TearDownSuite() {
return
}

if bs.initOnly {
bs.T().Logf("INIT_ONLY is set, skipping deletion")
return
}

if bs.firstFailTest != "" && bs.params.skipDeleteOnFailure {
bs.Require().FailNow(fmt.Sprintf("%v failed. As SkipDeleteOnFailure feature is enabled the tests after %v were skipped. "+
"The environment of %v was kept.", bs.firstFailTest, bs.firstFailTest, bs.firstFailTest))
5 changes: 5 additions & 0 deletions test/new-e2e/pkg/environments/aws/kubernetes/eks.go
@@ -175,9 +175,14 @@ func EKSRunFunc(ctx *pulumi.Context, env *environments.Kubernetes, params *Provi
return err
}

if params.eksInitOnly {
return nil
}

kubeConfig, err := cluster.GetKubeconfig(ctx, &eks.ClusterGetKubeconfigArgs{
ProfileName: pulumi.String(awsEnv.Profile()),
})

if err != nil {
return err
}
9 changes: 9 additions & 0 deletions test/new-e2e/pkg/environments/aws/kubernetes/params.go
@@ -35,6 +35,7 @@ type ProvisionerParams struct {
eksLinuxARMNodeGroup bool
eksBottlerocketNodeGroup bool
eksWindowsNodeGroup bool
eksInitOnly bool
awsEnv *aws.Environment
deployDogstatsd bool
}
@@ -133,6 +134,14 @@ func WithEKSWindowsNodeGroup() ProvisionerOption {
}
}

// WithEKSInitOnly enables EKS init-only mode
func WithEKSInitOnly() ProvisionerOption {
return func(params *ProvisionerParams) error {
params.eksInitOnly = true
return nil
}
}

// WithDeployDogstatsd deploys standalone dogstatsd
func WithDeployDogstatsd() ProvisionerOption {
return func(params *ProvisionerParams) error {
18 changes: 16 additions & 2 deletions test/new-e2e/pkg/runner/ci_profile.go
@@ -51,9 +51,23 @@ func NewCIProfile() (Profile, error) {
if jobID == "" || projectID == "" {
return nil, fmt.Errorf("unable to compute name prefix, missing variables job id: %s, project id: %s", jobID, projectID)
}

uniqueID := jobID
store := parameters.NewEnvStore(EnvPrefix)

initOnly, err := store.GetBoolWithDefault(parameters.InitOnly, false)
if err != nil {
return nil, err
}

preinitialized, err := store.GetBoolWithDefault(parameters.PreInitialized, false)
if err != nil {
return nil, err
}

if initOnly || preinitialized {
uniqueID = os.Getenv("CI_PIPELINE_ID") // We use pipeline ID for init only and pre-initialized jobs, to be able to share state
}

// get environments from store
environmentsStr, err := store.GetWithDefault(parameters.Environments, "")
if err != nil {
@@ -74,7 +88,7 @@ func NewCIProfile() (Profile, error) {

return ciProfile{
baseProfile: newProfile("e2eci", ciEnvironments, store, &secretStore, outputRoot),
-		ciUniqueID: "ci-" + jobID + "-" + projectID,
+		ciUniqueID: "ci-" + uniqueID + "-" + projectID,
}, nil
}
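The effect of the `uniqueID` change is that an init-only job and the pre-initialized test job in the same pipeline compute the same identifier, while every other job keeps a per-job one. A minimal Python sketch of that selection follows. The function name and the sample IDs are hypothetical; the real logic is the Go code in `NewCIProfile` above.

```python
def ci_unique_id(
    job_id: str,
    pipeline_id: str,
    project_id: str,
    init_only: bool = False,
    pre_initialized: bool = False,
) -> str:
    """Mirror the ciUniqueID computation from NewCIProfile."""
    # Init-only and pre-initialized jobs use the pipeline ID so that both
    # jobs of one pipeline resolve to the same identifier and can share state.
    unique = pipeline_id if (init_only or pre_initialized) else job_id
    return f"ci-{unique}-{project_id}"


# Hypothetical IDs: three jobs of pipeline 77 in project 14.
init_job = ci_unique_id("1001", "77", "14", init_only=True)
test_job = ci_unique_id("1002", "77", "14", pre_initialized=True)
other_job = ci_unique_id("1003", "77", "14")
```

The shared identifier starts with `ci-` followed by the pipeline ID, which is the same prefix the `cleanup_remote_stacks` task passes to its S3 listing.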

4 changes: 4 additions & 0 deletions test/new-e2e/pkg/runner/parameters/const.go
@@ -49,4 +49,8 @@ const (
PulumiVerboseProgressStreams StoreKey = "pulumi_verbose_progress_streams"
// DevMode allows keeping the stack after the test completes
DevMode StoreKey = "dev_mode"
// InitOnly config flag parameter name
InitOnly StoreKey = "init_only"
// PreInitialized config flag parameter name
PreInitialized StoreKey = "pre_initialized"
)
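The CI jobs set `E2E_INIT_ONLY` and `E2E_PRE_INITIALIZED`, and the Go code reads the `init_only` and `pre_initialized` keys through an environment-backed store. A minimal Python sketch of such a lookup follows, under two assumptions that this diff does not confirm: that the store builds the variable name as the `E2E_` prefix plus the upper-cased key, and that it accepts the same boolean spellings as Go's `strconv.ParseBool`. The function name simply echoes `GetBoolWithDefault`.

```python
import os

# Spellings accepted by Go's strconv.ParseBool (assumed to match the store's parsing).
TRUE_VALUES = {"1", "t", "T", "TRUE", "true", "True"}
FALSE_VALUES = {"0", "f", "F", "FALSE", "false", "False"}


def get_bool_with_default(key: str, default: bool, env=None, prefix: str = "E2E_") -> bool:
    """Look up a boolean parameter in the environment, falling back to a default."""
    env = os.environ if env is None else env
    name = prefix + key.upper()  # assumed mapping: "init_only" -> "E2E_INIT_ONLY"
    raw = env.get(name)
    if raw is None:
        return default
    if raw in TRUE_VALUES:
        return True
    if raw in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a valid boolean")


ci_env = {"E2E_INIT_ONLY": "true"}  # what the *-eks-init jobs set
init_only = get_bool_with_default("init_only", False, env=ci_env)
pre_initialized = get_bool_with_default("pre_initialized", False, env=ci_env)
```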
22 changes: 21 additions & 1 deletion test/new-e2e/tests/containers/eks_test.go
@@ -14,6 +14,7 @@ import (

"github.com/DataDog/datadog-agent/test/new-e2e/pkg/components"
"github.com/DataDog/datadog-agent/test/new-e2e/pkg/runner"
"github.com/DataDog/datadog-agent/test/new-e2e/pkg/runner/parameters"
"github.com/DataDog/datadog-agent/test/new-e2e/pkg/utils/infra"

"github.com/pulumi/pulumi/sdk/v3/go/auto"
@@ -23,10 +24,17 @@ import (

type eksSuite struct {
k8sSuite

initOnly bool
}

func TestEKSSuite(t *testing.T) {
-	suite.Run(t, &eksSuite{})
+	var initOnly bool
+	initOnlyParam, err := runner.GetProfile().ParamStore().GetBoolWithDefault(parameters.InitOnly, false)
+	if err == nil {
+		initOnly = initOnlyParam
+	}
+	suite.Run(t, &eksSuite{initOnly: initOnly})
}

func (suite *eksSuite) SetupSuite() {
@@ -38,6 +46,9 @@ func (suite *eksSuite) SetupSuite() {
"ddtestworkload:deploy": auto.ConfigValue{Value: "true"},
"dddogstatsd:deploy": auto.ConfigValue{Value: "true"},
}
if suite.initOnly {
stackConfig["ddinfra:initOnly"] = auto.ConfigValue{Value: "true"}
}

_, stackOutput, err := infra.GetStackManager().GetStackNoDeleteOnFailure(
ctx,
@@ -56,6 +67,10 @@ func (suite *eksSuite) SetupSuite() {
suite.T().FailNow()
}

if suite.initOnly {
suite.T().Skip("E2E_INIT_ONLY is set, skipping tests")
}

fakeintake := &components.FakeIntake{}
fiSerialized, err := json.Marshal(stackOutput.Outputs["dd-Fakeintake-aws-ecs"].Value)
suite.Require().NoError(err)
@@ -84,6 +99,11 @@ func (suite *eksSuite) SetupSuite() {
}

func (suite *eksSuite) TearDownSuite() {
if suite.initOnly {
suite.T().Logf("E2E_INIT_ONLY is set, skipping deletion")
return
}

suite.k8sSuite.TearDownSuite()

ctx := context.Background()