Pulumi operator process gets killed after running pip install #440

Open
JonCholas opened this issue Apr 28, 2023 · 8 comments
Labels
impact/usability Something that impacts users' ability to use the product easily and intuitively
kind/bug Some behavior is incorrect or out of spec

Comments

@JonCholas

What happened?

I'm pushing a stack to the Pulumi operator, but it always fails after running pip install with a "signal: killed" error, even though the dependencies do get installed. When it runs again, pip shows that the dependencies are cached.

Pip install command

{
  "insertId": "8li49cy20nfwvele",
  "jsonPayload": {
    "Args": [
      "../../venv/bin/python",
      "-m",
      "pip",
      "install",
      "-r",
      "requirements.txt"
    ],
    "Request.Name": "deployment-beta-beta",
    "Dir": "/tmp/pulumi-working/beta-neara-f515a5a8/deployment-beta-beta/pulumi/projects/gcp-neara-app",
    "ts": "2023-04-28T04:27:27.411Z",
    "Request.Namespace": "beta-neara-f515a5a8",
    "Path": "../../venv/bin/python",
    "msg": "Pip Install",
    "Stdout": "Installing collected packages: types-PyYAML, arpeggio, urllib3, typing-extensions, tomli, six, semver, pyyaml, pyflakes, pycodestyle, protobuf, platformdirs, pathspec, packaging, mypy-extensions, mccabe, isort, idna, grpcio, dill, click, charset-normalizer, certifi, attrs, requests, pulumi, parver, mypy, flake8, black, pulumi-random, pulumi-kubernetes, pulumi-gcp, pulumi-aws, pulumi-eks, pulumi-neara",
    "logger": "controller_stack",
    "level": "debug"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "location": "us-west1",
      "namespace_name": "beta-neara-f515a5a8",
      "project_id": "customer-sce-beta-to40kc",
      "container_name": "pulumi-kubernetes-operator",
      "cluster_name": "neara-3d6f0d0",
      "pod_name": "beta-pulumi-operator-0606f596-6599f8bb94-6lrqp"
    }
  },
  "timestamp": "2023-04-28T04:27:27.412081049Z",
  "severity": "DEBUG",
  "labels": {
    "k8s-pod/service_istio_io/canonical-revision": "latest",
    "compute.googleapis.com/resource_name": "gke-neara-3d6f0d0-neara-spot-08f9e96-d753d3c7-fhnj",
    "k8s-pod/name": "pulumi-kubernetes-operator",
    "k8s-pod/service_istio_io/canonical-name": "beta-pulumi-operator-0606f596",
    "k8s-pod/security_istio_io/tlsMode": "istio",
    "k8s-pod/pod-template-hash": "6599f8bb94"
  },
  "logName": "projects/customer-sce-beta-to40kc/logs/stderr",
  "receiveTimestamp": "2023-04-28T04:27:29.982596615Z"
}

Next log message

{
  "insertId": "nakws6p7r41n1rjz",
  "jsonPayload": {
    "Request.Namespace": "beta-neara-f515a5a8",
    "msg": "Failed to setup Pulumi workdir",
    "Request.Name": "deployment-beta-beta",
    "ts": "2023-04-28T04:27:46.377Z",
    "level": "error",
    "Stack.Name": "gcp-neara-app__us-west1__sce-beta-beta",
    "error": "installing project dependencies: signal: killed",
    "logger": "controller_stack",
    "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:214"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "beta-neara-f515a5a8",
      "container_name": "pulumi-kubernetes-operator",
      "pod_name": "beta-pulumi-operator-0606f596-6599f8bb94-6lrqp",
      "cluster_name": "neara-3d6f0d0",
      "project_id": "customer-sce-beta-to40kc",
      "location": "us-west1"
    }
  },
  "timestamp": "2023-04-28T04:27:46.377844302Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-hash": "6599f8bb94",
    "k8s-pod/service_istio_io/canonical-name": "beta-pulumi-operator-0606f596",
    "compute.googleapis.com/resource_name": "gke-neara-3d6f0d0-neara-spot-08f9e96-d753d3c7-fhnj",
    "k8s-pod/service_istio_io/canonical-revision": "latest",
    "k8s-pod/name": "pulumi-kubernetes-operator",
    "k8s-pod/security_istio_io/tlsMode": "istio"
  },
  "logName": "projects/customer-sce-beta-to40kc/logs/stderr",
  "receiveTimestamp": "2023-04-28T04:27:50.020413977Z"
}

Expected Behavior

The workdir is set up correctly and pulumi up runs without problems.

Steps to reproduce

Push a Python stack whose dependencies are installed with pip.

Output of pulumi about

pulumi about
CLI
Version 3.64.0
Go Version go1.20.3
Go Compiler gc

Host
OS debian
Version 11.6
Arch x86_64

Pulumi locates its logs in /tmp by default
warning: Failed to read project: no Pulumi.yaml project file found (searching upwards from /). If you have not created a project yet, use pulumi new to do so: no project file found
warning: A new version of Pulumi is available. To upgrade from version '3.64.0' to '3.65.1', visit https://pulumi.com/docs/reference/install/ for manual instructions and release notes.

Additional context

I have seen this before, but it would retry and then succeed. I'm not sure what has changed so that it now always fails.
I'm running this in a GKE cluster.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

JonCholas added the kind/bug and needs-triage labels on Apr 28, 2023
@rquitales
Contributor

@JonCholas Thanks for reporting this issue. Would you also be able to provide the configuration and specs of your GKE cluster and nodes? Could you also provide a sample of your Python stack, or describe which resources you are using?

It appears that the pip install process is likely being killed because there isn't enough memory and swap for it to install all the required dependencies. I've tried a minimal Python stack that creates a GCP Storage Bucket, and am not seeing any "signal: killed" issues.
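
For reference, the repro stack was along these lines (a minimal sketch; the resource name and location are illustrative):

import pulumi
from pulumi_gcp import storage

# A single GCS bucket: the only dependencies pip needs to install
# are pulumi and pulumi-gcp.
bucket = storage.Bucket("repro-bucket", location="US")

pulumi.export("bucket_url", bucket.url)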

rquitales added the awaiting-feedback label and removed the needs-triage label on Apr 28, 2023
@JonCholas
Author

I did try giving the pod more resources, and it worked. The odd thing is that the resource-consumption graphs weren't showing any spike, so I'm guessing the memory ramp-up is too fast to register.
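
For anyone else hitting this, the change that worked was simply raising the operator container's memory. A rough sketch, assuming the operator is deployed with Pulumi's Kubernetes provider (names, image tag, and sizes are illustrative, not my exact setup):

import pulumi_kubernetes as k8s

operator = k8s.apps.v1.Deployment(
    "pulumi-operator",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={"name": "pulumi-kubernetes-operator"},
        ),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={"name": "pulumi-kubernetes-operator"},
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="pulumi-kubernetes-operator",
                        image="pulumi/pulumi-kubernetes-operator:v1.12.1",  # illustrative tag
                        # Headroom for pip's peak memory during dependency
                        # resolution, which spikes faster than the metrics
                        # scrape interval can capture.
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"cpu": "500m", "memory": "512Mi"},
                            limits={"memory": "2Gi"},
                        ),
                    )
                ],
            ),
        ),
    ),
)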

Is there a way to improve the operator to show more details on why the process was killed?

@mikhailshilkov
Member

@rquitales Could you please adjust the labels and respond to Jon again? Thank you.

@rquitales
Contributor

@JonCholas Thanks for the update, and apologies for the delayed response on this. There currently isn't a way to show more details on why this process is killed.

The kill signal is delivered to the pip process itself, typically by the kernel's OOM killer (see https://stackoverflow.com/questions/43245196/pip-install-killed). We shell out to pip to install the required Python dependencies and return any errors that occur. Unfortunately, the error message returned from pip isn't descriptive, so we can only deduce from context that the issue is due to resource starvation.

We can try to improve logging within our operator with more info messages, but I'm not sure if we can provide more descriptive errors for pip being killed.
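
To sketch the kind of deduction involved (Python with hypothetical names; the operator itself is written in Go):

import subprocess

def run_pip(python_path: str, workdir: str) -> None:
    # Hypothetical sketch: interpret the exit status rather than
    # surfacing a bare "signal: killed".
    proc = subprocess.run(
        [python_path, "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=workdir,
    )
    if proc.returncode == -9:  # the child was SIGKILLed (POSIX)
        raise RuntimeError(
            "pip install was killed by SIGKILL; this is usually the "
            "kernel OOM killer. Try raising the pod's memory limit."
        )
    proc.check_returncode()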

rquitales added the impact/usability label and removed the awaiting-feedback label on May 8, 2023
@rquitales
Contributor

As next steps, I'll look into possible ways to improve the current logging we have to better surface this issue instead of having a simple "signal killed" message.

@JonCholas
Author

@rquitales Sounds good. I think even an FAQ or troubleshooting doc about common problems with the operator would cover this, if no more detail is available from pip.

@jeduden

jeduden commented Jun 10, 2024

How about logging the stdout + stderr of the pip install command?
In general, the operator's log output and status updates should contain enough information to point the user to the cause of the error; something along the lines of the sketch below.
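
A Python sketch of the shape this could take (hypothetical function and logger names; the operator itself is written in Go):

import logging
import subprocess

log = logging.getLogger("controller_stack")

def pip_install(python_path: str, workdir: str) -> None:
    # Hypothetical sketch: run pip, capture everything it prints,
    # and log it so a failure is diagnosable from the operator logs.
    result = subprocess.run(
        [python_path, "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    log.debug("pip stdout:\n%s", result.stdout)
    log.debug("pip stderr:\n%s", result.stderr)
    # Raises CalledProcessError on a non-zero (or signal) exit status.
    result.check_returncode()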

I'm currently hitting the same issue. I was able to exec into the operator container; running pip install from the command line works and exits with error code 0.

In my case, the operator's container hasn't been restarted. There are also no memory limits set on the container.

@jeduden

jeduden commented Jun 11, 2024

Update: I was able to troubleshoot the issue by redeploying the operator with --zap-level=debug. The stdout/stderr of the pip install execution is then printed in the operator's logs.
It turns out the requirements.txt wasn't found.
However, I checked the local workspace on the operator's volume, and the requirements.txt file is present.
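
For anyone else debugging this: the flag goes on the operator container's args. A minimal fragment in the style of the earlier Deployment sketch (image tag illustrative):

import pulumi_kubernetes as k8s

# Enable debug logging so the operator prints the stdout/stderr of
# its pip install runs. This ContainerArgs slots into a Deployment
# spec like the one sketched earlier in the thread.
debug_container = k8s.core.v1.ContainerArgs(
    name="pulumi-kubernetes-operator",
    image="pulumi/pulumi-kubernetes-operator:v1.12.1",  # illustrative tag
    args=["--zap-level=debug"],
)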
