Pulumi operator process gets killed after running pip install #440

Open
JonCholas opened this issue Apr 28, 2023 · 8 comments
Labels
impact/usability Something that impacts users' ability to use the product easily and intuitively
kind/bug Some behavior is incorrect or out of spec

Comments

@JonCholas

What happened?

I'm pushing a stack to the Pulumi operator, but it always fails after running pip install with a "signal: killed" error, even though the dependencies do get installed. When it runs again, pip shows that the dependencies are cached.

Pip install command

{
  "insertId": "8li49cy20nfwvele",
  "jsonPayload": {
    "Args": [
      "../../venv/bin/python",
      "-m",
      "pip",
      "install",
      "-r",
      "requirements.txt"
    ],
    "Request.Name": "deployment-beta-beta",
    "Dir": "/tmp/pulumi-working/beta-neara-f515a5a8/deployment-beta-beta/pulumi/projects/gcp-neara-app",
    "ts": "2023-04-28T04:27:27.411Z",
    "Request.Namespace": "beta-neara-f515a5a8",
    "Path": "../../venv/bin/python",
    "msg": "Pip Install",
    "Stdout": "Installing collected packages: types-PyYAML, arpeggio, urllib3, typing-extensions, tomli, six, semver, pyyaml, pyflakes, pycodestyle, protobuf, platformdirs, pathspec, packaging, mypy-extensions, mccabe, isort, idna, grpcio, dill, click, charset-normalizer, certifi, attrs, requests, pulumi, parver, mypy, flake8, black, pulumi-random, pulumi-kubernetes, pulumi-gcp, pulumi-aws, pulumi-eks, pulumi-neara",
    "logger": "controller_stack",
    "level": "debug"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "location": "us-west1",
      "namespace_name": "beta-neara-f515a5a8",
      "project_id": "customer-sce-beta-to40kc",
      "container_name": "pulumi-kubernetes-operator",
      "cluster_name": "neara-3d6f0d0",
      "pod_name": "beta-pulumi-operator-0606f596-6599f8bb94-6lrqp"
    }
  },
  "timestamp": "2023-04-28T04:27:27.412081049Z",
  "severity": "DEBUG",
  "labels": {
    "k8s-pod/service_istio_io/canonical-revision": "latest",
    "compute.googleapis.com/resource_name": "gke-neara-3d6f0d0-neara-spot-08f9e96-d753d3c7-fhnj",
    "k8s-pod/name": "pulumi-kubernetes-operator",
    "k8s-pod/service_istio_io/canonical-name": "beta-pulumi-operator-0606f596",
    "k8s-pod/security_istio_io/tlsMode": "istio",
    "k8s-pod/pod-template-hash": "6599f8bb94"
  },
  "logName": "projects/customer-sce-beta-to40kc/logs/stderr",
  "receiveTimestamp": "2023-04-28T04:27:29.982596615Z"
}

Next log message

{
  "insertId": "nakws6p7r41n1rjz",
  "jsonPayload": {
    "Request.Namespace": "beta-neara-f515a5a8",
    "msg": "Failed to setup Pulumi workdir",
    "Request.Name": "deployment-beta-beta",
    "ts": "2023-04-28T04:27:46.377Z",
    "level": "error",
    "Stack.Name": "gcp-neara-app__us-west1__sce-beta-beta",
    "error": "installing project dependencies: signal: killed",
    "logger": "controller_stack",
    "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:214"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "beta-neara-f515a5a8",
      "container_name": "pulumi-kubernetes-operator",
      "pod_name": "beta-pulumi-operator-0606f596-6599f8bb94-6lrqp",
      "cluster_name": "neara-3d6f0d0",
      "project_id": "customer-sce-beta-to40kc",
      "location": "us-west1"
    }
  },
  "timestamp": "2023-04-28T04:27:46.377844302Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-hash": "6599f8bb94",
    "k8s-pod/service_istio_io/canonical-name": "beta-pulumi-operator-0606f596",
    "compute.googleapis.com/resource_name": "gke-neara-3d6f0d0-neara-spot-08f9e96-d753d3c7-fhnj",
    "k8s-pod/service_istio_io/canonical-revision": "latest",
    "k8s-pod/name": "pulumi-kubernetes-operator",
    "k8s-pod/security_istio_io/tlsMode": "istio"
  },
  "logName": "projects/customer-sce-beta-to40kc/logs/stderr",
  "receiveTimestamp": "2023-04-28T04:27:50.020413977Z"
}

Expected Behavior

The workdir is set up correctly and pulumi up runs without problems.

Steps to reproduce

Push a Python stack whose dependencies are installed with pip.

Output of pulumi about

pulumi about
CLI
Version 3.64.0
Go Version go1.20.3
Go Compiler gc

Host
OS debian
Version 11.6
Arch x86_64

Pulumi locates its logs in /tmp by default
warning: Failed to read project: no Pulumi.yaml project file found (searching upwards from /). If you have not created a project yet, use pulumi new to do so: no project file found
warning: A new version of Pulumi is available. To upgrade from version '3.64.0' to '3.65.1', visit https://pulumi.com/docs/reference/install/ for manual instructions and release notes.

Additional context

I have seen this before, but it would retry and then succeed. I'm not sure what has changed so that it now always fails.
I'm running this in a GKE cluster.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

JonCholas added the kind/bug and needs-triage labels on Apr 28, 2023
@rquitales
Contributor

@JonCholas Thanks for reporting this issue. Would you also be able to provide the configuration and specs of your GKE cluster and nodes? Could you also provide a sample of your Python stack, or describe which resources you are using?

It appears that the pip install process is likely being killed because there isn't enough memory and swap for it to install all the required dependencies. I've tried a minimal Python stack that creates a GCP Storage Bucket, and am not seeing any "signal: killed" issues.
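
For reference, the repro stack was along these lines (a minimal sketch; the resource name and location are illustrative):

import pulumi
from pulumi_gcp import storage

# A single GCS bucket: the only dependencies pip needs to install
# are pulumi and pulumi-gcp.
bucket = storage.Bucket("repro-bucket", location="US")

pulumi.export("bucket_url", bucket.url)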

rquitales added the awaiting-feedback label and removed the needs-triage label on Apr 28, 2023
@JonCholas
Author

I did try giving the pod more resources, and it worked. The odd thing is that the resource-consumption graphs weren't showing any spike, so I'm guessing the memory ramp-up is too fast to register.
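
For anyone else hitting this, the change that worked was simply raising the operator container's memory. A rough sketch, assuming the operator is deployed with Pulumi's Kubernetes provider (names, image tag, and sizes are illustrative, not my exact setup):

import pulumi_kubernetes as k8s

operator = k8s.apps.v1.Deployment(
    "pulumi-operator",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={"name": "pulumi-kubernetes-operator"},
        ),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={"name": "pulumi-kubernetes-operator"},
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="pulumi-kubernetes-operator",
                        image="pulumi/pulumi-kubernetes-operator:v1.12.1",  # illustrative tag
                        # Headroom for pip's peak memory during dependency
                        # resolution, which spikes faster than the metrics
                        # scrape interval can capture.
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"cpu": "500m", "memory": "512Mi"},
                            limits={"memory": "2Gi"},
                        ),
                    )
                ],
            ),
        ),
    ),
)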

Is there a way to improve the operator to show more details on why the process was killed?

@mikhailshilkov
Member

@rquitales Could you please adjust the labels and respond to Jon again? Thank you.

@rquitales
Contributor

@JonCholas Thanks for the update, and apologies for the delayed response on this. There currently isn't a way to show more details on why this process is killed.

The kill signal is delivered to the pip process itself, typically by the kernel's OOM killer (see https://stackoverflow.com/questions/43245196/pip-install-killed). We shell out to pip to install the required Python dependencies and return any errors that occur. Unfortunately, the error message returned from pip isn't descriptive, so we can only deduce from context that the issue is due to resource starvation.

We can try to improve logging within our operator with more info messages, but I'm not sure if we can provide more descriptive errors for pip being killed.
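
To sketch the kind of deduction involved (Python with hypothetical names; the operator itself is written in Go):

import subprocess

def run_pip(python_path: str, workdir: str) -> None:
    # Hypothetical sketch: interpret the exit status rather than
    # surfacing a bare "signal: killed".
    proc = subprocess.run(
        [python_path, "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=workdir,
    )
    if proc.returncode == -9:  # the child was SIGKILLed (POSIX)
        raise RuntimeError(
            "pip install was killed by SIGKILL; this is usually the "
            "kernel OOM killer. Try raising the pod's memory limit."
        )
    proc.check_returncode()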

rquitales added the impact/usability label and removed the awaiting-feedback label on May 8, 2023
@rquitales
Contributor

As next steps, I'll look into possible ways to improve the current logging we have to better surface this issue instead of having a simple "signal killed" message.

@JonCholas
Author

@rquitales Sounds good. I think even an FAQ or troubleshooting doc about common problems with the operator would cover this, if no more detail is available from pip.

@jeduden

jeduden commented Jun 10, 2024

How about logging the stdout + stderr of the pip install command?
In general, the operator's log output and status updates should contain enough information to point the user to the cause of the error; something along the lines of the sketch below.
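
A Python sketch of the shape this could take (hypothetical function and logger names; the operator itself is written in Go):

import logging
import subprocess

log = logging.getLogger("controller_stack")

def pip_install(python_path: str, workdir: str) -> None:
    # Hypothetical sketch: run pip, capture everything it prints,
    # and log it so a failure is diagnosable from the operator logs.
    result = subprocess.run(
        [python_path, "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    log.debug("pip stdout:\n%s", result.stdout)
    log.debug("pip stderr:\n%s", result.stderr)
    # Raises CalledProcessError on a non-zero (or signal) exit status.
    result.check_returncode()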

I'm currently hitting the same issue. I was able to exec into the operator container; running pip install from the command line works and exits with error code 0.

In my case, the operator's container hasn't been restarted. There are also no memory limits set on the container.

@jeduden

jeduden commented Jun 11, 2024

Update: I was able to troubleshoot the issue by redeploying the operator with --zap-level=debug. The stdout/stderr of the pip install execution is then printed in the operator's logs.
It turns out the requirements.txt wasn't found.
However, I checked the local workspace on the operator's volume, and the requirements.txt file is present.
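
For anyone else debugging this: the flag goes on the operator container's args. A minimal fragment in the style of the earlier Deployment sketch (image tag illustrative):

import pulumi_kubernetes as k8s

# Enable debug logging so the operator prints the stdout/stderr of
# its pip install runs. This ContainerArgs slots into a Deployment
# spec like the one sketched earlier in the thread.
debug_container = k8s.core.v1.ContainerArgs(
    name="pulumi-kubernetes-operator",
    image="pulumi/pulumi-kubernetes-operator:v1.12.1",  # illustrative tag
    args=["--zap-level=debug"],
)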
