
podman system reset speedup: Imply --time 0 or provide option for it #21874

Closed
martinpitt opened this issue Feb 29, 2024 · 3 comments · Fixed by #21906

@martinpitt (Contributor)

Feature request description

I'm currently trying to clean up cockpit-podman's test setup/cleanup, after a quick inquiry. In some ideal world, calling `podman system reset --force` would clean up everything. It doesn't yet, but I'd like to gradually get rid of our extra hacks.

An easy one is to make it faster:

```
podman run -d docker.io/busybox sleep infinity
time podman system reset --force
```

shows:

```
WARN[0010] StopSignal SIGTERM failed to stop container lucid_jones in 10 seconds, resorting to SIGKILL

real	0m10.324s
user	0m0.134s
sys	0m0.152s
```

This is annoying for many tests, as it quickly piles up, is often slower than the actual test, and is completely unnecessary.

Suggest potential solution

Given the hammer size and intent of `system reset`, it would make sense to me if it didn't bother with the 10s timeout and just immediately killed everything.

Alternatively, it could grow a `--time 0` option like `podman rm` has.

Have you considered any alternatives?

We currently do an extra `podman rm --force --time 0 --all; podman pod rm --force --time 0 --all` (behind a version check, as e.g. Ubuntu 22.04's podman doesn't yet understand the `--time` option). This works, but it's one more step to keep track of.
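
For illustration, a minimal sketch of that kind of guarded cleanup; the `--help` probe is just one possible way to detect `--time` support, and the actual test code may differ:

```
# Hypothetical guard: use --time 0 only where podman rm supports it
# (e.g. not on Ubuntu 22.04's older podman).
if podman rm --help | grep -q -- '--time'; then
    podman rm --force --time 0 --all
    podman pod rm --force --time 0 --all
else
    podman rm --force --all
    podman pod rm --force --all
fi
```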

Additional context

This is on current Fedora 40: `podman-5.0.0~rc1-3.fc40.x86_64`

martinpitt added the kind/feature label Feb 29, 2024
@Luap99 (Member) commented Feb 29, 2024

For `system reset` it should just default to 0 IMO; the data is gone anyway after that, so there is no reason to wait for a graceful container shutdown.

Anyhow, you can run `podman stop --all -t0` even on older Ubuntu versions to work around that for now.
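
In other words, the interim workaround amounts to something like this sketch (`-t0` is short for `--time 0`):

```
# Stop everything immediately first; with no containers left running,
# the reset no longer has to wait out the 10 s stop timeout.
podman stop --all -t0
podman system reset --force
```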

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 1, 2024
`podman system reset` defaults to waiting for 10s for containers to shut
down, which unnecessarily slows down tests (see
containers/podman#21874). To work around this,
force-stop all containers first with a zero timeout.

This approach also works on Ubuntu 22.04, so remove the special case.
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Mar 1, 2024
The `restore_dir()` for podman's data directory is highly problematic: it interferes with btrfs subvolumes and overlayfs mounts, and often causes `cp` failures like

```
cp: cannot stat '/home/admin/.local/share/containers/storage/overlay/compat3876082856': No such file or directory
```

So move to `podman system reset`, and restore the test images
with `podman load` for each test.

Unfortunately `podman system reset` defaults to the 10 s wait timeout
(containers/podman#21874), so we still need
the separate `rm --time 0` hack. But conceptually that can go away once
that bug is fixed.

This approach would also be nice on the system podman side, but it is super
hard to get right there, especially on CoreOS: there we simultaneously want a
thorough cleanup, but also rely on the running cockpit/ws container. It also
collides with the "force unmount everything below /var/lib/containers" hack
that we unfortunately still need for some OSes. But doing it for the user at
least solves half of the problem. The observed failures in the field
all occurred on the user directory, anyway.

Fixes cockpit-project#1591
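
As a rough illustration of the per-test pattern this commit message describes (the image archive path here is hypothetical):

```
# Wipe all podman state, then restore pre-built test images with
# `podman load` instead of copying the storage directory around.
podman rm --force --time 0 --all      # the interim `rm --time 0` hack
podman system reset --force
podman load --input /var/tmp/test-images.tar   # hypothetical path
```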
baude self-assigned this Mar 1, 2024
baude added a commit to baude/podman that referenced this issue Mar 1, 2024
when performing a system reset with containers that run something where
a soft kill won't work (like sleep), containers will wait 10 seconds
before terminating with a SIGKILL.  But for a forceful action like
system reset, we should outright set no timeout so containers stop
quickly instead of waiting out the timeout

Fixes containers#21874

Signed-off-by: Brent Baude <bbaude@redhat.com>
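
With a build containing that change, re-running the reproducer from the report should confirm the speedup, since the reset no longer waits for a graceful stop:

```
podman run -d docker.io/busybox sleep infinity
time podman system reset --force   # should now return almost immediately
```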
@baude (Member) commented Mar 1, 2024

@martinpitt thanks for the nicely done report.

@martinpitt (Contributor, Author)

Wow, thanks @baude for the fast fix!

stale-locking-app bot added the locked - please file new issue/PR label May 31, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators May 31, 2024