[1.1] runc run: fix mount leak #4425

Merged


kolyshkin
Contributor

This is a backport of #4417 to release-1.1.


@kolyshkin kolyshkin added this to the 1.1.15 milestone Oct 3, 2024
When preparing to mount container root, we need to make its parent mount
private (i.e. disable propagation), otherwise the new in-container
mounts are leaked to the host.

To find the parent mount, we used to read mountinfo and pick the longest
entry that can be a parent of the container root directory.

Unfortunately, due to a kernel bug in all Linux kernels older than v5.8
(see [1], [2]), mountinfo sometimes can't be read in its entirety. In
this case, getParentMount may occasionally return the wrong parent
mount.

As a result, the parent mount's propagation is not changed to private,
and container mounts are leaked.

Alas, we cannot fix the kernel, and reading mountinfo several times to
ensure its consistency (as is done in, say, Kubernetes) does not look
like a good solution for performance reasons.

Fortunately, we don't need mountinfo. Let's just walk up the directory
tree from the container root, trying to remount each directory private
until we find a mount point (EINVAL means the directory is not a mount
point, so we go one level up; anything else means we just found it).

Fixes issue 2404.

[1]: https://github.com/kolyshkin/procfs-test
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 13a6f56)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin marked this pull request as ready for review October 3, 2024 17:55
@kolyshkin kolyshkin requested review from rata and cyphar October 3, 2024 17:58
@kolyshkin kolyshkin mentioned this pull request Oct 3, 2024
@AkihiroSuda AkihiroSuda merged commit ed38aea into opencontainers:release-1.1 Oct 4, 2024
28 checks passed
@kolyshkin kolyshkin added the backport/1.1-pr A backport to 1.1.x release. label Oct 11, 2024