Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add support for cgroup namespace #397

Merged
merged 2 commits into from
Jun 3, 2016

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Apr 25, 2016

The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows you to hide information from /proc/self/cgroup. But I'm currently
working with upstream on expanding it so that you can modify your
hierarchy inside a user namespace (even a rootless container).

Signed-off-by: Aleksa Sarai asarai@suse.de

@cyphar cyphar mentioned this pull request Apr 25, 2016
46 tasks
@vbatts
Copy link
Member

vbatts commented Apr 25, 2016

seems fine. Is this needing to wait on upstream linux or runc?

@cyphar
Copy link
Member Author

cyphar commented Apr 25, 2016

This has been merged into Linux and will be released in 4.6. I'd still recommend waiting on the release. The great thing about this API is that its compile-time switch is CONFIG_CGROUPS, so if you have a >=4.6 kernel with cgroups you automatically get this namespace.

@vbatts
Copy link
Member

vbatts commented Apr 25, 2016

oh nice. That smooth the transition for sure.

On Mon, Apr 25, 2016 at 11:35 AM, Aleksa Sarai notifications@github.com
wrote:

This has been merged into Linux and will be released in 4.6. I'd still
recommend waiting on the release. The great thing about this API is that
its compile-time switch is CONFIG_CGROUPS, so if you have a >=4.6 kernel
with cgroups you automatically get this namespace.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#397 (comment)

@mrunalp
Copy link
Contributor

mrunalp commented Apr 25, 2016

👍 We can merge this and add it to runc once released.

@wking
Copy link
Contributor

wking commented Apr 25, 2016

On Mon, Apr 25, 2016 at 08:14:24AM -0700, Aleksa Sarai wrote:

M schema/defs-linux.json (3)
M specs-go/config.go (2)

We also want docs in config-linux.md#namespaces, and possibly an
addition to the example in that section and the full example in
config.md.

@cyphar
Copy link
Member Author

cyphar commented Apr 25, 2016

@wking I've added examples and explanations in the docs.

@@ -33,6 +33,7 @@ The following parameters can be specified to setup namespaces:
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC
* **`uts`** the container will be able to have its own hostname and domain name
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
* **`cgroup`** the container will have an isolated cgroup hierarchy that it can manage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference 4.6 as the release that landed this feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@vishh
Copy link
Contributor

vishh commented Apr 25, 2016

How do we handle incompatibility with new kernel features as of now? Is the Spec attempting to provide some means of discovery of the features that a given kernel supports?

@mrunalp
Copy link
Contributor

mrunalp commented Apr 25, 2016

@vishh I think probably leave that to the runtime.

@vishh
Copy link
Contributor

vishh commented Apr 25, 2016

@mrunalp: That would affect client portability right? Is cross-runtime
portability a goal?

On Mon, Apr 25, 2016 at 2:30 PM, Mrunal Patel notifications@github.com
wrote:

@vishh https://github.com/vishh I think probably leave that to the
runtime.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#397 (comment)

@cyphar
Copy link
Member Author

cyphar commented Apr 25, 2016

@vishh We can make it clear in the spec that a runtime implementation must fail if it can't create all of the requested namespaces. Anything outside of that should be left to the runtime to handle (such as figuring out whether or not it can create the namespaces).

@wking
Copy link
Contributor

wking commented Apr 25, 2016

9926619 looks good to me. I don't mind whether the
fail-if-not-supported language 1 lands via this PR or a separate PR.

* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC. Support for this was added in Linux 2.6.19.
* **`uts`** the container will be able to have its own hostname and domain name. Support for this was added in Linux 2.6.19.
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container. Support for this was added in Linux 3.8.
* **`cgroup`** the container will have an isolated cgroup hierarchy that it can manage. Support for this was added in Linux 4.6.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we modify the text here - that it sees an isolated/virtualized view of the cgroup heirarchy? The manage portion doesn't quite work well yet ;)

We can use text from here https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d4021f6cd41f03017f831b3d40b0067bed54893d

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, it was a bit ambitious for a description. ;)

@cyphar
Copy link
Member Author

cyphar commented May 19, 2016

@mrunalp Updated. I've also added the "must fail" wording as @wking requested.


* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.

If the runtime is unable to create or join all of the requested namespaces, it
MUST fail so as not to lull the user into a false sense of security.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop 'so as not to lull the user into a false sense of security'. Otherwise looks fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix'd.

@mrunalp
Copy link
Contributor

mrunalp commented May 19, 2016

LGTM


* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.

If the runtime is unable to create or join ALL of the requested namespaces, it
MUST fail.
Copy link
Contributor

@wking wking May 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency withruntime.md`, we should probably use “generate an error”. Although @vishh's concern was broader than namespaces, so I think we might want to put this language in step 2 of the lifecycle.

Also, one line per sentence ;).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, one line per sentence ;).

Eww. 80 characters is objectively the right wrapping. ;)
But yes, I'll update the lifecycle documentation then.

@mrunalp mrunalp added this to the 1.0.0 milestone May 25, 2016
@cyphar
Copy link
Member Author

cyphar commented May 27, 2016

@wking @mrunalp I've updated things as requested. The documentation is now accurate about what the cgroup namespace does -- and I've added an extra line to the lifecycle documentation to make clear that a runtime must generate an error if it can't set up the environment requested by a container.

@wking
Copy link
Contributor

wking commented May 27, 2016

On Fri, May 27, 2016 at 05:02:18AM -0700, Aleksa Sarai wrote:

@wking @mrunalp I've updated things as requested.

Heh, f0e4d33a021e63 becomes much more excited about isolation (from
“only really hides” to “invaluable for hiding” ;).

Anyhow, the tip commit (058860b) looks good (and can be cherry-picked
into a new PR and landed now as far in my view, although I'm not a
maintainer).

I'm not sure about the penultimate commit; will leave an inline comment.


* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.
Copy link
Contributor

@wking wking May 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new with your recent reroll, and rolls back #158. What has changed since #158? The motivation for this line was “we may support join-and-tweak if someone gives a convincing use-case, but we're banning it for now”, but you're removing it (in a drive-by change?) while adding a cgroup namespace, and I don't see the connection.

Copy link
Member Author

@cyphar cyphar May 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urgh, looks like it was a dodgy rebase. I'll fix this up.

EDIT: Fixed.

@@ -35,6 +35,7 @@ The lifecycle describes the timeline of events that happen from when a container
1. OCI compliant runtime's `create` command is invoked with a reference to the location of the bundle and a unique identifier.
How these references are passed to the runtime is an implementation detail.
2. The container's runtime environment MUST be created according to the configuration in [`config.json`](config.md).
If the runtime is unable to create the runtimme environment specified in the [`config.json`](config.md) it MUST generate an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“runtimme” → “runtime”.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just drop that second “runtime” instances and say “… the environment specified…”.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you probably want a comma after config.json: “If … config.json, it MUST generate an error.”

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. And I fixed up the email madness.

@wking
Copy link
Contributor

wking commented May 28, 2016 via email

@wking
Copy link
Contributor

wking commented May 29, 2016 via email

* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC
* **`uts`** the container will be able to have its own hostname and domain name
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
* **`pid`** processes inside the container will only be able to see other processes inside the same container. Support for this was added in Linux 2.6.24.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One sentence per line. Otherwise looks good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, I swore I fixed this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if you can do this with lists in md @mrunalp

@tianon
Copy link
Member

tianon commented Jun 1, 2016

As discussed in the call, my $0.02 on whether to include kernel versions of required features in the spec, I don't think it's really strictly necessary for the spec -- having a separate "cheat sheet" might make sense (mapping features to minimum kernel versions), but if the information is reasonably readily available elsewhere, then I think it's sufficient for the tools to simply error out when unsupported features are used (relying more on the runtime check than fuzzy kernel version matching, which RedHat has a habit of throwing a wrench in 😄).

@crosbymichael
Copy link
Member

Yes, we are not recording the history of the linux kernel in this spec.

@mrunalp
Copy link
Contributor

mrunalp commented Jun 2, 2016

@cyphar could you update the PR to address the comments? We kinda need this for 1.0 :)
Thanks!

@crosbymichael
Copy link
Member

I wouldn't say that this is a blocker for 1.0. Its adding an option and does not affect schema or anything so it can come at any time.

@mrunalp
Copy link
Contributor

mrunalp commented Jun 2, 2016

Yeah, not a hard blocker but will be nice to have since it is so close to being merged.

@cyphar
Copy link
Member Author

cyphar commented Jun 2, 2016

So you want me to remove all of the "This is available since Linux X.Y.Z" sentences?

@mrunalp
Copy link
Contributor

mrunalp commented Jun 3, 2016

@cyphar Yes. That's what we discussed and agreed to in the OCI call yesterday.

@philips
Copy link
Contributor

philips commented Jun 3, 2016

LGTM

Agreed on removing the kernel versions. It is meaningless given some Linux distros do tons of backporting.

Approved with PullApprove

@cyphar
Copy link
Member Author

cyphar commented Jun 3, 2016

@opencontainers/runtime-spec-maintainers There, I've removed the kernel versions.

@mrunalp
Copy link
Contributor

mrunalp commented Jun 3, 2016

LGTM

Approved with PullApprove

The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows for hiding information from /proc/self/cgroup, and mounting
cgroupfs as an unprivileged user. In the future, this namespace may
allow for subtree management by a container.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
Make it clear that if a runtime cannot set up an environment that
*precisely* matches the config.json provided, it must generate an error.
This is important because not doing this can cause security issues.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
@crosbymichael
Copy link
Member

crosbymichael commented Jun 3, 2016

LGTM

Approved with PullApprove

@crosbymichael
Copy link
Member

@mrunalp @philips you have to LGTM again because the commits were updated

@vbatts
Copy link
Member

vbatts commented Jun 3, 2016

LGTM

Approved with PullApprove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants