xcp exhausts memory, probably avoidably #769

Open
colemickens opened this issue Sep 13, 2024 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@colemickens
Member

  1. I have to bump memSize to build a disk image with a moderate (but not huge) closure.
  2. I am surprised I don't hit cmdline arg length limits.
  3. I think that using something other than xcp with ALL store paths at once would avoid needing a larger guest just to install the closure.
@Mic92
Member

Mic92 commented Sep 20, 2024

disko-images uses cp, not xcp.

@Mic92
Member

Mic92 commented Sep 20, 2024

Did you maybe use an older version of disko? We are using xargs + cp:

xargs cp --recursive --target ${systemToInstall.config.disko.rootMountPoint}/nix/store < ${closureInfo}/store-paths

@colemickens
Member Author

I think I just crossed wires. I am definitely using a recent disko. However, I'm still suspicious that we could somehow batch/chunk around xargs cp.

It's very apparent that this copy step uses a LOT of memory; it's where my image build fails every time with a large enough closure and not enough VM RAM.

Obviously it's easy to just bump the builder RAM, but again, I'm guessing this can be massaged to not use so much memory for the copy.
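
For reference, a minimal sketch of what such chunking could look like, assuming the copy keeps going through xargs + cp; the -n 64 batch size and the /mnt/nix/store target are only illustrative, and whether smaller batches actually lower peak memory is exactly what the rest of the thread investigates:

#!/usr/bin/env bash
# Hypothetical chunked variant of the copy step: hand cp at most 64 store
# paths per invocation instead of packing as many as ARG_MAX allows.
closure_info=$1    # path to a closure-info derivation output
xargs -n 64 cp --recursive --target /mnt/nix/store < "$closure_info/store-paths"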

@iFreilicht
Contributor

Hmm, I'm not sure why this happens. I can definitely confirm that:

  1. running nom build .#checks.x86_64-linux.make-disk-image causes the qemu process to use up to 2.1GB of RAM
  2. before the line nixos-disko-images> ++ xargs cp --recursive --target /mnt/nix/store, the RAM usage was less than 1GB

However, I also observed:

  1. After xargs cp is done, the RAM usage of qemu does not decrease

Additionally, I tried to reproduce this by running this copy operation on the exact same closure-info locally:

$ cat ./xargs-cp.sh
#!/usr/bin/env bash
xargs cp --recursive --target /mnt/scratch/test-store < "$1/store-paths"
$ nix run nixpkgs#time -- -v ./xargs-cp.sh /nix/store/a3s32wbdg5yain492c3gq8fbv9aak6vd-closure-info
        Command being timed: "./xargs-cp.sh /nix/store/a3s32wbdg5yain492c3gq8fbv9aak6vd-closure-info"
        User time (seconds): 0.25
        System time (seconds): 2.14
        Percent of CPU this job got: 56%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.25
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3712
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 142739
        Voluntary context switches: 17875
        Involuntary context switches: 178
        Swaps: 0
        File system inputs: 851304
        File system outputs: 1552184
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

You can see Maximum resident set size (kbytes): 3712, meaning the peak memory usage of this operation was only about 3.7 MB.

I think the issue is that we're using a tmpfs for the VM's storage, which resides in memory. We run the vmTools.runInLinuxVM function, whose documentation says:

By default, there is no disk image; the root filesystem is a tmpfs, and the Nix store is shared with the host (via the 9P protocol). Thus, any pure Nix derivation should run unmodified.

As the command copies to ${systemToInstall.config.disko.rootMountPoint}/nix/store, it writes into a memory-backed filesystem.
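
This effect is easy to see outside the VM with a throwaway tmpfs mount (a rough illustration; the mount point and sizes are just examples):

# Writes into a tmpfs are backed by RAM, so they show up as memory use.
sudo mkdir -p /mnt/tmpfs-demo
sudo mount -t tmpfs -o size=2G tmpfs /mnt/tmpfs-demo
free -m                                        # note the "shared" column
dd if=/dev/zero of=/mnt/tmpfs-demo/blob bs=1M count=1024
free -m                                        # "shared" grows by roughly 1 GiB
sudo umount /mnt/tmpfs-demo                    # the memory is released again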

The solution would be to pass a diskImage argument. We don't have that implemented right now, but basically it would be the same as memSize.

However, I'm not entirely sure that's what you're complaining about. If setting memSize fixes your issue, that means the memory is exhausted inside the VM itself, not on the builder.

  1. I am surprised I don't hit cmdline arg length limits.

That's to be expected; xargs is designed to work around this limitation. From the xargs manpage:

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). The specified command will be invoked as many times as necessary to use up the list of input items. In general, there will be many fewer invocations of command than there were items in the input.
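
For illustration (a toy example, not taken from the thread), you can watch xargs split its input into several invocations by forcing a small batch size with -n:

printf '%s\n' a b c d e f | xargs -n 2 echo
# a b
# c d
# e f

# Without -n, xargs packs as many items per invocation as ARG_MAX allows,
# which is why the store-path list never triggers "Argument list too long":
printf '%s\n' a b c d e f | xargs echo
# a b c d e f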

@Mic92
Member

Mic92 commented Sep 21, 2024

Those disks should actually not be in the virtual machine's tmpfs, because we add them from the build directory as virtio-blk block devices:

"-drive file=${disk.name}.${imageFormat},if=virtio,cache=unsafe,werror=report,format=${imageFormat}"

I did try to debug the kernel to see where the memory usage is coming from, but it is still not super clear to me. For ZFS, it seemed to be ZFS-internal memory allocations that added up.
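
For anyone else digging into this, a few standard probes inside the guest can narrow down where the memory sits (the ZFS ARC path below assumes a kernel with the ZFS module loaded):

# How much is page cache vs. slab vs. shmem (tmpfs)?
grep -E 'MemFree|^Cached|Shmem|Slab|SReclaimable|SUnreclaim' /proc/meminfo

# Kernel slab allocations (ZFS and filesystem metadata show up here)
slabtop -o | head -n 20

# If the disk uses ZFS, the ARC size is reported here
grep -E '^size|^c_max' /proc/spl/kstat/zfs/arcstats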

@iFreilicht iFreilicht added the help wanted Extra attention is needed label Sep 21, 2024