Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor generation deletion for better code reuse between CLI tools #9894

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

MichaelOultram
Copy link

@MichaelOultram MichaelOultram commented Jan 31, 2024

Motivation

The nix-env command allows the user to specify the number of generations to keep using the --delete-generations flag by prefixing the argument with a +. This functionality was missing from the nix-collect-garbage --delete-older-than flag.

As pointed out by @Ericson2314, simply adding this functionality to nix-collect-garbage would not be the right approach, and instead the logic should be shared between CLI tools.

Context

From the nix-collect-garbage docs for the --delete-older-than flag mention looking at the nix-env docs to learn more about the <period> argument.

- <span id="opt-delete-older-than">[`--delete-older-than`](#opt-delete-older-than)</span> *period*\
Delete all generations of profiles older than the specified amount (except for the generations that were active at that point in time).
*period* is a value such as `30d`, which would mean 30 days.
This is the equivalent of invoking [`nix-env --delete-generations <period>`](@docroot@/command-ref/nix-env/delete-generations.md#generations-time) on each found profile.
See the documentation of that command for additional information about the *period* argument.

In the nix-env docs, it mentions that + can be used to specify the number of generations to keep.

- <span id="generations-count">`+<number>`</span>:\
The last *number* generations up to the present
*Example*: `+5`
Keep the last *number* generations, along with any newer than current.

However, when I tried to use the nix-collect-garbage in this way, I got an error as the functionality was not implemented.

Jan 31 20:17:24 nixos-zephyrus nix-gc-start[18575]: removing old generations of profile /nix/var/nix/profiles/system
Jan 31 20:17:24 nixos-zephyrus nix-gc-start[18575]: error: invalid number of days specifier '+15', expected something like '14d'
Jan 31 20:17:24 nixos-zephyrus nix-gc-start[18575]: Try '/nix/store/j7nl2pj606d8ld5818hw3z3fbz00sdc5-nix-2.18.1/bin/nix-collect-garbage --help' for more information.

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@Ericson2314
Copy link
Member

Ericson2314 commented Jan 31, 2024

Making the interface actually be in sync is a fine idea, but adding more nix-collect-garbage-specific code is not the right way to do this. We should have a deduplicated implementation for both commands that forces them to stay in sink.

Can you figure out why the commands supported different things in the first place? Answering that question will in turn tell use what code needs to change.

@MichaelOultram
Copy link
Author

@Ericson2314 yes I agree. That would be the better way to do this. I'm not much of a C++ programmer, and it's my first PR to this repo, so I went for the smaller diff.

I'll mark this PR as a draft, and look into a better way to refactor this.

@MichaelOultram MichaelOultram marked this pull request as draft January 31, 2024 21:00
@MichaelOultram
Copy link
Author

MichaelOultram commented Feb 1, 2024

in dcc433d, the --delete-generations flag was added to nix-env. This would accept old to delete all generations except the current one, or a list of specific generations to delete. At this point, nix-collect-garbage could not delete old generations.

In 034b6f6, the --delete-older-than option to nix-collect-garbage. This would make a system call to nix-env to actually remove the generations.

In 4ca5a9d, nix-collect-garbage was changed to avoid calling nix-env. At this point, both tools would accept either the number of days format, or old which would delete all except the current generation.

In 429154b, support for specifying the number of generations to keep was added to nix-env, but not for nix-collect-garbage.

In 4b738fc, nix profile got new functionality: wipe-history. This version only accepts the number of days format, or no argument which would delete all except the current generation.

So my proposal is to add a new function into https://github.com/NixOS/nix/blob/master/src/libstore/profiles.cc which would accept the string parameter, and call the relevant deleteXXX function. Then all three different tools would call this new function instead.

@github-actions github-actions bot added documentation new-cli Relating to the "nix" command labels Feb 5, 2024
@MichaelOultram MichaelOultram changed the title nix-collect-garbage: keep last N number of generations Refactor generation deletion for better code reuse between CLI tools Feb 5, 2024
@Ericson2314
Copy link
Member

Sorry I missed this @MichaelOultram, great job with the deep dive there!

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-02-28-nix-team-meeting-129/40499/1

@MichaelOultram
Copy link
Author

Current status: I believe I've fixed the bug that the first CI run picked up on. I'm leaving this as a draft PR as I'd like to spend a little more time looking if any tests can be added.

@fricklerhandwerk
Copy link
Contributor

Code looks pretty good to me. While this is an old and deep-rooted feature surface, testing it reliably is probably at least as much work as coming up with the change. To avoid this change getting stale, I'd prefer merging it as is, and betting on @MichaelOultram to catch up with tests, and betting on the upcoming STF grant as a fallback. @Ericson2314 what do you think?

@MichaelOultram
Copy link
Author

I wrote that back in February... oops 😅. Sorry for the delay.

I've rebased the branch to the latest master, and the existing tests still pass locally. I still have ambitions to add extra tests, but I've not been able to achieve this today.

I'll mark this PR as ready if it makes it easier to merge these changes as is. I'll either continue to add the tests to this PR, or open an another PR specifically for the extra tests if this one gets merged. Either option is good with me.

@MichaelOultram MichaelOultram marked this pull request as ready for review June 11, 2024 20:28
@MichaelOultram
Copy link
Author

Looks like I managed to push during a GitHub actions incident: https://www.githubstatus.com/incidents/lfrlwdg67fn8.

Can you press the retry buttons as I don't think I have permission to?

@github-actions github-actions bot added the with-tests Issues related to testing. PRs with tests have some priority label Jun 12, 2024
@MichaelOultram
Copy link
Author

MichaelOultram commented Jun 15, 2024

EDIT: the macos CI job was rerun and it completed in 23m 57s so I'm assuming the timeout issue on the previous run was just a slow worker

Original Comment

I'm still struggling to figure out why the CI is failing on this PR. The CI / tests (macos-latest) run seems to have hit the timeout of 60 minutes. A passing run completes this in 32m 24s

First irregularity I found with my run a 9m 28s delay between these lines. The passing run does this in 4 seconds.

Wed, 12 Jun 2024 06:35:17 GMT checking derivation packages.aarch64-darwin.nix-x86_64-w64-mingw32...
Wed, 12 Jun 2024 06:44:44 GMT derivation evaluated to /nix/store/4ni1xk9sf8z2skf2shqf944wx77yr199-nix-x86_64-w64-mingw32-2.23.0pre20240611_fd6e133.drv

But this stall alone wouldn't be enough to hit the timeout. I couldn't spot any other long pauses in the logs, but it just feels like everything is running slower in my PR run for some unknown reason.

For example, if we only look at the lines with nix-tests> in them

my PR run:

Wed, 12 Jun 2024 07:06:32 GMT nix-tests> Running phase: unpackPhase
...
Wed, 12 Jun 2024 07:13:01 GMT nix-tests> ran test tests/functional/ca/concurrent-builds.sh... [SKIP]

passing run:

Tue, 11 Jun 2024 16:21:40 GMT nix-tests> Running phase: unpackPhase
...
Tue, 11 Jun 2024 16:24:44 GMT nix-tests> ran test tests/functional/ca/concurrent-builds.sh... [SKIP]
...
Tue, 11 Jun 2024 16:25:03 GMT nix-tests> installCheckPhase completed in 3 minutes 12 seconds

So my PR took 6m 29s from the start of the nix-tests> until it hit the timeout. The passing run got to this point in 3m 4s, and completed everything in 3m 23s.

Either I've been super unlucky with the CI and ended up on a worker that has approximately half the performance, or my changes have somehow regressed the performance significantly.

@MichaelOultram
Copy link
Author

I'm quite happy now about the test coverage of this changeset for specifically the nix-collect-garbage command.

I'd like to add similar tests for nix-env --delete-generations <n> and nix profile wipe-history --older-than <n> which I'll continue working on in the background.

@MichaelOultram
Copy link
Author

I believe the tests I've written suitably cover the changes I've made. I've got no more planned commits for this PR and so await feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority
Projects
Status: 🏁 Review
Development

Successfully merging this pull request may close these issues.

4 participants