Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0131] Enforce the usage of SRI hashes in Nixpkgs #131

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions rfcs/0131-sri-hashes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---
feature: sri-hashes
start-date: 2022-07-25
author: Winter
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: (names, to be nominated and accepted by RFC steering committee)
shepherd-leader: (name to be appointed by RFC steering committee)
related-issues: (will contain links to implementation PRs)
---

# Summary
[summary]: #summary

Use SRI hashes and the `hash` parameter for fetchers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SRI hash" needs a reference link with a definition and globally this RFC needs some more context to be explained.

FWIW, since I became aware that this kind of hash was preferred, I have tried to use them more systematically, but I'm annoyed by nix-prefetch-url not generating this type of hash (at least all the way to Nix 2.12) and not knowing how to convert a hash generated by nix-prefetch-url into an SRI hash.

Copy link
Member

@vcunat vcunat Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, for conversion I'm using

nix hash to-sri --type sha256 theHash

EDIT: or you could use

nix hash file path/to/file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! But another thing that is becoming annoying to me lately is to be forced to activate the nix-command experimental feature for more and more basic things that I cannot avoid.

Another solution that I've found meanwhile was openssl dgst -sha256 -binary /path/to/file | openssl base64 -A, but it's not as convenient.

Copy link

@ShamrockLee ShamrockLee Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But another thing that is becoming annoying to me lately is to be forced to activate the nix-command experimental feature for more and more basic things that I cannot avoid.

I would rather view it as a missing feature of nix-hash that needs to be implemented to get this RFC merged. In other words, we should have

nix-hash --sri --type sha256 path/to/file

and

nix-hash --to-sri --type sha256 theHash

Update: I just created NixOS/nix#7690. Please take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! And thanks to you, I also learned that, in the meantime, I can use nix --extra-experimental-features nix-command file to-sri --type sha256 .... I didn't realize that I don't need to activate the experimental feature globally, and this is already better than nothing.


# Motivation
[motivation]: #motivation

Currently in Nixpkgs, the hash formats that are used in parameters to fetchers range from SRI and base-32 (the most common ones) to hex. In addition to this, packages use the algorithm-specific arguments (`sha256`, `sha512`, etc.) or the generic `hash` argument at random. This creates inconsistencies that we'd rather not have.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is SRI in this context?

Copy link
Member Author

@winterqt winterqt Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subresource Integrity, whose hashes look like sha256-2mE26ES+fYSWdfMr8uTsX2VVGTNMDQ9MXEk5E/L95UI=. Maybe I should add at least a definition of SRI in the RFC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates inconsistencies that we'd rather not have.

Why? This is not a motivation, just a claim.

Using the different hash encodings genuinely has value, for example I frequently just copy over the base16 hash from a SHA256SUMS file provided by upstream, requiring less button presses, round trips and most importantly allows others to quickly and easily match the hashes textually when reviewing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a link to what SRI is in the RFC would be good to have. I also didn't know what SRI was.

Copy link
Member Author

@winterqt winterqt Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the different hash encodings genuinely has value...

I agree with this point, but at the same time most people use either the SRI hashes given by Nix's hash mismatch errors (which I now realize is only in 2.4), or convert those to base-32 (the default in 2.3 and below).

Copy link
Member

@sternenseemann sternenseemann Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vojta001 That's why these interfaces tend to have an assert or two behind the scenes…

nix hash to-sri

Well that's actually a very good point: There is no stable interface for converting hashes into the SRI format, because nix-hash(1) only supports SRI as an input, not an output format.

Copy link

@milahu milahu Sep 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the different hash encodings genuinely has value, for example I frequently just copy over the base16 hash from a SHA256SUMS file provided by upstream, requiring less button presses, round trips and most importantly allows others to quickly and easily match the hashes textually when reviewing.

no need to convert the hash to base64

SRI hashes only require a prefix:

{
  sha256 = "1cc9110705165dc09dd09974dd7c0b6709c9351d6b6b1cef5a711055f891dd0f";
  sha256 = "sha256-1cc9110705165dc09dd09974dd7c0b6709c9351d6b6b1cef5a711055f891dd0f";
  hash = "sha256-1cc9110705165dc09dd09974dd7c0b6709c9351d6b6b1cef5a711055f891dd0f";
}

base16 hashes are found with
grep -r -E 'sha256 = "[0-9a-f]{64}";'

new and better hash technologies appear everywhere.

sha256 is de-facto standard in nixpkgs,
because its good enough for our use case = global IDs

Copy link
Member

@AndersonTorres AndersonTorres Sep 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha256 is de-facto standard in nixpkgs,

MD5 was de facto standard no more than a decade ago.

Also, this was not about "good enough algorithms", but about a better format for the expression - namely, hash = "hashAlgorithm-..." instead of hashAlgorithm = "...".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates inconsistencies that we'd rather not have.

@winterqt could you add something more specific here, considering this is the whole motivation for the RFC?

The discussion above has brought up some more points:

  • semantic vs syntactical mutual exclusion
  • unique attribute name rather than multiple, algo-specific ones
  • ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milahu: conversion would be needed because SRI is currently only accepting base64. However, it would be relatively easy to change: NixOS/nixpkgs#199660


# Detailed design
[design]: #detailed-design

1. Require all new packages to use SRI hashes + `hash` when possible
2. Do a treewide migration for the majority of cases (single fetcher in `src`, using a fetcher that supports `hash` and SRI hashes (`fetchFromGitHub`, `fetchurl`, etc.)) to `hash` + SRI hashes

# Drawbacks
[drawbacks]: #drawbacks

None at this time.
Copy link
Member

@sternenseemann sternenseemann Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's (anecdotically) still a significant number of Nix 2.3 users who contributing to nixpkgs would be made unnecessarily harder for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, another ten years of deprecation window...

Copy link
Member

@piegamesde piegamesde Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You joke, but I seriously think this would be a sensible thing to do. (Also, things can be deprecated without the intent to eventually removing them, see Java which did not remove deprecated APIs until Java 9?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall we have developed a strange culture of removing stuff unnecessarily in nixpkgs, I feel like.

In this case, we shouldn't remove support for these formats from Nix anyways (to continue to be able to evaluate older nixpkgs revisions (the ability to reproduce ancient stuff truely is a strength of Nix)) and for existing APIs in nixpkgs it's laughably simple to keep supporting it (you don't need to change anything, in fact!). For all I care, stop supporting the sha256 and sha512 attributes for new APIs…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text does not talk about removal, I suppose, but about enforcement. Older expressions can be rendered by newer versions, whereas new expressions should be written in newer versions of Nix.

Copy link
Member Author

@winterqt winterqt Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does something like this sound: we strongly encourage using SRI hashes and hash in new packages, if there is no justifiable reason to not use them. Justifiable reasons include, but are not limited to, taking hashes directly from upstream, and using automated tooling that does not use SRI hashes.

I'm not sure about that last reason, but I also want to provide more than one example of a reason.

I'm also not sure about how to handle converting old packages (with the same criteria of having no justifiable reasoning), if at all. Thoughts welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old packages can be converted by anyone interested ion converting them. I have started a preliminar work a long time ago :P

NixOS/nixpkgs#176283

Further I believe this should be decided on a case by case basis. Maybe allowing old Nix tools is acceptable, whereas upstream hashes is not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warns the user when the code uses dangerous functions

Not sure how apt this comparison is, as there's nothing fundamentally different about a sha256 hash in SRI format or one in base16 format.

I'm open to changing it to "discourage", but... I don't know, if non-SRI hashes keep slipping in anyways because it's not required, what's the point of this? I guess that's why I chose that strong of a word.

Stale hashes is not something caused by a particular hash format. It happens because expressions bitrot and we'll have stale hashes in SRI format before long. Doing a mass conversion of hashes may turn up stale hashes, but doesn't have to since you would likely just convert them without refetching, since that is much quicker.

If you are concerned about stale hashes, I'd suggest trying to refetch fixed output derivations in nixpkgs with cache disabled. I believe this has been done quite a few times over the years and it should be easy to turn up useful tools for this.

strongly encourage

Newer Nix versions already default to this, so there is already a lot of encouragement…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A lot of encouragement" that ends with "my version is too old" or "this is mere personal preferences, don't force them on me"? Wow cool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version is probably too old for the format to be convenient to produce but can evaluate the expressions with either hash format. For «mere personal preferences» argument new Nix making use of SRI format easier than non-SRI is an encouragement, as submitters will typically use whatever is the easiest and changing the format is pointless churn so the initial hash format will stay.


# Alternatives
[alternatives]: #alternatives

- Do nothing: further inconsistencies will continue to be introduced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a number of alternatives.

Different rules can be considered. For each of the following formats it could be decided to allow/disallow/preferred:

  • hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  • sha256 = "0000000000000000000000000000000000000000000000000000000000000000";
  • sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  • sha256 = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";

I've gathered there are a number of ways these rules can be 'enforced':

  • Throw evaluation errors upon not conforming to rule.
  • Have a it be a potential future linter rule.
  • Have PR reviewers nudge contributors to follow the rule
  • Do nothing: keep things like the are right now. Reviewers will suggest their own opinion in PRs if they'd like.
  • Allow any format: do not allow PR reviewers to suggest changing the format in PRs to their preferred format. If the format is allowed, just let it be. This avoids PRs being always potentially invalid due to reviewers having different opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow anything: Do not allow PR reviewers to suggest changing the format in PRs. If the format is allowed, just let it be. Avoid conflicting reviewer opinions.

I'm mentioning this option, as at the moment PRs are being reviewed where the reviewers suggest their own 'opinion' on the matter; even though there is no real rule. One outcome of this RFC could be to just let it be.


# Unresolved questions
[unresolved]: #unresolved-questions

- How can we do the treewide migration efficently?
- Text replacement for the majority of cases, with an AST-based solution for others?
- Do we want to eventually deprecate the use of `sha256`, `sha512`, etc. in fetchers that support SRI hashes, leading to their eventual removal?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can deprecate them without removing them.

- This would require updating every script that uses these arguments, and would cause breakage for any usage of them out-of-tree; we'd need to weigh the benefits of this
- Is `hash` the correct name to use for the argument?
- Nix builtins use `narHash`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care about the name, but built-in fetchers and nixpkgs fetchers should be consistent.

Comment on lines +44 to +45
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value in renaming hash. A lot of work which we can safe ourselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See NixOS/nixpkgs#79987 (comment) for context, I added it here just to include it since the question was raised (should probably cite that discussion, though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are not talking about narHashes here, so this should be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAR hashes are necessary for any cases where the result is more than a single file, e.g. fetchFromGitHub.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as we have flat hashes for normal files and NAR hashes over arbitrary trees (or normal files), I'd say it makes sense to call them differently. At least if I don't consider any backwards compatibility questions for now. For example, keep hash for normal ones and rename NAR hashes to narHash. It might even avoid some confusion, and functions could allow specifying either (though of course the value differs, as something different is hashed).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that could be done under the hood in the fetchers? It seems confusing for new contributors to have to think about whether to use a flat hash or a NAR hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We already do this with outputHashMode.


# Future work
[future]: #future-work

Currently none.