Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify integrity of downloaded esbuild package #42

Closed
frerich opened this issue Jan 28, 2022 · 13 comments · Fixed by #64
Closed

Verify integrity of downloaded esbuild package #42

frerich opened this issue Jan 28, 2022 · 13 comments · Fixed by #64

Comments

@frerich
Copy link

frerich commented Jan 28, 2022

The package currently fetches a tarball from npmjs.com without verifying that the package has not been tampered with. Luckily, the npm public registry provides plenty of metadata for packages via its REST API, including a digital signature and a hashsum of the package. It seems like a good idea to make use of this information.

The chapter on Verifying the PGP signature of a package from the npm public registry explains how to go about doing that. Luckily, it doesn't seem to be a lot of effort:

  1. The public key is readily available. I suspect that the npm public registry's key pair is not regenerated very often, so it may well be acceptable to bake the public key right into the source code.
  2. Instead of constructing the URL to the tarball manually, the package query the REST API via a URL like https://registry.npmjs.org/esbuild-darwin-arm64/0.14.0. The returned document contains (among other things) the effective URL from where to download the package as well as its hash sum, the uncompressed file size and a PGP key to verify the package description.

I believe no extra dependency would be needed here (at least I hope that Erlang's :public_key and :crypto can be used for everything related to integrity checking), except for a JSON parser.

Does this seem like a worthwhile improvement? If so, I'd be happy to look into implementing this improvement!

@wojtekmach
Copy link
Collaborator

(at least I hope that Erlang's :public_key and :crypto can be used for everything related to integrity checking)

that's the thing, unfortunately that doesn't seem to be the case, per @voltone:

It should be possible in theory, but :public_key does not support PGP/GPG signatures, so it would require an implementation of RFC4880. I am not aware of a package that implements that, and doing that inside the Tzdata package might be a bit of a stretch...

(lau/tzdata#123 (comment))

@frerich
Copy link
Author

frerich commented Jan 28, 2022

It appears that PGP keys can be translated to PEM keys with just a few minor changes:

  • Adjust the BEGIN/END markers
  • Strip the initial PGP header lines
  • Strip the final checksum line

For example, a PGP key like this (the =39YB at the end is the checksum):

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: Keybase Go 1.0.47 (darwin)
Comment: https://keybase.io/npmregistry

xsFNBFrD+I4BEAClMVISYE30bHfb9hr6odnBu+GLP5TktnkNqx9sfdRH6JkiDNUv
9QwChC4IerLtNRAsXymq+mGe+kgUweYvQhNi5DkiEWfMHMSSUCArmNl+UMLkMDaW
...
5x7slApOCUi6+BQnNnnY8YZlUqWGA52GilpQnnB4gM+9xuIpxhL7QfCKHr4=
=39YB
-----END PGP PUBLIC KEY BLOCK-----

Becomes this:

-----BEGIN RSA PUBLIC KEY-----
xsFNBFrD+I4BEAClMVISYE30bHfb9hr6odnBu+GLP5TktnkNqx9sfdRH6JkiDNUv
9QwChC4IerLtNRAsXymq+mGe+kgUweYvQhNi5DkiEWfMHMSSUCArmNl+UMLkMDaW
...
5x7slApOCUi6+BQnNnnY8YZlUqWGA52GilpQnnB4gM+9xuIpxhL7QfCKHr4=
-----END RSA PUBLIC KEY-----

Indeed, after applying those transformations to https://keybase.io/npmregistry/pgp_keys.asc I was able to load the resulting PEM key using :public_key.pem_decode/1.

@josevalim
Copy link
Member

Hi @frerich, I would definitely love a PR that implements this. :) Perhaps we should even extract this as a package called npm_package_downloader that receives a package name, version and optional pgp keys, because it could be used by both https://github.com/cargoSense/dart_sass and https://github.com/phoenixframework/tailwind.

About the npm url though, how useful would that be in practice? If there is a MITM attach, then the URL would also be changed. So I think we would need to have the pgp keys hardcoded into this repo, which is fine IMO. For the hashing, we could hardcode the hash for the versions we use by default but otherwise the user would have to list the hash?

@frerich
Copy link
Author

frerich commented Jan 28, 2022

I think hardcoding (or requiring to list) hashes would be a plausible improvement. In fact, I originally meant to suggest that, too. However, I'd love to be able to get away without this and rather rely on the hashes available via the REST API. I just need to be careful to verify that these hashes can be trusted (i.e. they haven't been tampered with, too!). :-)

It appears that the information which is relevant for making sure that we're not downloading something nasty is stored in the dist field of the JSON object. For example, here's what I get for https://registry.npmjs.org/esbuild-darwin-arm64/0.14.0:

{
  "integrity": "sha512-hMbT5YiBrFL763mnwR9BqNtq9XtJgJRxYs7Ad++KUd+ZhMoVE0Rs/YLe1oor9uBGhHLqQsZuJ2dUHjCsfT/iDg==",
  "shasum": "644efb31fb27e291465e24757b3194d36aa2eb7d",
  "tarball": "https://registry.npmjs.org/esbuild-darwin-arm64/-/esbuild-darwin-arm64-0.14.0.tgz",
  "fileCount": 3,
  "unpackedSize": 8249070,
  "npm-signature": "-----BEGIN PGP SIGNATURE-----\r\nVersion: OpenPGP.js v3.0.13\r\nComment: https://openpgpjs.org\r\n\r\nwsFcBAEBCAAQBQJhoT2OCRA9TVsSAnZWagAAhD0P/1lADzoSLVZMGgYuOJAH\n1WZ3bC54yl6Ixq8DzEck8GYRjJEHjuWVNcFb40HXl8utryLJfRRk0zdWqvLF\n2lc8oRufMc+fZkNBg3vOEOasLo0D+dcOy/2yDHIX4GSsdS+Le4UmgMXk7Ind\nVtUwPtDmpXr7VdQn96rCx3ZolrLCPJ94A2TsLIkTRnT6ThEAkWURiKeKGrDd\nCTPnJir9r18xdMiKbLimMpgdBe2P9n98EvY84quy1xBQGFfMA9YLwXJPR4dX\nvKRARJHvnfnY79RTR13VBbyiwVxiumA14MmHQGBLWxrorY//L8q3tWtiQ5by\nknpVsrDVgskd9cPkQuTA0gGtYIEUHWGyvd8nC3ZF1en3+DJanbRJV+4rhjdJ\nYRK5sN4mXLM6JO4FSrz/5lVN00LdaKTOMnB7S6gbuVCUydWMTDv7WgDE27ib\nSLpFVBco05oebszRvLk8xN3zgjgkQ6iBDvm7iBeotl4gcG/jet6GPoYHryxr\nXS5dCi3Q/U1HFTO2Fn5xmy6cTpyhw+B2hL0WrqtNm5Q3+zdD37W+02VnNsu7\nzOG+X07HqcSPqpQxHYjI2Q+kdAzeJq6wzzvzPUlMvydpObHP9qAwt3I1Rl6r\njUC3SsUjQkzwViUDLWXoFd0lMFOkFvzvgswdVlwfT1mV2sfggCvH44qFQokH\nqaO4\r\n=09zh\r\n-----END PGP SIGNATURE-----\r\n"
}

It seems useful to not only consider the hashsum but also meta information such as the number of files or the uncompressed size (to guard against ZIP bombs). However, how can we verify that this JSON payload hasn't been tampered with, too?

The instructions in the npm public registry docs explain that a string made from the package name, the package version and the integrity field (from the dist information shown above) should be constructed. Something like this:

esbuild-darwin-arm64@0.14.0:sha512-hMbT5YiBrFL763mnwR9BqNtq9XtJgJRxYs7Ad++KUd+ZhMoVE0Rs/YLe1oor9uBGhHLqQsZuJ2dUHjCsfT/iDg==

I suppose(!) that this is what people who (unlike me) actually know something about cryptography might call the 'message' to verify. This message is then meant to be verified given the signature in npm-signature and the public key available from their website.

What I don't quite get is: how does verifying this message ensure that the other fields in the dist object (e.g. the file URL, the number of files, the file size etc.) are accurate... 🤔

@wojtekmach
Copy link
Collaborator

wojtekmach commented Jan 28, 2022

In my opinion it would be enough to just see if the archive wasn't tampered with, I don't think we need to check other pieces of data like number of files. If archive was signed and we can verify it with something like :public_key.verify/4 the only thing we need is to ship with the npmjs public key. That's exactly how we do repo verification on Hex (https://github.com/hexpm/hex/blob/v1.0.1/lib/hex/repo.ex#L8).

If this works it would be absolutely fantastic to use the same solution on tzdata!

@josevalim
Copy link
Member

I just need to be careful to verify that these hashes can be trusted (i.e. they haven't been tampered with, too!). :-)

My understanding is that getting the hash from the same source as the package does not help with anything. So you either need to pass it on the side or, when you download it for the first time, you store the hash somewhere (assets/.esbuild.hash). Then you guarantee it wasn't tampered afterwards.

But i agree with @wojtekmach, the most important is the PGP signing. If we can validate the signature, then we can trust the hash, and use it to validate the package. :) The public key is hardcoded then.

@frerich
Copy link
Author

frerich commented Jan 28, 2022

A first shot at trying to use :public_key.verify/4 unfortunately failed. The issue is that I can't seem to decode the public key entry in the PEM.

I massaged the official public key into a PEM file according to the instructions I mentioned above, which gets me this: https://gist.github.com/frerich/db9a325e393c10482ca99f5ab20c95fd

Loading this PEM works (i.e. it appears to be syntactically valid), but decoding doesn't:

/tmp % iex
Erlang/OTP 24 [erts-12.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Interactive Elixir (1.13.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> [entry] = File.read!("pgp_keys.pem") |> :public_key.pem_decode()
[
  {:RSAPublicKey,
   <<198, 193, 77, 4, 90, 195, 248, 142, 1, 16, 0, 165, 49, 82, 18, 96, 77, 244,
     108, 119, 219, 246, 26, 250, 161, 217, 193, 187, 225, 139, 63, 148, 228,
     182, 121, 13, 171, 31, 108, 125, 212, 71, 232, 153, 34, 12, 213, ...>>,
   :not_encrypted}
]
iex(2)> decoded_pubkey = :public_key.pem_entry_decode(entry)
** (MatchError) no match of right hand side value: {:error, {:asn1, {{:invalid_length, 6}, [{:asn1rt_nif, :decode_ber_tlv, 1, [file: 'asn1rt_nif.erl', line: 85]}, {:"OTP-PUB-KEY", :decode, 2, [file: 'OTP-PUB-KEY.erl', line: 1232]}, {:public_key, :der_decode, 2, [file: 'public_key.erl', line: 356]}, {:erl_eval, :do_apply, 6, [file: 'erl_eval.erl', line: 685]}, {:erl_eval, :expr, 5, [file: 'erl_eval.erl', line: 446]}, {:elixir, :recur_eval, 3, [file: 'src/elixir.erl', line: 289]}, {:elixir, :eval_forms, 3, [file: 'src/elixir.erl', line: 274]}, {IEx.Evaluator, :handle_eval, 3, [file: 'lib/iex/evaluator.ex', line: 310]}]}}}
    public_key.erl:360: :public_key.der_decode/2

A quick Google query revealed a StackOverflow thread which might be related. It suggests that this might be a bug in OTP (I can't tell if this is true or not), but unfortunately I couldn't find any workaround.

@voltone
Copy link
Contributor

voltone commented Jan 31, 2022

Both the public key and the signature use PGP-encoding according to RFC4880. As I said in the tzdata ticket, verifying such a signature requires decoding and verification functions to be implemented as part of the package or pulled in through a dependency...

@frerich
Copy link
Author

frerich commented Feb 1, 2022

Thanks a lot for confirming @voltone! ❤️

I wonder what the path of least resistance would be here. Is it actually required to implement the RFC in Elixir or can we maybe leverage some existing implementation (even if it just means shelling out to a command line utility)? I'd be happy to do the grunt work to get this in, but I'm not terribly familiar with the state of cryptography libraries in Elixir so I'm hesitant to make the first move. 😊

Any thoughts on this would be much appreciated!

@voltone
Copy link
Contributor

voltone commented Feb 1, 2022

Relying on external binaries is going to be very brittle.

I can prepare some code that implements the most important parts of the RFC, but I'm not sure I can commit to maintaining it as a package: my track record as a package maintainer is not great, unfortunately.

@josevalim
Copy link
Member

I think a proof of concept can be extremely helpful. You can leave it in a Gist or in an archived repo if you don’t want to maintain it. :)

@sergchernata
Copy link

Stuck on the same error as @frerich, would love to see a solution or some movement on this issue.

@grzuy
Copy link
Contributor

grzuy commented Oct 10, 2023

#64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants