oci_tarball does not produce consistent tars for the same inputs #328

Closed
chancila opened this issue Aug 11, 2023 · 15 comments
Labels
bug Something isn't working

@chancila

Tarballs generated with oci_tarball keep the local username and unstable timestamps, so the output of oci_tarball is not consistent across builds of the same inputs:

tar --list --file tarball.tar -v
-rw-r--r--  0 chancila staff    1182 Aug 10 18:27 manifest.json
drwxr-xr-x  0 chancila staff       0 Aug 10 18:27 blobs/
drwxr-xr-x  0 chancila staff       0 Aug 10 18:27 blobs/sha256/
-r-xr-xr-x  0 chancila staff  822203 Aug 10 18:27 blobs/sha256/8a79529e357e51ee5f147b1d870d38b60e48a920532fc2b8185da87f52bb5d2c.tar.gz
-r-xr-xr-x  0 chancila staff   21202 Aug 10 18:27 blobs/sha256/fe5ca62666f04366c8e7f605aa82997d71320183e99962fa76b3209fdfbb8b58.tar.gz
-r-xr-xr-x  0 chancila staff  716491 Aug 10 18:27 blobs/sha256/b02a7525f878e61fc1ef8a7405a2cc17f866e8de222c1c98fd6681aff6e509db.tar.gz
-r-xr-xr-x  0 chancila staff  103732 Aug 10 18:27 blobs/sha256/0b41f743fd4d78cb50ba86dd3b951b51458744109e1f5063a76bc5a792c3d8e7.tar.gz
-r-xr-xr-x  0 chancila staff     355 Aug 10 18:27 blobs/sha256/7c881f9ab25e0d86562a123b5fb56aebf8aa0ddd7d48ef602faf8d1e7cf43d8c.tar.gz
-r-xr-xr-x  0 chancila staff     385 Aug 10 18:27 blobs/sha256/4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f.tar.gz
-r-xr-xr-x  0 chancila staff     113 Aug 10 18:27 blobs/sha256/1e3d9b7d145208fa8fa3ee1c9612d0adaac7255f1bbc9ddea7e461e0b317805c.tar.gz
-r-xr-xr-x  0 chancila staff 18771270 Aug 10 18:27 blobs/sha256/b39c866052b17eaa1e92cee24f2bcf63f1d9bc8ec33fa438ec5c61e95b7b32e3.tar.gz
-r-xr-xr-x  0 chancila staff      198 Aug 10 18:27 blobs/sha256/e8c73c638ae9ec5ad70c49df7e484040d889cca6b4a9af056579c3d058ea93f0.tar.gz
-r-xr-xr-x  0 chancila staff     1555 Aug 10 18:27 blobs/sha256/174505bbdbdf6e2e0950a081a0e64e98e7b62ddb0904c4160b11f01ebd37dce4
-r-xr-xr-x  0 chancila staff   130562 Aug 10 18:27 blobs/sha256/5627a970d25e752d971a501ec7e35d0d6fdcd4a3ce9e958715a686853024794a.tar.gz
-r-xr-xr-x  0 chancila staff      317 Aug 10 18:27 blobs/sha256/fcb6f6d2c9986d9cd6a2ea3cc2936e5fc613e09f1af9042329011e43057f3265.tar.gz
@chancila
Author

If it's okay to use rules_pkg, I can probably work on fixing this.

@chancila
Author

As a hack, I worked around this with rules_pkg's pkg_tar: an intermediate rule extracts the content of the oci_tarball output to a tree artifact, and pkg_tar with strip_prefix re-tars that content.

tar --list --file  tarball.tar -v
drwxr-xr-x  0 0      0           0 Dec 31  1999 blobs/
drwxr-xr-x  0 0      0           0 Dec 31  1999 blobs/sha256/
-r-xr-xr-x  0 0      0      103732 Dec 31  1999 blobs/sha256/0b41f743fd4d78cb50ba86dd3b951b51458744109e1f5063a76bc5a792c3d8e7.tar.gz
-r-xr-xr-x  0 0      0         113 Dec 31  1999 blobs/sha256/1e3d9b7d145208fa8fa3ee1c9612d0adaac7255f1bbc9ddea7e461e0b317805c.tar.gz
-r-xr-xr-x  0 0      0         385 Dec 31  1999 blobs/sha256/4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f.tar.gz
-r-xr-xr-x  0 0      0      130562 Dec 31  1999 blobs/sha256/5627a970d25e752d971a501ec7e35d0d6fdcd4a3ce9e958715a686853024794a.tar.gz
-r-xr-xr-x  0 0      0         355 Dec 31  1999 blobs/sha256/7c881f9ab25e0d86562a123b5fb56aebf8aa0ddd7d48ef602faf8d1e7cf43d8c.tar.gz
-r-xr-xr-x  0 0      0    18771272 Dec 31  1999 blobs/sha256/831e1d1d3ee264fb9ece919f2b965656524d4b881264bd6fab63db92264581e9.tar.gz
-r-xr-xr-x  0 0      0      822203 Dec 31  1999 blobs/sha256/8a79529e357e51ee5f147b1d870d38b60e48a920532fc2b8185da87f52bb5d2c.tar.gz
-r-xr-xr-x  0 0      0      716491 Dec 31  1999 blobs/sha256/b02a7525f878e61fc1ef8a7405a2cc17f866e8de222c1c98fd6681aff6e509db.tar.gz
-r-xr-xr-x  0 0      0        1555 Dec 31  1999 blobs/sha256/c561fe071ca1c5f3c2a13186e7b9a09c7f9b3694f7e5e4dca54d9d1e7af36791
-r-xr-xr-x  0 0      0         198 Dec 31  1999 blobs/sha256/e8c73c638ae9ec5ad70c49df7e484040d889cca6b4a9af056579c3d058ea93f0.tar.gz
-r-xr-xr-x  0 0      0         317 Dec 31  1999 blobs/sha256/fcb6f6d2c9986d9cd6a2ea3cc2936e5fc613e09f1af9042329011e43057f3265.tar.gz
-r-xr-xr-x  0 0      0       21202 Dec 31  1999 blobs/sha256/fe5ca62666f04366c8e7f605aa82997d71320183e99962fa76b3209fdfbb8b58.tar.gz
-r-xr-xr-x  0 0      0        1182 Dec 31  1999 manifest.json
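
For anyone who wants the same effect without Bazel rules, here is a minimal sketch of the re-tar idea using Python's standard tarfile module. It is not what pkg_tar does internally; the function name normalize_tar and the default mtime of 0 are my own choices for illustration. It sorts entries by name and clears the owner, group, and timestamp fields that differ between the two listings above.

```python
import tarfile


def normalize_tar(src_path, dst_path, mtime=0):
    """Rewrite src_path into dst_path with stable entry order and metadata.

    Hypothetical helper for illustration; not part of rules_oci or rules_pkg.
    """
    with tarfile.open(src_path, "r") as src, \
            tarfile.open(dst_path, "w", format=tarfile.GNU_FORMAT) as dst:
        # Sort by name so the order does not depend on how the input was built.
        for member in sorted(src.getmembers(), key=lambda m: m.name):
            data = src.extractfile(member) if member.isfile() else None
            # Drop host-specific metadata: numeric and symbolic owner, timestamp.
            member.uid = member.gid = 0
            member.uname = member.gname = ""
            member.mtime = mtime
            member.pax_headers = {}
            dst.addfile(member, data)
```

Running it on two tarballs that hold the same files but were produced by different users at different times should give byte-identical outputs, as long as file modes and contents match.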

@thesayyn
Collaborator

This is a known issue. We don't want a direct dependency on rules_pkg. See #217

@gergelyfabian
Contributor

Could we just add --mtime='1970-01-01' to the tar command? It should fix this issue.

@gergelyfabian
Contributor

Sent PRs #380 (already merged) and #381. Here I simply tried adding parameters to tar to ensure the archives don't contain the time and user/group.
I guess a better solution would be to start using something like @bazel_tools//tools/build_defs/pkg:build_tar (as rules_pkg does); I believe this is a generally accessible target.

How it could be done:

  • Add build_tar as a parameter for oci_tarball rule (as pkg_tar does)
  • Refactor tarball.sh.tpl to not do the tar-ing by itself, but only prepare the files for tar
  • For that we need to create a temporary dir from the rule implementation (I wasn't sure how to do that properly from Starlark)
  • Pass the temporary dir to tarball.sh.tpl
  • After tarball.sh.tpl is resolved and run, run the build_tar tool separately to actually tar the files together

At least that would be my plan, but I wasn't sure about some technical details (e.g. how to set up a temp dir where we can extract the blobs and the manifest, and then how to pass that temp dir to the build_tar tool).
In general, I guess it would be best to call build_tar for tar-ing.

For now I settled on not doing this refactor and simply fixing the parameters; I'm just leaving these notes here in case they are useful to anyone.

@chancila
Author

chancila commented Oct 2, 2023

Your PRs have a bunch of GNU-specific usages... I don't think this is as easy as just setting some flags; you'll need to do feature detection or something to figure out whether you can use the flags...

An easier approach would be to use rules_go and create a simple binary to tar the files... The Go stdlib has a tar implementation that is pretty straightforward to use... but then we need rules_go, or we use Go to generate multi-arch binaries as part of the release process and package them in rules_oci.

@alexeagle
Collaborator

--mtime is not available in BSD tar so our Mac release is broken now.

We are getting close on bazel-contrib/bazel-lib#468 so that will be the answer, we'll have a hermetic tar program available.

@gergelyfabian
Contributor

Can we do anything other than waiting for the hermetic tar implementation? Should I maybe go with feature detection, to re-enable this for non-Mac systems for now?
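
To make the feature-detection idea concrete, here is a rough sketch, assuming the tar flavor can be told apart from the text printed by `tar --version` (GNU tar identifies itself as "GNU tar", and bsdtar mentions "bsdtar" and libarchive). The function names are hypothetical, and only --mtime is named in this thread; the owner and group flags are other GNU tar options I picked for illustration.

```python
import subprocess


def classify_tar(version_output):
    """Guess the tar flavor from the text printed by `tar --version`."""
    text = version_output.lower()
    if "gnu tar" in text:
        return "gnu"
    if "bsdtar" in text or "libarchive" in text:
        return "bsd"
    return "unknown"


def detect_tar(tar_binary="tar"):
    """Run `<tar_binary> --version` and classify the result."""
    try:
        result = subprocess.run(
            [tar_binary, "--version"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        # Binary missing or not executable: treat as unknown.
        return "unknown"
    return classify_tar(result.stdout + result.stderr)


def deterministic_flags(flavor):
    """Return extra flags only when the flavor is known to accept them."""
    if flavor == "gnu":
        # Assumption: these GNU tar options are the ones wanted here.
        return ["--mtime=1970-01-01", "--owner=0", "--group=0", "--numeric-owner"]
    # BSD tar has no --mtime (see above), so pass nothing extra.
    return []
```

The drawback is the one raised earlier: on non-GNU systems the archive stays non-reproducible, so this only narrows the problem rather than solving it.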

@alexeagle
Collaborator

hermetic tar toolchain is now released in bazel-lib 2.0.0-beta0

@thesayyn
Collaborator

thesayyn commented Oct 3, 2023

> Can we do anything other than waiting for the hermetic tar implementation? Should I maybe go with feature detection, to re-enable this for non-Mac systems for now?

I believe we should just wait for the hermetic toolchain. That said, if you are relying on oci_tarball to be hermetic, that suggests you are using it outside of its intended use. oci_tarball was never intended to be anything other than a way to make the image loadable by docker. It should never be an input to other targets.

just out of curiosity, what is your use case here?

@chancila
Author

chancila commented Oct 4, 2023

At Snap we don't push images to registries during a build; users are expected to generate a tar, which then gets pushed to the right place for them. As such, our Bazel builds need to generate an image tar.

@gergelyfabian
Contributor

> just out of curiosity, what is your use case here?

We are running integration tests where the image tar is an input. We also have unit tests, but these integration tests are critical for checking that the image configuration, packaging, static file content, etc. are intact. They also check the integration between our own components.
In any case, we depend on image.tar being reproducible so that the integration tests get proper cache hits. Another issue is that we had to exclude the image tars from remote caching, as they tend to be big, and instead generate them on CI executors, so reproducibility is even more crucial for us.
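
A quick way to check whether a given setup has this property is to build the tarball twice (for example on two executors) and compare content digests, since a downstream cache hit requires the input bytes to be identical. A minimal sketch, with hypothetical function names:

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to handle large tars."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(paths):
    """True when every file in paths has the same SHA-256 digest."""
    return len({sha256_file(p) for p in paths}) <= 1
```

With the original oci_tarball output this check would be expected to fail across machines, because the username and timestamps end up in the tar headers.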

alexeagle added a commit that referenced this issue Oct 4, 2023
alexeagle added a commit that referenced this issue Oct 4, 2023
alexeagle added a commit that referenced this issue Oct 4, 2023
@gergelyfabian
Contributor

For anyone interested, #385 seems to be fixing this issue.

@honnix

honnix commented Mar 5, 2024

Not sure whether I should report this here, so please let me know if it is worth opening a dedicated issue. The tar command used in tarball.sh.tpl somehow generates warnings like "tar: Could not open extended attribute file: Operation not permitted" on macOS, and passing --no-mac-metadata solves it. If I run the same tar command manually, I don't see this warning.

alexeagle added a commit that referenced this issue Apr 25, 2024
Fixes #328

fix: supply mtree file for determinism

fix: set tar content times to beginning of this year

avoids some tools thinking that 1970 is 'too old'

refactor: extract function for mtree lines

refactor: cleanup STAGING_DIR

chore: bump to bazel-lib 2.0rc

chore: remove bazel 5 workaround

Bazel-lib 2.0 doesn't include this anymore

chore: upgrade stardoc to match bzlmod version

ci: test on bazel 7 rather than 5
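
The commit message above mentions supplying an mtree file and pinning times to the beginning of the year. The following is only a sketch of what generating such a spec could look like, not the code from the commit. It assumes the mtree keywords understood by libarchive's bsdtar (uid, gid, mode, time, type, content); the function names and the default mode are hypothetical, and paths containing whitespace are not escaped.

```python
from datetime import datetime, timezone


def start_of_year_epoch(year):
    """Unix timestamp for January 1st, 00:00 UTC of the given year."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())


def mtree_line(archive_path, source_path, mtime, mode="0644"):
    """One mtree entry with fixed owner, mode, and time for a regular file."""
    return " ".join([
        archive_path,
        "uid=0",
        "gid=0",
        f"mode={mode}",
        f"time={mtime}",
        "type=file",
        f"content={source_path}",  # where the bytes come from on disk
    ])


def mtree_spec(entries, mtime):
    """Build a spec from (archive_path, source_path) pairs, sorted by path."""
    lines = ["#mtree"]
    for archive_path, source_path in sorted(entries):
        lines.append(mtree_line(archive_path, source_path, mtime))
    return "\n".join(lines) + "\n"
```

Because every metadata field is spelled out in the spec, the resulting archive no longer depends on the user or the time of the build, only on the file contents and the entry list.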
alexeagle added a commit that referenced this issue Apr 26, 2024
@thesayyn
Collaborator

thesayyn commented May 7, 2024

fixed by #385

@thesayyn thesayyn closed this as completed May 7, 2024