
problem achieving reproducibility #4131

Closed
scirner22 opened this issue Nov 1, 2023 · 25 comments

@scirner22

scirner22 commented Nov 1, 2023

Environment: local docker build

  • Jib version: 3.4.0
  • Build tool: gradle 8.3
  • OS: mac m2

Description of the issue:

I'm trying to produce reproducible builds for gradle projects that include internal dependencies, so I'm not able to share the project.

I'm using jib on a gradle project that contains only a single application, so there are no dependencies pulled in through project references. I've included the build block below. Nothing is placed in src/main/jib.

// Create JARs in a reproducible build fashion.
tasks.withType<AbstractArchiveTask>().configureEach {
    isPreserveFileTimestamps = false
    isReproducibleFileOrder = true
}

With the above, I'm able to run a clean build and verify that the jar is reproducible (the jar checksum is the same).

However, when I run clean build jibDockerBuild, the build/jib-image.digest contents are not reproducible.

From what I can tell, jib does not place any files outside of /app, so I ran find app/. -type f -exec sha1sum {} + within separate image builds and diffed the output to confirm that no files under /app differ between clean build jibDockerBuild calls. I also spot-checked that the date on all of those files is the same (epoch time).

I can also run find app/. -type f -newermt +3 -print and verify that the only files that show up are /proc/., /sys/., and /.dockerenv. This tells me that the jib build isn't generating any files and placing them in the image with the current time.

I'm at a loss as to what else to check to see what differs between jib builds and keeps us from achieving reproducibility. Are there any suggestions that might help with my debugging effort?

Expected behavior:

Reproducible digest

Steps to reproduce:

Unfortunately, I'm not able to reproduce with a hello world project and our build environment contains internal dependencies.

jib-gradle-plugin Configuration:

In an effort to debug this problem, I've been using the below configuration so it's easy to docker run and get dropped into a shell.

jib {
    from {
        image = "bash:alpine3.18"
    }

    container {
        setEntrypoint("INHERIT")
    }
}

Here's an example of the layers produced across builds. The first four layers are always reproducible.

        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:cc2447e1835a40530975ab80bb1f872fbab0f2a0faecf2ab16fbbb89b3589438",
                "sha256:623c87dfb411ee966cdb233833033a338e48cd6eaef3466b16f8f700a63ad564",
                "sha256:37d625053e3d7cf997a808f8b4bbf64e218a726fc0a4ee11526a81b32e544f54",
                "sha256:78c166a57f0d187d5e6530e653e9dd5550d0a34d05b841d8d97fb9d94223859a",
                "sha256:ac519d1616320e98bee34efe63bec0d921ff4a7e5217ad68b25c416ed54e3914",
                "sha256:7cb6b88dcd323cf433ef8e0682bba6135052dc7b7234cda26b59f5fc57386eb5",
                "sha256:5cdee48ac2bc11383889fe79a98b97dd45e2c2f728ff33b562c7ca6a9c7a9a1d"
            ]
        },

        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:cc2447e1835a40530975ab80bb1f872fbab0f2a0faecf2ab16fbbb89b3589438",
                "sha256:623c87dfb411ee966cdb233833033a338e48cd6eaef3466b16f8f700a63ad564",
                "sha256:37d625053e3d7cf997a808f8b4bbf64e218a726fc0a4ee11526a81b32e544f54",
                "sha256:78c166a57f0d187d5e6530e653e9dd5550d0a34d05b841d8d97fb9d94223859a",
                "sha256:c0d051434e6a3ba9863de261ad7d8e08ff8ee6086e66d62af20219d2572f91bf",
                "sha256:5f396e1c50fb0d234e9b230fd9504e3db74b53a5d1fb77a6ce5bec97fa7c8d8e",
                "sha256:b3b9c146e328ef8b7d0be37c7f0e751ec909713bb4ff786316ce9c2acbaa30f8"
            ]
        },
@chanseokoh
Member

chanseokoh commented Nov 1, 2023

The first four layers are always reproducible.

This proves that there are cases where the last three layers can be different. A layer is actually a .tar file (it could have been gzipped as .tar.gz, but that's just a matter of representation), so you can diff those files directly. You don't even need the Docker runtime to run the image: just save the image using docker save, extract it, and look at the file contents.

Note it's also possible that all the file contents inside a layer .tar file are the same (including timestamps and various file attributes) but the original gzipped .tar.gz files are different. I've seen some rare cases where a new gzip library version generates different bytes.

@scirner22
Author

I appreciate your suggestion; it was helpful for drilling down on where the differences start!

Note it's also possible that all the file contents inside a layer .tar file are the same (including timestamps and various file attributes) but the original gzipped .tar.gz files are different. I've seen some rare cases where a new gzip library version generates different bytes.

It seems like this could be the result of a dependency upgrade to the gzip library that jib is using? In this case it seems like that can be ruled out, since I can build and immediately rebuild, any number of times, and never get jib reproducibility.

The first layer that's different is the project resources.

Build A

..5b02c41f03c8db9b5a [main|✚ 2…5⚑1] ❯❯❯ shasum -a256 layer.tar
b51c870edd91cf29be12ad186b1348af0d58bcff734300606c1c731bc6a8210f  layer.tar
..5b02c41f03c8db9b5a [main|✚ 2…5⚑1] ❯❯❯ tar -cvf test.tar app
a app
a app/resources
a app/resources/logback.xml
a app/resources/application-development.properties
a app/resources/application-container.properties
a app/resources/banner.txt
a app/resources/db
a app/resources/application.properties
a app/resources/db/migration
a app/resources/db/migration/V6__Add_change_log_table.sql
... removed some files for brevity
..5b02c41f03c8db9b5a [main|✚ 2…5⚑1] ❯❯❯ shasum -a256 test.tar
e0a25b6ab6eb82690bbb539d0bca291a8b4114ea317a09798e33da994a6351fb  test.tar

Build B

..9ba9e3891eef3de9ae [main|✚ 2…5⚑1] ❯❯❯ shasum -a256 layer.tar
d5d166c8cb69240b8f0088d8db4be1b7956ebd0f4f6df7cb3d257aec7c933694  layer.tar
..9ba9e3891eef3de9ae [main|✚ 2…5⚑1] ❯❯❯ tar -cvf test.tar app
...removed all files for brevity
..9ba9e3891eef3de9ae [main|✚ 2…5⚑1] ❯❯❯ shasum -a256 test.tar
e0a25b6ab6eb82690bbb539d0bca291a8b4114ea317a09798e33da994a6351fb  test.tar

@chanseokoh
Member

chanseokoh commented Nov 1, 2023

It seems like this could be the result of a dependency upgrade to the gzip library that jib is using? In this case it seems like that can be ruled out, since I can build and immediately rebuild, any number of times, and never get jib reproducibility.

Yes. So since you're using Jib on the same machine, it's very likely that you do put different files in layers every time. And you seem to say that those project resources are different somehow.

And I forgot to say this: Jib doesn't put a jar into the image, so your jar task configuration to produce a reproducible jar is irrelevant.

@scirner22
Author

It seems like this could be the result of a dependency upgrade to the gzip library that jib is using? In this case it seems like that can be ruled out, since I can build and immediately rebuild, any number of times, and never get jib reproducibility.

Yes. So since you're using Jib on the same machine, it's very likely that you do put different files in layers every time. And you seem to say that those project resources are different somehow.

The resources directory across the two builds should be identical though. I think I proved that by showing the following steps above:

  • untar the resources layer for build A and B
  • tar the app directory myself
  • checksum the produced test.tar and they match

@chanseokoh
Member

chanseokoh commented Nov 1, 2023

The resources directory across the two builds should be identical though.

That only proves that the contents of the files are identical. A file can have different sorts of attributes. I'd first verify that each individual file is byte-for-byte identical, including its metadata (diff a b won't work, as it won't check file timestamps, for example). If all of them are identical, then I think the tar files are for some reason being generated with different metadata or something.

And make sure you didn't miss any hidden or special files when tarring or untarring.

@scirner22
Author

Sounds good. It might be worth mentioning that I removed the resources for testing purposes, so there's no resources layer produced, and now the class file layer isn't reproducible, so I don't think this is related to the contents of the resources directory.

@scirner22
Author

It seems like I'm getting timing mismatches in the layer metadata.

To keep the diff as small as possible I'm using this resources structure

..933793c3090ba5ed3e [main|✚ 2…9⚑1] ❯❯❯ find app
app
app/resources
app/resources/banner.txt
..933793c3090ba5ed3e [main|✚ 2…9⚑1] ❯❯❯ cat app/resources/banner.txt
this is a single line!

Here's the diff produced by xxd of the mismatching layers

226,228c226,228
< 00000e10: 3734 392e 3234 3232 3938 340a 3238 2063  749.2422984.28 c
< 00000e20: 7469 6d65 3d31 3639 3838 3631 3734 382e  time=1698861748.
< 00000e30: 3830 3632 3033 320a 3338 204c 4942 4152  8062032.38 LIBAR
---
> 00000e10: 3831 302e 3337 3034 3539 380a 3238 2063  810.3704598.28 c
> 00000e20: 7469 6d65 3d31 3639 3838 3631 3830 392e  time=1698861809.
> 00000e30: 3632 3530 3634 300a 3338 204c 4942 4152  6250640.38 LIBAR
230c230
< 00000e50: 6d65 3d31 3639 3838 3631 3734 380a 0000  me=1698861748...
---
> 00000e50: 6d65 3d31 3639 3838 3631 3830 390a 0000  me=1698861809...

@chanseokoh
Member

chanseokoh commented Nov 1, 2023

That makes sense and is what I suspected. This article explains the ctime file attribute.

The ctime refers to the last time when a file’s metadata, such as its ownership, location, file type and permission settings, was changed.

This is where mistakes may occur. To be clear, ctime and mtime are totally different, since a file’s content and its metadata are different.

The metadata of a file is like its “DNA”. Two files may have exactly the same content, but they are two different files as long as their metadata are not the same.

I think the observed ctime is either the ctime of a file in the layer tar or the ctime of the layer tar itself. Which is it?

@scirner22
Author

Sorry, I might be conflating settings here, but I'm not touching these resources at all between reproducible build attempts, and even if I were, I thought the jib time settings handled that transparently? I can see that jib is setting the single resource file in the layer to the default of epoch on every attempt.

..d127ac0735838a0134 [main|✚ 2…9⚑1] ❯❯❯ ls -al app/resources/banner.txt
-rw-r--r--  1 stevecirner  staff  23 Dec 31  1969 app/resources/banner.txt
..d127ac0735838a0134 [main|✚ 2…9⚑1] ❯❯❯ find app/.
app/.
app/./resources
app/./resources/banner.txt
..d127ac0735838a0134 [main|✚ 2…9⚑1] ❯❯❯ ls -al app/resources/banner.txt
-rw-r--r--  1 stevecirner  staff  23 Dec 31  1969 app/resources/banner.txt

I don't know enough about the format of tar metadata to say what that time is, but 1698861809 is 3 hours ago, which is when I built this image. Since the sole file in the layer has the epoch default that jib set, it leads me to believe that the 1698861809 time is associated with when the tar file itself was built?

@scirner22
Author

I'm not setting creationTime or filesModificationTime so the default jib values of epoch and epoch plus one are being used.

@chanseokoh
Member

chanseokoh commented Nov 1, 2023

mtime, creationTime, and filesModificationTime are irrelevant here. We are talking about ctime, which your OS/kernel automatically updates when a file's metadata changes. We first need to see if some files have the matching timestamp.

ls -al returns mtime. Can you also do ls -alu and ls -alc for all the files and directories in the layer tar as well as the layer tar itself?

And after getting all the timestamps, can you correlate the observed value with any of them?

@chanseokoh
Member

chanseokoh commented Nov 1, 2023

Come to think of it, checking ctime this way won't work, because ctime will be reset anyway when the files are unpacked. What I want to check first is whether the observed ctime is the ctime of the layer tar or the ctime of the files in the tar at the time the tar was created. For that, I think you can test this by putting in multiple files, not just one, and doing the byte-diff to see if there are multiple diff entries or a single one.

@scirner22
Author

I created two more resources, so now there are three in total, each just containing a random sentence of text. The layer byte diff now includes three pairs of differences, one for each file it seems.

226,228c226,228
< 00000e10: 3833 322e 3033 3031 3835 370a 3238 2063  832.0301857.28 c
< 00000e20: 7469 6d65 3d31 3639 3839 3431 3833 312e  time=1698941831.
< 00000e30: 3531 3637 3331 320a 3338 204c 4942 4152  5167312.38 LIBAR
---
> 00000e10: 3836 342e 3439 3036 3235 360a 3238 2063  864.4906256.28 c
> 00000e20: 7469 6d65 3d31 3639 3839 3431 3836 332e  time=1698941863.
> 00000e30: 3737 3831 3531 320a 3338 204c 4942 4152  7781512.38 LIBAR
230c230
< 00000e50: 6d65 3d31 3639 3839 3431 3833 310a 0000  me=1698941831...
---
> 00000e50: 6d65 3d31 3639 3839 3431 3836 330a 0000  me=1698941863...


354,356c354,356
< 00001610: 3833 322e 3031 3036 3531 390a 3238 2063  832.0106519.28 c
< 00001620: 7469 6d65 3d31 3639 3839 3431 3833 312e  time=1698941831.
< 00001630: 3531 3632 3031 350a 3338 204c 4942 4152  5162015.38 LIBAR
---
> 00001610: 3836 342e 3539 3230 3536 360a 3238 2063  864.5920566.28 c
> 00001620: 7469 6d65 3d31 3639 3839 3431 3836 332e  time=1698941863.
> 00001630: 3737 3736 3939 330a 3338 204c 4942 4152  7776993.38 LIBAR
358c358
< 00001650: 6d65 3d31 3639 3839 3431 3833 310a 0000  me=1698941831...
---
> 00001650: 6d65 3d31 3639 3839 3431 3836 330a 0000  me=1698941863...


482,484c482,484
< 00001e10: 3833 322e 3033 3037 3439 370a 3238 2063  832.0307497.28 c
< 00001e20: 7469 6d65 3d31 3639 3839 3431 3833 312e  time=1698941831.
< 00001e30: 3531 3538 3737 340a 3338 204c 4942 4152  5158774.38 LIBAR
---
> 00001e10: 3836 342e 3530 3338 3436 370a 3238 2063  864.5038467.28 c
> 00001e20: 7469 6d65 3d31 3639 3839 3431 3836 332e  time=1698941863.
> 00001e30: 3737 3733 3135 350a 3338 204c 4942 4152  7773155.38 LIBAR
486c486
< 00001e50: 6d65 3d31 3639 3839 3431 3833 310a 0000  me=1698941831...
---
> 00001e50: 6d65 3d31 3639 3839 3431 3836 330a 0000  me=1698941863...

One thing to note is I don't see any changes on my host machine. I created the two additional files and modified the first at 12:11, waited a bit, and ran clean build jibDockerBuild twice.

..ces [ci-combine-jib-tasks|✚ 20…3] ❯❯❯ ls -l service-grouper/service/src/main/resources
total 24
-rw-r--r--  1 stevecirner  staff  23 Nov  2 12:11 banner.txt
-rw-r--r--  1 stevecirner  staff  30 Nov  2 12:11 banner2.txt
-rw-r--r--  1 stevecirner  staff  32 Nov  2 12:11 banner3.txt
..ces [ci-combine-jib-tasks|✚ 20…3] ❯❯❯ ls -lc service-grouper/service/src/main/resources
total 24
-rw-r--r--  1 stevecirner  staff  23 Nov  2 12:11 banner.txt
-rw-r--r--  1 stevecirner  staff  30 Nov  2 12:11 banner2.txt
-rw-r--r--  1 stevecirner  staff  32 Nov  2 12:11 banner3.txt
..ces [ci-combine-jib-tasks|✚ 20…3] ❯❯❯ ls -lu service-grouper/service/src/main/resources
total 24
-rw-r--r--  1 stevecirner  staff  23 Nov  2 12:11 banner.txt
-rw-r--r--  1 stevecirner  staff  30 Nov  2 12:11 banner2.txt
-rw-r--r--  1 stevecirner  staff  32 Nov  2 12:11 banner3.txt

Converting the epoch values from the tar bytes to timestamps:
1698941831 12:17:11
1698941863 12:17:43

So the times present in the tar bytes coincide with the times that the gradle tasks ran.

@chanseokoh
Member

chanseokoh commented Nov 2, 2023

This is what I expected. I thought it very likely that Apache Commons Compress is including the ctime of each of the files being tarred in each tar entry. (Each tar entry basically corresponds to a file or a directory.) Including ctime is not unheard of in the tar world, but what seemed weird to me, if this was the case, is that storing ctime in an archive would be pretty much useless: whenever someone unpacks files from a tar, the ctime will just be set to the current time by the kernel anyway. I think there's no practical way to make use of the stored timestamps.

Anyway, it is what it is. You said earlier that you cannot reproduce this with a hello-world project, and given these observations, my hypothesis is that your project pulls in a different version of Apache Commons Compress. Is your project a multi-module project? If not, can you force-set the same library version that the hello-world project uses?

@chanseokoh
Member

chanseokoh commented Nov 2, 2023

Is your project a multi-module project? If not, can you force-set the same library version that the hello-world project uses?

For that matter, you can check the version used in the hello-world project by running ./gradlew buildEnv on it.

$ ./gradlew buildEnv
...
------------------------------------------------------------
Root project
------------------------------------------------------------
...
          |    \--- com.google.guava:guava:31.0.1-android -> 31.1-jre (*)
          +--- org.apache.commons:commons-compress:1.21
          +--- com.google.guava:guava:31.1-jre (*)
...

Note there are cases where you cannot trust the buildEnv output if it is a multi-module project. If your project is not a multi-module one, it'd be interesting to check the output of buildEnv too.

Then, assuming that your project is not a multi-module one, you can force a version like this: #3564 (comment)
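
One way to do that kind of pinning, which may or may not match the linked comment exactly, is to force the version on the buildscript classpath of the project where the plugins are declared. A minimal sketch in Kotlin DSL, assuming the plugins are defined in the root project per the FAQ (treat it as a starting point, not the definitive setup):

// Root build.gradle.kts (sketch only): force the commons-compress version that the
// Jib plugin resolves on the buildscript classpath.
buildscript {
    configurations.all {
        resolutionStrategy {
            force("org.apache.commons:commons-compress:1.21")
        }
    }
}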

@chanseokoh
Member

chanseokoh commented Nov 2, 2023

FYI, there is a precedent where a new version of Apache Commons Compress produced a different binary.

I forgot about diffoscope, which I used back then. You may try it too, although I'm not sure if it will catch ctime.

@scirner22
Author

This is a multi-module project. The root level resolves to 1.23.0 and the module that's using the jib tasks resolves to 1.21. I'm actually not certain how gradle will choose which version to use when the tasks are executed.

The linked apache issue is interesting. I wonder if something similar is happening again on 1.22 or 1.23.

@chanseokoh
Member

chanseokoh commented Nov 2, 2023

This is a multi-module project. The root level resolves to 1.23.0 and the module that's using the jib tasks resolves to 1.21.

As I said, often you cannot trust this output in a multi-module project, so it's very possible that the Jib module uses 1.23.0. You should carefully follow the project setup explained in this FAQ to ensure that Jib uses 1.21. That is, you define all plugins in the root project while selectively applying them. And then force the version 1.21.

That said, I wonder what happens if you force 1.23.0 in the hello-world project.

@scirner22
Author

Thank you for all of your assistance! We have reproducibility again when forcing 1.21, so it does seem like the 1.22 or 1.23 changes introduced something breaking from a reproducibility perspective.

I have not looked at the jib or apache compress lib, but I'll attempt to take a closer look and see if I can get a change together.

@chanseokoh
Member

chanseokoh commented Nov 2, 2023

ReproducibleLayerBuilder is where Jib scrubs non-reproducible properties like timestamps, users, etc. See here and here. I can imagine there may be a method in Apache Commons Compress to set ctime. I think it's also possible that such a method doesn't exist in 1.21 but only in newer versions.

@GoogleContainerTools/cloud-java-team-teamsync this is a bigger issue to Jib, because eventually, Jib will have to upgrade the library.
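
For reference, a minimal sketch (not Jib's actual code, and not necessarily what the eventual fix will look like) of what scrubbing the new time attributes could amount to, assuming the FileTime setters that Commons Compress appears to have added in 1.22:

import org.apache.commons.compress.archivers.tar.TarArchiveEntry
import java.nio.file.attribute.FileTime
import java.time.Instant

// Hypothetical sketch only: pin every time attribute that newer Commons Compress
// versions can emit as extended (PAX) headers to a fixed value, so the layer tar
// bytes no longer depend on when the build ran.
fun scrubEntryTimes(entry: TarArchiveEntry, modificationTime: Instant): TarArchiveEntry {
    val fixed = FileTime.from(modificationTime)
    entry.setModTime(fixed.toMillis())  // mtime; Jib already pins this today
    entry.setLastAccessTime(fixed)      // atime, written as a PAX header by 1.22+
    entry.setStatusChangeTime(fixed)    // ctime, written as a PAX header by 1.22+
    entry.setCreationTime(fixed)        // birth time, written as LIBARCHIVE.creationtime
    return entry
}

Whether a fix pins these fields or drops them entirely is a separate question; the point is just that every time attribute the library writes has to be made deterministic.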

@scirner22
Author

scirner22 commented Nov 3, 2023

@chanseokoh You're correct. It seems like they added more time fields in 1.22.

andrebrait/commons-compress@b7f0cbb

Showing the surrounding lines in my byte diff helped clear things up a bit.

@@ -481,7 +481,7 @@
 00001e00: 3238 2061 7469 6d65 3d31 3639 3839 3431  28 atime=1698941
-00001e10: 3833 322e 3033 3037 3439 370a 3238 2063  832.0307497.28 c
-00001e20: 7469 6d65 3d31 3639 3839 3431 3833 312e  time=1698941831.
-00001e30: 3531 3538 3737 340a 3338 204c 4942 4152  5158774.38 LIBAR
+00001e10: 3836 342e 3530 3338 3436 370a 3238 2063  864.5038467.28 c
+00001e20: 7469 6d65 3d31 3639 3839 3431 3836 332e  time=1698941863.
+00001e30: 3737 3733 3135 350a 3338 204c 4942 4152  7773155.38 LIBAR
 00001e40: 4348 4956 452e 6372 6561 7469 6f6e 7469  CHIVE.creationti
-00001e50: 6d65 3d31 3639 3839 3431 3833 310a 0000  me=1698941831...
+00001e50: 6d65 3d31 3639 3839 3431 3836 330a 0000  me=1698941863...
 00001e60: 0000 0000 0000 0000 0000 0000 0000 0000  ................

In addition to the file-related time fields, it seems like a LIBARCHIVE.creationtime header was added.
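
For anyone following along, here's a rough way to see those fields from code. This is a hypothetical diagnostic only, assuming Commons Compress 1.22+ on the classpath (as far as I can tell these accessors don't exist in 1.21), and the file path is just illustrative:

import org.apache.commons.compress.archivers.tar.TarArchiveEntry
import java.io.File

// Hypothetical diagnostic: build a tar entry from a real file and print the time
// fields that newer versions can pick up from the filesystem and later write out
// as PAX headers (atime=..., ctime=..., LIBARCHIVE.creationtime).
fun main() {
    val entry = TarArchiveEntry(File("src/main/resources/banner.txt"))
    println("mtime: ${entry.lastModifiedDate}")
    println("atime: ${entry.lastAccessTime}")    // may be null depending on filesystem support
    println("ctime: ${entry.statusChangeTime}")  // may be null depending on filesystem support
    println("birth: ${entry.creationTime}")      // may be null depending on filesystem support
}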

@chanseokoh
Member

chanseokoh commented Nov 3, 2023

In addition to file related time fields, it seems like a LIBARCHIVE.creationtime header was added.

Good catch! This needs to be addressed as well.

@chanseokoh
Member

Another user hit this (#4141), and their PR looks very promising.

@mpeddada1
Contributor

PR #4142, which will hopefully address this issue, is currently under review.

@mpeddada1 mpeddada1 self-assigned this Feb 21, 2024
@mpeddada1
Contributor

jib-gradle-plugin:3.4.2 and jib-maven-plugin:3.4.2 have been released with the fix (#4204)! Marking this issue as complete.
