Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update apache-commons to 1.26.0 and set PAX headers to address build reproducibility issue #4204

Merged
merged 16 commits into from
Mar 11, 2024

Conversation

izogfif
Copy link
Contributor

@izogfif izogfif commented Mar 2, 2024

Fixes #4141 🛠️

@izogfif
Copy link
Contributor Author

izogfif commented Mar 2, 2024

This is a copy of this pull request with an extra commit that fixes the formatting, test, and modification time set by JIB issues.

Comment on lines 84 to 86
// In this particular place we have to use default modification time (1 second past epoch)
// instead the modification time user might've set in JIB settings. That's because we have
// no way to find out the modification time user might've set in JIB settings here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. Jib intentionally doesn't allow setting the mod time of directories. Let's just remove the comment.

Comment on lines 88 to 124
/**
* Verifies that the modification time has been reset in the PAX headers.
*
* @param entry The archive entry in the layer.
*/
private static void verifyThatModificationTimeIsReset(ArchiveEntry entry) {
assertThat(entry.getLastModifiedDate().toInstant())
.isEqualTo(FileEntriesLayer.DEFAULT_MODIFICATION_TIME);
}

private static FileEntry layerEntry(
Path source, AbsoluteUnixPath destination, FilePermissions permissions, String ownership)
throws IOException {
return new FileEntry(
source,
destination,
FileEntriesLayer.DEFAULT_FILE_PERMISSIONS_PROVIDER.get(source, destination),
FileEntriesLayer.DEFAULT_MODIFICATION_TIME);
permissions,
// Here we make sure to use a default modification time
// since JIB will set modification time for all files to
// the one set in JIB settings by a user. In case user
// hasn't set it, JIB will use default modification time,
// i.e. 1 second past January 1st, 1970 (1 second past epoch).
FileEntriesLayer.DEFAULT_MODIFICATION_TIME,
ownership);
}

private static FileEntry layerEntry(
Path source, AbsoluteUnixPath destination, FilePermissions permissions) throws IOException {
return layerEntry(source, destination, permissions, "");
}

private static FileEntry layerEntry(Path source, AbsoluteUnixPath destination)
throws IOException {
return layerEntry(
source,
destination,
FileEntriesLayer.DEFAULT_FILE_PERMISSIONS_PROVIDER.get(source, destination));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revert this file back to its original state. The original author misunderstood this part. The PR shouldn't touch this file, and the existing tests will verify the correctness of your changes.

Copy link
Member

@chanseokoh chanseokoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, upgrade the apache-commons-compress (#3964) to verify if it's working correctly.

@izogfif
Copy link
Contributor Author

izogfif commented Mar 3, 2024

@chanseokoh done. I had to update TarExtractorTest since it was failing now. Probably, newer version of apache-commons-compress (I used 1.26.0 instead of 1.23.0 from #3964) extracts modification timetamp with millisecond precision.

Copy link
Member

@chanseokoh chanseokoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally you should update jib-core/CHANGELOG, jib-maven-plugin/CHANGELOG and jib-gradle-plugin/CHANGELOG about what is fixed.

BTW, I'm not the maintainer of this repo.

entry.setModTime(modTime.toEpochMilli());

String headerTime = Long.toString(modTime.getEpochSecond());
final long nanos = modTime.getNano();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the PAX headers, so just asking: does it have to be the nanosecond precision? Like, what if someone decides to go with milli or microseconds? Is it possible? Not that I care as long as nano works.

Copy link
Contributor Author

@izogfif izogfif Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a post I was able to find:

This is important for example with the tarfile module with the pax tar format. The POSIX tar standard mandates storing the mtime in the extended header (if it is not an integer) with as much precision as is available in the underlying file system, and likewise to restore this time properly upon extraction.

So I'd say that it's better to store time with nanosecond precision. File systems that don't support this precision won't fail upon extraction: they'll simply truncate values.

@izogfif
Copy link
Contributor Author

izogfif commented Mar 4, 2024

CHANGELOG files were updated.

@izogfif
Copy link
Contributor Author

izogfif commented Mar 5, 2024

Rebased on master.

@izogfif
Copy link
Contributor Author

izogfif commented Mar 6, 2024

Rebased on master (2).

Comment on lines 97 to 100
// We have to use millisecond precision here. Taken from Instant.parse.
final DateTimeFormatter parser =
new DateTimeFormatterBuilder().parseCaseInsensitive().appendInstant(3).toFormatter();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@izogfif Is there is a reason we're testing with millisecond precision here? Curious since we opted for nanosecond precision above (ReproducibleLayerBuilder)

Copy link
Contributor Author

@izogfif izogfif Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpeddada1 For some reason, the code below can't extract micro- and nanoseconds, it's limited to milliseconds (at least on Windows machine with NTFS).
In one of the methods below I had to add centiseconds and avoided having to use the same DateTimeFormatter since, conveniently, truncation to milliseconds yielded "0" as the third digit, and Instant.parse can parse centiseconds.

@mpeddada1
Copy link
Contributor

Pasting stacktrace of failing checks below:

kokoro-macos:

Successfully started process 'Gradle Test Executor 4'
Picked up JAVA_TOOL_OPTIONS: -Djava.net.preferIPv6Addresses=true

com.google.cloud.tools.jib.registry.DockerRegistryBearerTokenTest > testDecode_invalidToken_actionsArray SKIPPED

com.google.cloud.tools.jib.tar.TarExtractorTest > testExtract_modificationTimePreserved FAILED
    value of: getLastModifiedTime(...)
    expected: 2019-08-01T16:13:09.72Z
    but was : 2019-08-01T16:13:09Z
        at com.google.cloud.tools.jib.tar.TarExtractorTest.testExtract_modificationTimePreserved(TarExtractorTest.java:102)

com.google.cloud.tools.jib.tar.TarExtractorTest > testExtract_reproducibleTimestampsEnabled FAILED
    value of: getLastModifiedTime(...)
    expected: 2021-01-29T21:10:02.78Z
    but was : 2021-01-29T21:10:02Z
        at com.google.cloud.tools.jib.tar.TarExtractorTest.testExtract_reproducibleTimestampsEnabled(TarExtractorTest.java:129)

Gradle Test Executor 4 finished executing tests.

> Task :jib-core:test FAILED

kokoro-ubuntu

com.google.cloud.tools.jib.api.ContainerizerIntegrationTest STANDARD_ERROR
    SLF4J: No SLF4J providers were found.
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See [https://www.slf4j.org/codes.html#noProviders](https://www.google.com/url?q=https://www.slf4j.org/codes.html%23noProviders&sa=D) for further details.

com.google.cloud.tools.jib.api.ReproducibleImageTest > testConfiguration FAILED
    value of: extractFromTarFileAsString(...)
    expected: ?"diff_ids":["sha256:18e4f44e6d1835bd968339b166057bd17ab7d4cbb56dc7262a5cafea7cf8d405","sha256:13369c34f073f2b9c1fa6431e23d925f1a8eac65b1726c8cc8fcc2596c69b414","sha256:4f92c507112d7880ca0f504ef8272b7fdee107263270125036a260a741565923"]}}
    but was : ?"diff_ids":["sha256:2fcc2157bf42c89195676ef6e973a96d7b018c9d30ba89db95e9e0722e1c8ef3","sha256:21f521f3217067d277af37512a08c72281d90fdd02d7174db632c8c3a34403bd","sha256:6beba018395265af5061864b7f4678e831eb2daebb1045487c641fc8b142e319"]}}
        at com.google.cloud.tools.jib.api.ReproducibleImageTest.testConfiguration(ReproducibleImageTest.java:130)

com.google.cloud.tools.jib.api.ReproducibleImageTest > testTarballStructure FAILED
    missing (3)
    #1      : c46572ef74f58d95e44dd36c1fbdfebd3752e8b56a794a13c11cfed35a1a6e1c.tar.gz
    #2      : 6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar.gz
    #3      : 530c1954a2b087d0b989895ea56435c9dc739a973f2d2b6cb9bb98e55bbea7ac.tar.gz

    unexpected (3)
    #1      : 98682a867906d9d07cf3c51a4fb9e08e9d5baddd1ca5dc7834f58f434c9cb15c.tar.gz
    #2      : 527db49d4e0c4159346119b4971d59016bfedceed874abab2b510ce433f6b15c.tar.gz
    #3      : 16d03883198935b4119896dcea0ea14e1bf105b6ac0a35a88820d08bc0263306.tar.gz
    ---
    expected: [c46572ef74f58d95e44dd36c1fbdfebd3752e8b56a794a13c11cfed35a1a6e1c.tar.gz, 6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar.gz, 530c1954a2b087d0b989895ea56435c9dc739a973f2d2b6cb9bb98e55bbea7ac.tar.gz, config.json, manifest.json]
    but was : [98682a867906d9d07cf3c51a4fb9e08e9d5baddd1ca5dc7834f58f434c9cb15c.tar.gz, 527db49d4e0c4159346119b4971d59016bfedceed874abab2b510ce433f6b15c.tar.gz, 16d03883198935b4119896dcea0ea14e1bf105b6ac0a35a88820d08bc0263306.tar.gz, config.json, manifest.json]
        at com.google.cloud.tools.jib.api.ReproducibleImageTest.testTarballStructure(ReproducibleImageTest.java:104)

com.google.cloud.tools.jib.api.ReproducibleImageTest > testManifest FAILED
    value of: extractFromTarFileAsString(...)
    expected: [{"Config":"config.json","RepoTags":["jib-core/reproducible:latest"],"Layers":["c46572ef74f58d95e44dd36c1fbdfebd3752e8b56a794a13c11cfed35a1a6e1c.tar.gz","6d2763b0f3940d324ea6b55386429e5b173899608abf7d1bff62e25dd2e4dcea.tar.gz","530c1954a2b087d0b989895ea56435c9dc739a973f2d2b6cb9bb98e55bbea7ac.tar.gz"]}]
    but was : [{"Config":"config.json","RepoTags":["jib-core/reproducible:latest"],"Layers":["98682a867906d9d07cf3c51a4fb9e08e9d5baddd1ca5dc7834f58f434c9cb15c.tar.gz","527db49d4e0c4159346119b4971d59016bfedceed874abab2b510ce433f6b15c.tar.gz","16d03883198935b4119896dcea0ea14e1bf105b6ac0a35a88820d08bc0263306.tar.gz"]}]
        at com.google.cloud.tools.jib.api.ReproducibleImageTest.testManifest(ReproducibleImageTest.java:119)

Gradle Test Executor 7 finished executing tests.

> Task :jib-core:integrationTest FAILED

@izogfif
Copy link
Contributor Author

izogfif commented Mar 6, 2024

@mpeddada1

Pasting stacktrace of failing checks below:

kokoro-macos:

com.google.cloud.tools.jib.tar.TarExtractorTest > testExtract_modificationTimePreserved FAILED
    value of: getLastModifiedTime(...)
    expected: 2019-08-01T16:13:09.72Z
    but was : 2019-08-01T16:13:09Z
        at com.google.cloud.tools.jib.tar.TarExtractorTest.testExtract_modificationTimePreserved(TarExtractorTest.java:102)

How can I check that current OS is macOS? It seems that storing files with millisecond precision on macOS is not supported. It seems that this is what's happening:

  • TarExtractor.extract extracts a .tar file from resources into a temporary folder on test machine.
  • Underlying file system truncates file modifications timestamps (they're stored with nanosecond precision inside tar file). On Windows and Ubuntu, truncation is done to millisecond precision. On macOS, apparently, to seconds.
  • Tests pass on Ubuntu and Windows. They fail on macOS.

Possible solutions:

  • Make macOS provide file modification timestamps with millisecond precision (like Windows and Ubuntu do).
  • Read file modification timestamps with any precision, then truncate them to seconds, and then check these truncated values in tests.
  • Third way.

kokoro-ubuntu

About these failures: if I update values ("expected") with new ones ("but was"), it should fix Ubuntu build. But won't this fail Windows build?

@izogfif
Copy link
Contributor Author

izogfif commented Mar 7, 2024

@mpeddada1 I finally found the contributor guide and was able to run tests on Ubuntu via:

export JIB_INTEGRATION_TESTING_LOCATION=localhost:9990
./gradlew clean goJF build integrationTest

but then testWar_jetty test fails with:

java.lang.AssertionError: expected:<Hello world> but was:<null>

Apparently, I'm unable to make Jetty container start in Docker. Please re-run those workflows / kokoro-xxx tests on your end.

@chanseokoh
Copy link
Member

I also noticed the Jetty failure locally. Don't know why. It is the test in jib-cli. Since you are not changing the jib-cli code, for now, you can do the following which will cover your changes:

./gradlew jib-core:integrationTest jib-maven-plugin:integrationTest jib-gradle-plugin:integrationTest

@mpeddada1
Copy link
Contributor

  • TarExtractor.extract extracts a .tar file from resources into a temporary folder on test machine.
  • Underlying file system truncates file modifications timestamps (they're stored with nanosecond precision inside tar file). On Windows and Ubuntu, truncation is done to millisecond precision. On macOS, apparently, to seconds.
  • Tests pass on Ubuntu and Windows. They fail on macOS.

Thanks for looking into this! It is interesting that macos only goes to seconds precision. It is possible that the job still uses the old HFS+ format drive. That being said, I think verifying seconds precision should be sufficient.

Re kokoro-ubuntu: I would assume that the digest value change is due to the new PAX header logic we added so i would expect it to match in both Ubuntu and windows. But I could also be missing something here, why do we think the windows tests would fail?

@izogfif
Copy link
Contributor Author

izogfif commented Mar 7, 2024

Re kokoro-ubuntu: I would assume that the digest value change is due to the new PAX header logic we added so i would expect it to match in both Ubuntu and windows. But I could also be missing something here, why do we think the windows tests would fail?

Yes, I think that PAX header changed checksums. I was expecting tests to fail on all OS (Windows, macOS, Ubuntu), but, apparently, before I fixed checksum checks, kokoro-windows tests were successful. No idea why. The tests have passed this time, too.

@mpeddada1
Copy link
Contributor

Yes, I think that PAX header changed checksums. I was expecting tests to fail on all OS (Windows, macOS, Ubuntu), but, apparently, before I fixed checksum checks, kokoro-windows tests were successful. No idea why. The tests have passed this time, too.

Ah our coverage on Windows is fairly limited at the moment as we run integration tests only in ubuntu and MacOS. That's probably why we weren't able to see the failure in windows.

@izogfif
Copy link
Contributor Author

izogfif commented Mar 9, 2024

So, about this pull request, what are the next steps? Is there something I need to do in order for it to be accepted (and a new build of JIB to be released)?

@mpeddada1
Copy link
Contributor

No further action need, I'll go ahead and merge the PR. Thanks for your contribution @izogfif!

@mpeddada1 mpeddada1 changed the title Make image builds reproducible (2) Update apache-commons to 1.26.0 and set PAX headers to address build reproducibility issue Mar 11, 2024
@mpeddada1 mpeddada1 merged commit 05e93d0 into GoogleContainerTools:master Mar 11, 2024
8 checks passed
@glasser
Copy link

glasser commented Mar 18, 2024

Thanks for everyone who worked on this! Looking forward to seeing this get into a release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Builds not reproducible
6 participants