Reorg and refactor code around Blob abuse #1678

chanseokoh · 2019-04-30T22:12:47Z

Continuation of #1647. In #1647, I said the prevalent abuse of Blob (which is not much different from an all-consuming Object in many contexts) is to wrap anything in a Blob only for the sake of writing it to an output stream or compute a digest, which is an unnecessary and convoluted round-trip and making the target invisible underneath Blob.

Another complexity that hurt readability (to me at least) was the similarities between WriterBlob, BlobWriter, and Blob. Blob and WriterBlob look very similar,

public interface Blob {
  BlobDescriptor writeTo(OutputStream outputStream) throws IOException;
}

public interface BlobWriter {
  void writeTo(OutputStream outputStream) throws IOException;
}

but yet, you create a Blob (which is actually WriterBlob) out of BlobWriter

Blob writerBlob = Blobs.from(blobWriter);

and WriterBlob.writeTo() makes use of blobWriter.writeTo(). Before someone becomes familiarized with the code, all this will sound pretty confusing to them. With the Blob abuse I said at the top, this is aggravating.

Some highlighted changes:

Instead of going "something --> Blob.from(something) --> blob.writeTo()" to compute a digest or write to an output stream, I introduced DigestUtil to do it directly without the abuse.
Decoupled Blob from JsonTemplateMapper. Now JSON-related classes sit below Blob in the dependency graph.
Added JsonBlob. This makes sense because Blob sits above JsonTemplate.
Renamed BlobWriter to WritableContents, and moved it to the hash package that sits below Blob.
All Blob.writeTo() methods simply and consistently delegate to DigestUtil.computeDigest(). This resulted in removing BlobDescriptor.fromPipe(), which I think was not named or placed nicely, hose code is also duplicated some Blob.writeTo() method.
BlobDescriptor is merely a holder of (size, digest).

Another thing to note is that, there's no toBlob() or toBlobDescriptor() functions anymore after #1647 and this. All blobs are constructed through Blobs.from().

chanseokoh · 2019-05-01T21:21:40Z

Finally, fixed the MacOS failure. Ready for review.

jib-core/src/main/java/com/google/cloud/tools/jib/hash/DigestUtil.java

loosebazooka · 2019-05-06T18:42:03Z

can you rebase or resolve conflicts on this or whatever? I remember you saying it included changes from the parent PR?

loosebazooka

It looks like WritableContents is only used in Digests which is itself internal, does that mean writable contents is unnecessary? Am I missing something?

Unrelated but I'm still confused: are we also calculating digests for raw strings (not from streams)? It seems our primary mechanism for digest generation is through an inputstream mechanism, but it could be easier for non-streams to use MessageDigest directly?

loosebazooka · 2019-05-06T21:10:09Z

jib-core/src/main/java/com/google/cloud/tools/jib/hash/Digests.java

+   * @throws IOException if reading from or writing fails
+   */
+  public static BlobDescriptor computeDigest(
+      WritableContents contents, OutputStream optionalOutStream) throws IOException {


Just a couple questions about optionalOutStream

is it truly optional? Even calls not using it still pass in a nulloutputstream.

should we use OutputStream instead of OutStream

The name "optional" is there only because of the name "computeDigest" doesn't convey the message that it can additionally copy WritableContents into the out stream while computing the digest. Like, from the point of a user just trying to compute a digest, they won't get why they have to pass an output stream for what.

Maybe I should just rename the method to "computeDigestAndCopy", as I said in the meeting? WDYT?

For outStream, I don't think anyone can confuse that with something else than a stream for output, but if you are averse to that, I can go with outputStream.

It's not so much confusion as much as consistency for outStream vs outputStream, so not a big deal.

I think we can actually deal with the optionalOutStream in a follow up PR.

loosebazooka · 2019-05-06T21:50:23Z

So calculating the digest effectively consumes the stream? Is the purpose of copying the stream to reuse it somewhere else?

chanseokoh · 2019-05-06T22:02:37Z

It looks like WritableContents is only used in Digests which is itself internal, does that mean writable contents is unnecessary? Am I missing something?

Blob actually depends on it. That is, a Blob instance can be created from WritableContents. The instance is WritableContentsBlob actually.

Unrelated but I'm still confused: are we also calculating digests for raw strings (not from streams)? It seems our primary mechanism for digest generation is through an inputstream mechanism, but it could be easier for non-streams to use MessageDigest directly?

Ah, TIL we can use MessageDigest directly for raw strings. Currently, there is one or two places where we calculate a digest from a raw String in our prod code. Another closest one is calculating from a JsonTemplate. For that we may be able to utilize MessageDigest, after converting a JsonTemplate to a String. The current implementation without MessageDigest still works though.

So calculating the digest effectively consumes the stream? Is the purpose of copying the stream to reuse it somewhere else?

Yeah, if data is coming from an input stream, in general we have to consume it, as we have no other choices (unless the input stream is resettable like a ByteArrayInputStream.) Copying the steam is like a by-product, but I admit it is also the other primary task required in Blob.writeTo(), which is effectively Digests.computDigest(in, outStream), i.e., writing the blob contents to outStream while returning its digest. This may further support the idea to rename the method to computeDigestAndCopy or copyAndComputeDigest.

chanseokoh · 2019-05-06T23:13:41Z

Merging. We can follow up in later PRs.

chanseokoh added 2 commits April 30, 2019 18:09

Reorg and refactor

d3bfc7a

Merge branch 'master' into cleanup-2

c047f8c

googlebot added the cla: yes label Apr 30, 2019

chanseokoh added 2 commits May 1, 2019 17:02

Fix wrong filename

f293268

Remove more Blob indirection

e4461b4

chanseokoh mentioned this pull request May 1, 2019

Reorg and refactor repo - continued #1684

Merged

TadCordle reviewed May 3, 2019

View reviewed changes

jib-core/src/main/java/com/google/cloud/tools/jib/hash/DigestUtil.java Outdated Show resolved Hide resolved

chanseokoh added 2 commits May 6, 2019 16:37

Merge branch 'master' into cleanup-2

6147427

DigestUtil --> Digests

005f1d8

chanseokoh force-pushed the cleanup-2 branch from 4735ff5 to 005f1d8 Compare May 6, 2019 21:02

loosebazooka reviewed May 6, 2019

View reviewed changes

loosebazooka approved these changes May 6, 2019

View reviewed changes

chanseokoh merged commit 62ab86b into master May 6, 2019

chanseokoh deleted the cleanup-2 branch May 6, 2019 23:13

elefeint mentioned this pull request Aug 23, 2022

Support base image layer compressed with zstd #3717

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorg and refactor code around Blob abuse #1678

Reorg and refactor code around Blob abuse #1678

chanseokoh commented Apr 30, 2019

chanseokoh commented May 1, 2019

loosebazooka commented May 6, 2019

loosebazooka left a comment

loosebazooka May 6, 2019

chanseokoh May 6, 2019

loosebazooka May 6, 2019

loosebazooka commented May 6, 2019

chanseokoh commented May 6, 2019

chanseokoh commented May 6, 2019

Reorg and refactor code around Blob abuse #1678

Reorg and refactor code around Blob abuse #1678

Conversation

chanseokoh commented Apr 30, 2019

chanseokoh commented May 1, 2019

loosebazooka commented May 6, 2019

loosebazooka left a comment

Choose a reason for hiding this comment

loosebazooka May 6, 2019

Choose a reason for hiding this comment

chanseokoh May 6, 2019

Choose a reason for hiding this comment

loosebazooka May 6, 2019

Choose a reason for hiding this comment

loosebazooka commented May 6, 2019

chanseokoh commented May 6, 2019

chanseokoh commented May 6, 2019