Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorg and refactor code around Blob abuse #1678

Merged
merged 6 commits into from
May 6, 2019
Merged

Reorg and refactor code around Blob abuse #1678

merged 6 commits into from
May 6, 2019

Conversation

chanseokoh
Copy link
Member

Continuation of #1647. In #1647, I said the prevalent abuse of Blob (which is not much different from an all-consuming Object in many contexts) is to wrap anything in a Blob only for the sake of writing it to an output stream or compute a digest, which is an unnecessary and convoluted round-trip and making the target invisible underneath Blob.

Another complexity that hurt readability (to me at least) was the similarities between WriterBlob, BlobWriter, and Blob. Blob and WriterBlob look very similar,

public interface Blob {
  BlobDescriptor writeTo(OutputStream outputStream) throws IOException;
}

public interface BlobWriter {
  void writeTo(OutputStream outputStream) throws IOException;
}

but yet, you create a Blob (which is actually WriterBlob) out of BlobWriter

Blob writerBlob = Blobs.from(blobWriter);

and WriterBlob.writeTo() makes use of blobWriter.writeTo(). Before someone becomes familiarized with the code, all this will sound pretty confusing to them. With the Blob abuse I said at the top, this is aggravating.

Some highlighted changes:

  • Instead of going "something --> Blob.from(something) --> blob.writeTo()" to compute a digest or write to an output stream, I introduced DigestUtil to do it directly without the abuse.
  • Decoupled Blob from JsonTemplateMapper. Now JSON-related classes sit below Blob in the dependency graph.
  • Added JsonBlob. This makes sense because Blob sits above JsonTemplate.
  • Renamed BlobWriter to WritableContents, and moved it to the hash package that sits below Blob.
  • All Blob.writeTo() methods simply and consistently delegate to DigestUtil.computeDigest(). This resulted in removing BlobDescriptor.fromPipe(), which I think was not named or placed nicely, hose code is also duplicated some Blob.writeTo() method.
  • BlobDescriptor is merely a holder of (size, digest).

Another thing to note is that, there's no toBlob() or toBlobDescriptor() functions anymore after #1647 and this. All blobs are constructed through Blobs.from().

@chanseokoh
Copy link
Member Author

Finally, fixed the MacOS failure. Ready for review.

@loosebazooka
Copy link
Member

can you rebase or resolve conflicts on this or whatever? I remember you saying it included changes from the parent PR?

Copy link
Member

@loosebazooka loosebazooka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like WritableContents is only used in Digests which is itself internal, does that mean writable contents is unnecessary? Am I missing something?

Unrelated but I'm still confused: are we also calculating digests for raw strings (not from streams)? It seems our primary mechanism for digest generation is through an inputstream mechanism, but it could be easier for non-streams to use MessageDigest directly?

* @throws IOException if reading from or writing fails
*/
public static BlobDescriptor computeDigest(
WritableContents contents, OutputStream optionalOutStream) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple questions about optionalOutStream

  • is it truly optional? Even calls not using it still pass in a nulloutputstream.
  • should we use OutputStream instead of OutStream

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "optional" is there only because of the name "computeDigest" doesn't convey the message that it can additionally copy WritableContents into the out stream while computing the digest. Like, from the point of a user just trying to compute a digest, they won't get why they have to pass an output stream for what.

Maybe I should just rename the method to "computeDigestAndCopy", as I said in the meeting? WDYT?

For outStream, I don't think anyone can confuse that with something else than a stream for output, but if you are averse to that, I can go with outputStream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much confusion as much as consistency for outStream vs outputStream, so not a big deal.

I think we can actually deal with the optionalOutStream in a follow up PR.

@loosebazooka
Copy link
Member

So calculating the digest effectively consumes the stream? Is the purpose of copying the stream to reuse it somewhere else?

@chanseokoh
Copy link
Member Author

It looks like WritableContents is only used in Digests which is itself internal, does that mean writable contents is unnecessary? Am I missing something?

Blob actually depends on it. That is, a Blob instance can be created from WritableContents. The instance is WritableContentsBlob actually.

Unrelated but I'm still confused: are we also calculating digests for raw strings (not from streams)? It seems our primary mechanism for digest generation is through an inputstream mechanism, but it could be easier for non-streams to use MessageDigest directly?

Ah, TIL we can use MessageDigest directly for raw strings. Currently, there is one or two places where we calculate a digest from a raw String in our prod code. Another closest one is calculating from a JsonTemplate. For that we may be able to utilize MessageDigest, after converting a JsonTemplate to a String. The current implementation without MessageDigest still works though.

So calculating the digest effectively consumes the stream? Is the purpose of copying the stream to reuse it somewhere else?

Yeah, if data is coming from an input stream, in general we have to consume it, as we have no other choices (unless the input stream is resettable like a ByteArrayInputStream.) Copying the steam is like a by-product, but I admit it is also the other primary task required in Blob.writeTo(), which is effectively Digests.computDigest(in, outStream), i.e., writing the blob contents to outStream while returning its digest. This may further support the idea to rename the method to computeDigestAndCopy or copyAndComputeDigest.

@chanseokoh
Copy link
Member Author

Merging. We can follow up in later PRs.

@chanseokoh chanseokoh merged commit 62ab86b into master May 6, 2019
@chanseokoh chanseokoh deleted the cleanup-2 branch May 6, 2019 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants