Clarify WriteRequest.write_offset spec for compressed blobs #193
The CAS compression spec currently states that WriteRequest.write_offset is the offset in the uncompressed blob, but that does not match Google's implementation on either the client or the server side. The original intention was to encode each chunk individually, but compressing the entire stream seems better, also because it allows reading from the data source (e.g. disk) and writing to the CAS to happen concurrently.
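For what it's worth, here is a minimal Go sketch of that concurrency (the file name and helper function are invented for illustration, and the klauspost/compress zstd package is assumed): a goroutine reads the file and compresses it into a pipe, while the consumer of the pipe (the ByteStream.Write loop in a real uploader) streams the compressed bytes out at the same time.

```go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

// compressedStream returns a reader over the zstd-compressed contents of f.
// Compression runs in its own goroutine, so reading and compressing the file
// overlaps with whatever consumes the returned reader.
func compressedStream(f *os.File) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		enc, err := zstd.NewWriter(pw) // compress on the fly
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		if _, err := io.Copy(enc, f); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.CloseWithError(enc.Close()) // flush the final frame, then close the pipe
	}()
	return pr
}

func main() {
	f, err := os.Open("blob.bin") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Stand-in for the ByteStream.Write loop: just count the compressed bytes.
	n, err := io.Copy(io.Discard, compressedStream(f))
	if err != nil {
		panic(err)
	}
	fmt.Println("compressed bytes streamed:", n)
}
```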
I'm not sure which Google implementations you're referring to, but it being the offset into the uncompressed data is by design and correctly specified, and at least some Google implementations (e.g. RBE) get this right. This has been implemented incorrectly in a few cases, e.g. bazelbuild/remote-apis-sdks#273 fixed using the wrong offset in the SDK. Please flag anything still doing it wrong (internally if it's a non-public implementation) and we'll get those implementations fixed too.
Most CAS implementations don't support resumable uploads today so you can construct them however you wish (including streaming a compressed file from disk, compressing+concatenating chunks, or compressing on the fly), since you'll have to start over regardless if the stream gets broken. But yeah, it is a weakness of this scheme that resuming a partial upload of a large, compressed-on-disk file may require decompressing/recompressing the whole thing. That's a little unfortunate, but an uncommon enough corner case that I don't think it's worth solving for, and making the offsets be into the compressed data is problematic in its own right. Happy to expand if you need more information.
https://github.com/bazelbuild/remote-apis-sdks/blob/40d1266c08bdd688d691a2e4090a01d498318030/go/pkg/chunker/chunker.go#L206 increases the offset by the length of the compressed chunk, and that offset is then used as WriteRequest.write_offset. I am working on a faster implementation of the uploader. I tried an alternative scheme, but the server responded with an error message about an invalid offset. I can provide the code, but it will take me some time.
Ahh. Hmm. I conflated read and write there, but that was a mistake. The write case is more awkward than I remembered :(. Apologies for the confusion! And more to the point, I only confirmed that the RBE read implementation matches the spec, but if you're telling me the write implementation does not, that's unfortunate but good to know.

I'm...not actually sure what the right resolution is in this case. Making the write path use compressed offsets and the read path use uncompressed offsets would be easy, I suppose, and I think more correctly matched to a strict reading of the ByteStream spec, where it specifies the sum of the sizes of all data bundles sent on the stream. On the other hand, that's inconsistent with Read, and arguably problematic for anyone implementing resumable uploads: it's not expected they'd commit the incoming bytes to storage byte-for-byte, and so they wouldn't necessarily be able to handle resumptions of the form "the next bytes in a compressed stream, not aligned to any particular chunk boundary or interpretable without the exact bytes of the broken stream on hand". (Or to put it another way, I'm still pretty sure QueryWriteStatus() should return the number of post-decompression bytes successfully persisted.)

"Correct" in the ideal case would be to have the write offset in the initial chunk be the offset in the uncompressed bytes, but for subsequent chunks to be specified in terms of the sum of the lengths of the data bundles sent so far.
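To make that arithmetic concrete (numbers invented for illustration): under such a scheme, a fresh upload sending compressed chunks of 300 and 250 bytes would carry write_offset 0 and then 300, with finish_write set on the last request; a resumed upload starting at uncompressed offset 1000 would carry write_offset 1000 and then 1300. Note that 1300 is a position in neither the compressed nor the uncompressed stream, which is exactly the oddity discussed below.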
Thinking about this has been quite a brain twist! But for the most part I concur with what Eric wrote.

Any scenario where the subsequent write_offset values count uncompressed bytes leaves the server unable to verify them until the whole blob has arrived. Changing the first write_offset to be a compressed offset doesn't work either, since there is no canonical compressed form of a blob to take an offset into.

Additionally, the current implementation by RBE's client and server is technically adhering to the spec already: subsequent calls are already set to the first write_offset plus the number of compressed bytes sent so far.

Of course, we could always ignore what the Bytestream spec says and establish our own expectations, but in that scenario we might as well write our own streaming API.
Also, I'm not sure what you mean here? IIUC writing to the CAS is sequential. FWIW, the API doesn't specify anything regarding chunks for writes; the SDK uses chunks as an implementation detail. In any case, zstd has frames, which are effectively another abstraction for chunks, each compressed as a single unit. The SDK also already reads from disk and compresses data concurrently, though SDK-specific discussions shouldn't happen on this thread.
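A small Go sketch of the frames point (assuming the klauspost/compress zstd package): each chunk can be compressed as an independent zstd frame, the frames simply concatenated, and a standard decoder will read the result back as one stream.

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Compress each chunk as an independent zstd frame and concatenate
	// the frames into one stream. (Error handling elided for brevity.)
	var stream bytes.Buffer
	for _, chunk := range []string{"hello, ", "world"} {
		enc, _ := zstd.NewWriter(&stream)
		io.WriteString(enc, chunk)
		enc.Close() // ends this frame; the next chunk starts a new one
	}

	// A plain zstd decoder consumes the concatenated frames as a single stream.
	dec, _ := zstd.NewReader(&stream)
	defer dec.Close()
	out, _ := io.ReadAll(dec)
	fmt.Printf("%s\n", out) // prints: hello, world
}
```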
Clarify WriteRequest.write_offset spec for compressed blobs: the uncompressed offset applies only in the first request.
Thanks. I've clarified the spec. PTAL. @rubensf you are right: the change to the spec doesn't require reading and writing to be sequential.
This complexity is caused by Bytestream not quite fitting our use case exactly, and by our desire for an incremental improvement over uncompressed Bytestream. For REAPIv3, IMO we should replace Bytestream with our own protocol (there are probably a few people interested in working on replacements, but I don't know the time scale for REAPIv3, so maybe it's too early to start).
To summarise the reasoning behind the current design (a sketch of the resulting offset bookkeeping follows the list):
- The initial write_offset must refer to an offset in the uncompressed blob, because we have no way to specify a particular compressed version of the blob. (In most cases, I expect server implementations to only support 0 here anyway.)
- Subsequent write_offset values refer to the amount of compressed data received so far (plus the initial uncompressed write_offset), partly because the ByteStream spec requires it, but also because uncompressed write offsets would be less useful for the server to check. If we count compressed data here, the server can do some simple checks that it agrees with the client on how much compressed data has been sent. If we were to count uncompressed data instead, the server might not be able to perform any checks until all the data had been sent anyway (e.g. if a block-based compressor is used), at which point it could just decompress everything and check the size and hash of the uncompressed blob.
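Here is a hedged Go sketch of that bookkeeping (the helper, resource name, and chunk sizes are invented for illustration; the WriteRequest fields are from the ByteStream proto):

```go
package main

import (
	"fmt"

	bspb "google.golang.org/genproto/googleapis/bytestream"
)

// buildWriteRequests is a hypothetical helper illustrating the offset rule:
// the first request carries the uncompressed offset; each later request
// carries that initial offset plus the compressed bytes sent so far.
func buildWriteRequests(resource string, uncompressedOffset int64, chunks [][]byte) []*bspb.WriteRequest {
	offset := uncompressedOffset
	reqs := make([]*bspb.WriteRequest, 0, len(chunks))
	for i, chunk := range chunks {
		reqs = append(reqs, &bspb.WriteRequest{
			ResourceName: resource, // may be omitted after the first request
			WriteOffset:  offset,
			FinishWrite:  i == len(chunks)-1,
			Data:         chunk,
		})
		offset += int64(len(chunk)) // count *compressed* bytes from here on
	}
	return reqs
}

func main() {
	// Two compressed chunks of 300 and 250 bytes, fresh upload (offset 0).
	reqs := buildWriteRequests(
		"instance/uploads/uuid/compressed-blobs/zstd/hash/size", // placeholder resource name
		0, [][]byte{make([]byte, 300), make([]byte, 250)})
	for _, r := range reqs {
		fmt.Println(r.WriteOffset, r.FinishWrite) // 0 false, then 300 true
	}
}
```

With a non-zero initial offset, the later values become the sums of an uncompressed base and compressed byte counts, which is the "nonsensical" arithmetic called out in the next comment.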
Change LGTM! Thanks Nodir. I wonder if we should go further and call out explicitly that this is nonsensical (adding an uncompressed offset to compressed byte lengths gives...nothing meaningful) but done to fit the semantics of ByteStream, and/or that this will be split into distinct fields in V3 as we get off ByteStream, but that's optional. The text as written looks good; it's just going to, reasonably, make people question why it's defined so strangely :).
This also sounds good to me. The implementation in bazel-remote simply ignores the field in subsequent requests because it isn't useful.
Acked the nonsensical piece explicitly in the comment.
lgtm, but I would remove
done |
Seems like everyone is happy? How do I get an official "LGTM"? Note that I don't have permissions to submit this CL.
Thanks for approving, @mostynb! Folks, I don't have merging permissions. Please merge it for me. Thanks. |