Enable content checksums #4375

ckamm · 2016-01-20T09:57:26Z

Due to #3235 and #2983 we use content checksums for detecting unnecessary uploads and renames-that-aren't. But content checksums are currently only computed for .eml files.

We should consider enabling content checksumming for everything.

guruz · 2016-01-20T10:01:16Z

👍

I still vote for using a safe cryptograpgical checksum so we can later use it for more advanced magic.
But if @ckamm says we can anyway later switch...

@LukasReschke @ogoffart What do you think?

dragotin · 2016-01-20T10:01:51Z

@moscicki your input here?

guruz · 2016-01-20T10:06:50Z

Maybe also from @PVince81 and @DeepDiver1975 about storage backends (on the server) natively supporting certain checksums so we can align with them.

ckamm · 2016-01-20T10:35:01Z

Discussion on IRC regarding checksum type:

Performance of checksumming will rarely matter compared to the cost of uploads and downloads, we can use SHA1 or better.
We should choose a content checksum type that's supported by the server as transmission checksum. Otherwise we might have to compute two checksums.
The content checksum could be computed while we're reading the file into memory anyway, instead of as a separate pass.
The transmission checksum is currently part of the header of the last chunk's PUT request and must thus be computed in advance.

ckamm · 2016-01-20T11:46:07Z

Other jobs that need to be done to expand our use of checksums:

Does the 'transmission checksum' vs 'content checksum' distinction make sense? Currently the server only sees the 'transmission checksum' and we discard the checksum we get from the server because the journal only stores the 'content checksum'.
Ensure we have a content checksum for downloaded files: either use the checksum the server sends or compute it after downloading.

moscicki · 2016-01-20T15:40:45Z

Please consider that some storage backends such as ours already have a preferred checksum (adler32 in our case). It may be impossible for us to change the checksum type. More precisely, we have a technical ability to change the checksum type (for new files) but we may be constrained to do so by other considerations/policies. And also, consider that currently all our files are already checksummed with adler32 so any checksum type change could only apply to new files (so we would have a mix of checksum types in the system).

The adler32 checksum is not efficient for change detection for small files. Hence using it as content checksum may be problematic for files smaller than a given size (say 1KB). Also, consider this especially if you think in the future of storing checksum blocks which are by definition rather small. So if we want to avoid to compute additional content checksum then the content checksum mechanism with adler32 should be only enabled for files above the critical size (meaning that below that size you should count with a possibility of collision and false-positive). So that would option one but it is not optimal because it would not work for very small files (which I guess defeats the purpose for *.eml files).

So I think at least for us the distinction between content and transmission checksum makes sense. What is currently called a transmission checksum is actually enforced by the server and client cannot change it (another example: MD5 for Amazon S3). What is called content checksum is actually what client wants to use locally to achieve better operation with local change detection. In some cases both can be aligned, in other cases they cannot.

If we wanted to add additional content checksum than I would suggest:

on PUT: also send it as additional special header (e.g. OC-Client-Content-Checksum) just in case server decided to make use of this information in some way (or just store it as additional metadata). I would only consider to add this header unless you think that it is a better implementation choice to avoid the second pass on the client.
on GET: if transmission checksum does not match the type you expect you should compute the content checksum yourself and store locally. You may want to consider to store also the transmission checksum in case some correlation between the client and server checksums would be needed. This checksum however, is not normally needed by the client and should be included in a way which does not increase the memory footprint of the client (if that's possible at all)

I am open to discuss it further, these are just a bunch of initial thoughts but the initial constraints I described are valid.

AnrDaemon · 2016-01-24T19:02:42Z

A wishful note: Using two differently calculated checksums against same file often increase reliability.

dragotin · 2016-02-25T12:48:52Z

Hm, I got pointed to this: https://tools.ietf.org/html/rfc3230
Maybe we should be compatible?

dragotin · 2016-02-29T09:57:22Z

Created owncloud/core#22711 about this.

ckamm · 2016-03-02T13:30:39Z

Pull request #4532

moscicki · 2016-03-02T14:22:19Z

Hello,

In 1.7.2 we have the content checksums implemented and enabled. Will this pull request affect the current transmission checksum implementation?

Thank you,

kuba

ckamm · 2016-03-03T09:46:24Z

@moscicki This has no effects on transmission checksums, it's only about checksums that the client stores in its database.

Enable content checksums #4375

guruz · 2016-04-06T15:14:01Z

-> #4638

Cherry-picked from d6d3502

ckamm added the Enhancement label Jan 20, 2016

ckamm self-assigned this Jan 20, 2016

ckamm added this to the 2.2-next milestone Jan 20, 2016

ckamm mentioned this issue Jan 20, 2016

Check checksums or file contents before assuming a file was moved (instead of just mtime+inode) #2983

Closed

tflidd mentioned this issue Jan 23, 2016

Don't re-download/re-upload manually copied files #3422

Closed

ckamm mentioned this issue Feb 22, 2016

Checksums not stored in Journal #4487

Closed

guruz assigned dragotin and unassigned ckamm Feb 23, 2016

ckamm assigned ckamm and unassigned dragotin Mar 1, 2016

ckamm added a commit to ckamm/owncloud-client that referenced this issue Mar 2, 2016

Checksums: Content checksums for all uploads owncloud#4375

7ed7512

ckamm added a commit to ckamm/owncloud-client that referenced this issue Mar 2, 2016

Checksums: Compute content checksum on download owncloud#4375

d6d3502

ckamm added a commit that referenced this issue Mar 14, 2016

Merge pull request #4532 from ckamm/content-checksum

25baa99

Enable content checksums #4375

ckamm added the ReadyToTest QA, please validate the fix/enhancement label Mar 14, 2016

dragotin mentioned this issue Mar 19, 2016

Use checksums before creating conflict file #4587

Closed

This was referenced Apr 28, 2016

Don't skip uploads when checksum is unchanged #4754

Closed

Allow metadata-only uploads and downloads #4755

Open

ckamm closed this as completed Apr 28, 2016

guruz pushed a commit that referenced this issue May 20, 2016

Checksums: Compute content checksum on download #4375

9aed8db

Cherry-picked from d6d3502

dragotin mentioned this issue Jul 12, 2016

FR: Checksums to prevent uploads with corrupted files owncloud/ios-legacy#719

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable content checksums #4375

Enable content checksums #4375

ckamm commented Jan 20, 2016

guruz commented Jan 20, 2016

dragotin commented Jan 20, 2016

guruz commented Jan 20, 2016

ckamm commented Jan 20, 2016

ckamm commented Jan 20, 2016

moscicki commented Jan 20, 2016

AnrDaemon commented Jan 24, 2016

dragotin commented Feb 25, 2016

dragotin commented Feb 29, 2016

ckamm commented Mar 2, 2016

moscicki commented Mar 2, 2016

ckamm commented Mar 3, 2016

guruz commented Apr 6, 2016

Enable content checksums #4375

Enable content checksums #4375

Comments

ckamm commented Jan 20, 2016

guruz commented Jan 20, 2016

dragotin commented Jan 20, 2016

guruz commented Jan 20, 2016

ckamm commented Jan 20, 2016

ckamm commented Jan 20, 2016

moscicki commented Jan 20, 2016

AnrDaemon commented Jan 24, 2016

dragotin commented Feb 25, 2016

dragotin commented Feb 29, 2016

ckamm commented Mar 2, 2016

moscicki commented Mar 2, 2016

ckamm commented Mar 3, 2016

guruz commented Apr 6, 2016