Dilated File Transfer (transfer-v2) #40

meejah · 2024-06-04T22:04:03Z

A version of #23 with master mrged in (i.e. md/rst flip) and additional copy-editing, clarifications and cleanup.

felinira

Thanks for picking this up again. I would really like to see this finally become reality. I've gone through the whole thing expect for the expansion excercises.

dilated-file-transfer.md

felinira · 2024-07-15T23:55:20Z

dilated-file-transfer.md

+Recall from the Dilation specification that subchannels are _record_ pipes (not simple byte-streams).
+That is, a subchannel transmits a series of complete (framed) messages (up to ~4GiB in size).
+
+For this protocol, each record on the subchannel uses the first byte to indicate the kind of message; the remaining bytes in any message are kind-dependant.


Why do we use a vastly different format here than in the control channel? I am especially confused about having to special case the first byte. We already use msgpack, so can't we just use a kind field as well?

Using a string here would make expansion and experiments much easier. We could specify to allow the X- prefix for experiments for example. As long as every msgpack-encoded message has a kind of type str we should be good to go.

MessagePack allows to serialize arbitrary schemaless data structures, we don't need to specify all valid keys beforehand.

Example for a file offer:

{ "kind": "FileOffer", "filename": "name", "timestamp": 1721088154 "bytes": 123 }

Example of binary data

{ "kind": "Data", "data": bin }

where bin would be a msgpack bin encoded stream.

The overhead for 64k blocks would be ~20 bytes, which is a loss of 0.03% of the available bandwidth. I would say this is negligible.

We could go further and just send binaries msgpack bin encoded (everything else has the map data type). This could IMO be enough.. Then we'd only have the msgpack bin format overhead to deal with, in this case with a byte array of 64k it would be 3 bytes instead of one, with the benefit of a clearer and more extensible data format that makes deserialisation easier.

Perhaps this needs some more concrete numbers / testing before finalizing.

The reasoning is that you have to pass all the bytes/memory through the msgpack codec then. With the "first byte" thing, you can efficiently (and predictably) chunk your payload data at 65KiB - 1 bytes on both encode and decode.

So when pulling data off, you'll bring "the message" into memory. Now you can simply look at the first byte, then write out the rest. With msgpack you'll end up allocating another 65KiB buffer for every message. (We could perhaps specify FlatBuffers here instead? But that has less widespread support and the zero-copy aspect is overkill for most of the messages ... )

Anyway, that's the thinking. I have not tested this, but I believe it would mean approximately twice as much memory pressure. (Perhaps actual msgpack libraries are more efficient than the api implies?)

I suppose -- similar to your reasoning above about the "mode" -- this could be left to a future extension if it is found to be substantially more efficient to do so. Overall, it seems unlikely that memory-allocator pressure will dominate runtime (although phone users might prefer less overhead). Again, completely untested :)

The reasoning is that you have to pass all the bytes/memory through the msgpack codec then.

Depends on the codec impl I guess. After reading the first 3 / 5 bytes it is enough to know that the rest will be binary data that can be read into the final buffer.

I don't have a strong opionion here though - if msgpack implementations make this difficult then by all means.

I do think however that we could still then just have one ID for "msgpack encoded informational map", instead of three.

Okay, yeah at least that simplification makes sense (e.g. "just data" versus "msgpack-encoded data" in which case it must have a "kind" field).

I'll at least do some tests and see if there's some obvious, large advantage. This could also be a "future enhancement" / exercise .

dilated-file-transfer.md

felinira · 2024-07-16T00:27:47Z

dilated-file-transfer.md

+
+Currently, a file-transfer peer that sends zero version data is assumed to be "classic".
+A file-transfer peer supporting Dilation and this new protocol sends `"transfer": {...}` as per the  "Version Negotiation" section above.
+We include a `"version"` key in that dict so that this transfer protocol may be extended in the future (see "Protocol Expansion Exercises" below).


We don't do this anymore. We only use features instead. However we could always even start doing at a later date in the future, as long as unknown keys get ignored.

felinira · 2024-07-16T00:40:00Z

dilated-file-transfer.md

+    files: list[str]   # a list containing relative paths for each file
+```
+
+The filenames in the `"files"` list are unicode relative paths (relative to the `"base"` from the `DirectoryOffer` and NOT including that part.


We should specify that using the .. directory (for example, but not limited to refering to paths above the directory tree) is a protocol error and MUST cause an immediate rejection of the offer by the client libary. Otherwise we allow this sort of behaviour and need to have applications themselves deal with this trickery.

dilated-file-transfer.md

felinira · 2024-07-16T00:42:39Z

dilated-file-transfer.md

+Sending a single file like `/home/meejah/Documents/Letter.docx` gets a filename like `Letter.docx`
+Sending a whole directory like `/home/meejah/Documents/` would result in a directory-offer with basedir `Documents` and some number of files (possibly with sub-paths).
+
+Discussion point: should the basedir simply be ".", and names relative to it? (That is, "Documents" is a possibly-sensitive local directory name).


This should just be left for client applications to decide.

As mentioned above, we must make sure to prevent illegal filenames that contain .. and illegal paths that are .. or that contain any / path separator.

Co-authored-by: Fina <code@felinira.net>

meejah · 2024-07-16T04:04:15Z

Thanks for picking this up again. I would really like to see this finally become reality. I've gone through the whole thing expect for the expansion excercises.

Great, thanks for taking a look! 🥳

FYI, I am (slowly!) making progress on turning the Python "proof-of-concept" code into something a bit closer to production / usable, to prove to myself that the document makes sense and isn't missing details.

Also, in case you missed it, I'm working on a tool called fowl so that non-Python applications can more easily and directly play with Dilation use-cases (whether that is this, or any other Dilation-based protocol).

…rotocols into file-transfer-v2

meejah added 15 commits June 13, 2022 13:36

prelim v2 sketch

aac7fc9

clarify

1f454ee

overall selection

efc0011

attempt at state diagram

58e32e5

tweak diagram

8240b1a

more tweaks

8e8985d

more version work

0e94de2

random notes

c8aaa0d

new writeup, incorporating extensive discussion from brian warner

bb917fe

clarify

75d1c05

Merge branch 'main' into file-transfer-v2

f4ebcd2

slight re-word, 1 -> 'text'

59231d3

reword

a702aed

more clarity around versions

23dd05b

clarify

96f27c5

meejah changed the title ~~Dilated File Transfer (transfer-vs)~~ Dilated File Transfer (transfer-v2) Jun 4, 2024

meejah added 8 commits June 4, 2024 17:24

clarify

9040eb8

decide: msgpack

4fc5832

more clarity

a4552c1

decide: only features

0849bf1

decide: file naming

f7aff0e

reword

980d273

basic -> core0

9e3b9cd

start on compression, other cleanup

553f21b

felinira suggested changes Jul 16, 2024

View reviewed changes

meejah and others added 2 commits July 15, 2024 21:41

Clarify tolerance

6c2c112

Co-authored-by: Fina <code@felinira.net>

spelling

cd531c6

Co-authored-by: Fina <code@felinira.net>

meejah added 2 commits July 22, 2024 11:57

spelling

9fa1847

remove 'permission' mode

d1be2fa

meejah added 3 commits July 22, 2024 11:58

Merge branch 'file-transfer-v2' of github.com:meejah/magic-wormhole-p…

fcb3e68

…rotocols into file-transfer-v2

remove 'ask'

267c649

clarify file naming further

cfe5159

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dilated File Transfer (transfer-v2) #40

Dilated File Transfer (transfer-v2) #40

meejah commented Jun 4, 2024 •

edited

Loading

felinira left a comment

felinira Jul 15, 2024 •

edited

Loading

meejah Jul 16, 2024

felinira Jul 16, 2024

meejah Jul 16, 2024

felinira Jul 16, 2024

felinira Jul 16, 2024

felinira Jul 16, 2024

meejah commented Jul 16, 2024

Dilated File Transfer (transfer-v2) #40

Are you sure you want to change the base?

Dilated File Transfer (transfer-v2) #40

Conversation

meejah commented Jun 4, 2024 • edited Loading

felinira left a comment

Choose a reason for hiding this comment

felinira Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

meejah Jul 16, 2024

Choose a reason for hiding this comment

felinira Jul 16, 2024

Choose a reason for hiding this comment

meejah Jul 16, 2024

Choose a reason for hiding this comment

felinira Jul 16, 2024

Choose a reason for hiding this comment

felinira Jul 16, 2024

Choose a reason for hiding this comment

felinira Jul 16, 2024

Choose a reason for hiding this comment

meejah commented Jul 16, 2024

meejah commented Jun 4, 2024 •

edited

Loading

felinira Jul 15, 2024 •

edited

Loading