Allow feature detection for `io.ReaderAt` on Bytes nodes #372

willscott · 2022-03-01T19:23:50Z

We can handle large bytes by allowing Bytes kinded nodes to optionally support
the ReadAt method thereby implementing the ReaderAt interface.

This is proposed over a distinct the reader / seeker implementation because the ReaderAt interface is stateless - it does not need the addition of a position cursor to be tracked in the node between calls, while still allowing for partial reads of the node contents.

… nodes We can handle large bytes by allowing Bytes kinded nodes to optionally support the `ReadAt` method thereby implementing the ReaderAt interface. This is proposed over a distinct the reader / seeker implementation because the ReaderAt interface is stateless - it does not need the addition of a position cursor to be tracked in the node between calls, while still allowing for partial reads of the node contents.

willscott · 2022-03-01T19:26:23Z

Work i see in this repo for fully implementing this proposal if it gets general approval:

Add comment about optional support for the interface at the main Node AsBytes method
Add ReadAt to the bind node implementation
Make use of the interface when available in codec Encoding

willscott · 2022-03-01T19:31:33Z

In our one encounter with large bytes so far, we chose io.Reader and kept an extra offset for state in the bytes. I think this proposal is primarily asking us to decide if we're okay with a fast path support for the ReadAt method, which seems like plausibly the most common interface that would be possible to support.

mvdan · 2022-03-01T19:41:12Z

Before: #319

I lean towards ReadSeeker over ReaderAt because io.Reader is fairly common, and because ReaderAt doesn't give any information about the total number of bytes - with seeking, you can seek all the way to the end to find out. Note that, currently, the Length method isn't valid for Bytes nodes.

On the flip side, ReaderAt is slightly easier to implement, and one could always implement an io.Reader atop an io.ReaderAt. I think I see your point about altering state when reading/seeking as a potential problem, but beyond "IPLD nodes are immutable", is that really a problem? It wouldn't affect what the node would contain or how it would be encoded; one could always seek back to the beginning.

For the sake of not keeping the read/seek offset state as part of the node itself, that's why in the other thread I suggested AsLargeBytes() (io.ReadSeeker, error). The other advantage of this mechanism is that an implementation can support two concurrent readers.

There is one other thing to keep in mind about ReaderAt:

Clients of ReadAt can execute parallel ReadAt calls on the same input source.

This would mean that any implementation, including ADLs, would need to support concurrent calls.

mvdan · 2022-03-01T19:44:12Z

I should also warn against having a node optionally implement an io interface directly, without a method to obtain that implementation instead. With AsLargeBytes() (io.ReadSeeker, error), you can decide at run time whether or not you're really "large bytes" or not, whereas you must decide what the set of methods on a type is at compile time. This matters for any and kinded unions; AsLargeBytes will be defined for those, but will error if the kind isn't actually bytes. You could potentially emulate this by having ReadAt calls return a sentinel error, but then checking whether a node value or type implements io.ReaderAt is almost pointless, as it doesn't give you a good answer.

willscott · 2022-03-01T19:49:27Z

Having to calculate full size may also be expensive, it would be nice to be able to support offset reads without needing to sign up for that (maybe you don't support seek to the end, i suppose)
The point on the clearly defined method makes sense. An AsLargeBytes seems reasonable.
Do we not think supporting concurrent calls is desirable / causes undue burden versus read seeker?

mvdan · 2022-03-01T19:57:41Z

(maybe you don't support seek to the end, i suppose)

Right, you could indeed error when asked to seek relative to the end.

Do we not think supporting concurrent calls is desirable / causes undue burden versus read seeker?

With the understanding that we need an AsT method either way, the difference is that ReaderAt requires concurrent calls to work, whereas ReadSeeker doesn't. Given that most users would presumably grab their own "reader" via the method, and not share it with other goroutines, then I imagine requiring concurrency would be unnecessary, and that requirement could make implementing a "big bytes ADL" that loads on-demand harder - one suddenly needs to worry about data races :)

All this said, it's still just a "leaning towards ReadSeeker" for me and not a strong feeling. I think my only ask is that, if we do go for ReaderAt, then we should leave space for implementers to provide the total length, such as by extending the existing Length method.

willscott · 2022-03-01T20:13:48Z

I guess the main candidate here at the moment is the unixfsnode ADL. I think i see what a read-seeker would look like there, so i'll prototype that library as a candidate implementation next.

warpfork · 2022-03-01T20:22:04Z

I have read this, and believe you two have more detailed understanding than I've got loaded anywhere near to my "hot cache", and I trust you :)

This sounds broadly directionally correct to me. Between the two specific interfaces I don't have a strong opinion.

FWIW, having one alloc to return whatever-interface-it-is sounds probably acceptable to me. Iterators already have that same cost threshhold in most implementations. If we wanted to optimize heavily: for both iterators, and for this, we could embed one of the thing in the main structure, wrap it in some sync primitives, and do allocs iff more than one is requested -- at the cost of larger resident memory. That hasn't seemed necessary yet, but seems plausible as future work.

mvdan · 2022-03-01T20:27:38Z

That sounds reasonable, but I also think that one alloc to be able to consume a "big bytes" node will likely be negligible. At that point, you're going to be worrying about bigger factors like copying the bytes or reusing buffers correctly.

willscott · 2022-03-02T18:31:43Z

it's going to be hard to add AsLargeBytes to the core ipld.node interface because it'll gate updates of ipld-prime to a release where all derived gen'd code has been re-built to include the additional method, right?

mvdan · 2022-03-02T21:14:55Z

We could declare it as an optional interface either for some time, or forever. I think either of those will work as long as "imlements the interface" is not understood as "can be used as large bytes". But in any case, we can start with declaring the extra interface for now, and later decide if enough implementations have transitioned to it that we can add the method straight to Node. For example:

type LargeBytesNode interface {
    Node
    AsLargeBytes() (io.ReadSeeker, error)
}

willscott · 2022-03-03T11:06:29Z

the downside is that there are two checks on each access - first a check of whether the interface is implemented, and then the error check when attempting to get the io.Reader.
That said, this seems simplest for an incremental rollout

mvdan · 2022-03-03T11:08:31Z

Indeed, so I'm inclined to believe that we could move the method to ipld.Node sometime down the line - perhaps in six months or so.

willscott · 2022-03-03T12:06:59Z

@mvdan ready for review. Should we also think about a way to make a bytes node from an io.Reader as an extension to the builder side API?

I'm inclined to defer for now, but it seems useful to figure out for efficiency of copies at some point.

mvdan · 2022-03-03T12:10:05Z

I am happy to defer on that too, for now. I think building "large bytes" nodes will fall into two categories:

The user doesn't care about scalability or efficiency. They can use SetBytes.
The user does care about scalability, so they will use some ADL depending on their needs, which will have its own efficient and tailored "build" API.

mvdan

nit: the docs talk about an io.Reader when it's actually ReadSeeker

mvdan · 2022-03-03T12:11:20Z

We could also document that implementations may refuse to support certain kinds of seeks, like SeekEnd.

willscott requested a review from warpfork March 1, 2022 19:23

willscott requested a review from mvdan March 1, 2022 19:26

Add AsLargeBytes optional interface

319182f

mvdan approved these changes Mar 3, 2022

View reviewed changes

improved doc

501557e

willscott merged commit c6f8188 into master Mar 3, 2022

willscott deleted the feat/bytereaderat branch March 3, 2022 12:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow feature detection for `io.ReaderAt` on Bytes nodes #372

Allow feature detection for `io.ReaderAt` on Bytes nodes #372

willscott commented Mar 1, 2022

willscott commented Mar 1, 2022

willscott commented Mar 1, 2022

mvdan commented Mar 1, 2022 •

edited

Loading

mvdan commented Mar 1, 2022

willscott commented Mar 1, 2022

mvdan commented Mar 1, 2022 •

edited

Loading

willscott commented Mar 1, 2022

warpfork commented Mar 1, 2022

mvdan commented Mar 1, 2022

willscott commented Mar 2, 2022

mvdan commented Mar 2, 2022

willscott commented Mar 3, 2022

mvdan commented Mar 3, 2022

willscott commented Mar 3, 2022

mvdan commented Mar 3, 2022

mvdan left a comment

mvdan commented Mar 3, 2022

Allow feature detection for io.ReaderAt on Bytes nodes #372

Allow feature detection for io.ReaderAt on Bytes nodes #372

Conversation

willscott commented Mar 1, 2022

willscott commented Mar 1, 2022

willscott commented Mar 1, 2022

mvdan commented Mar 1, 2022 • edited Loading

mvdan commented Mar 1, 2022

willscott commented Mar 1, 2022

mvdan commented Mar 1, 2022 • edited Loading

willscott commented Mar 1, 2022

warpfork commented Mar 1, 2022

mvdan commented Mar 1, 2022

willscott commented Mar 2, 2022

mvdan commented Mar 2, 2022

willscott commented Mar 3, 2022

mvdan commented Mar 3, 2022

willscott commented Mar 3, 2022

mvdan commented Mar 3, 2022

mvdan left a comment

Choose a reason for hiding this comment

mvdan commented Mar 3, 2022

Allow feature detection for `io.ReaderAt` on Bytes nodes #372

Allow feature detection for `io.ReaderAt` on Bytes nodes #372

mvdan commented Mar 1, 2022 •

edited

Loading

mvdan commented Mar 1, 2022 •

edited

Loading