Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp of SharedArrayBuffer PR #2518

Merged
merged 1 commit into from
Apr 28, 2017
Merged

Revamp of SharedArrayBuffer PR #2518

merged 1 commit into from
Apr 28, 2017

Conversation

annevk
Copy link
Member

@annevk annevk commented Apr 10, 2017

This is a new take on #2361 based upon StructuredSerialize and StructuredDeserialize.

I rebased @domenic’s work and then did a fixup in a new commit.

For failure StructuredDeserialize throws. (Still need to update some callers. The previous PR is also a good source for tests that are still lacking.)

@annevk
Copy link
Member Author

annevk commented Apr 10, 2017

Todo:

  • Rename directly share-to-able event loops to "directly-shared event loops"
  • Rename unit of share-within-able event loops to "shared event loops"
  • Revise definition of directly share-to-able event loops: "responsible browsing context" is never null for dedicated and shared workers. This is probably a big problem and requires introduction of some kind of hierarchy. There's no way you can distinguish Window -> Dedicated -> Shared -> Dedicated from Window -> Dedicated -> Dedicated, Window -> Shared for instance, somewhat by design. (Not sure if implementations still restrict creating a shared worker inside a dedicated worker.)
  • [ ] Make sure all the right places expose onmessageerror

@domenic do you want to preserve the first two separate commits? Might have to do some trickery to apply the above changes then.

@annevk
Copy link
Member Author

annevk commented Apr 10, 2017

Details of workers in Firefox:

  • Each worker has a parent, which is either a window object or another worker.
  • The parent of a shared or service worker can only be a window object.
  • Workers also have children and type, which are concepts the standard already has.

@domenic
Copy link
Member

domenic commented Apr 10, 2017

No need to preserve the first two separate commits. It'd be vaguely nice if things could land separately, e.g. remove StructuredCloneWithTransfer, define agents and agent clusters, define SAB structured cloning, but it's not necessary. I've certainly tried for separate commits and failed on similarly-large projects.

@annevk
Copy link
Member Author

annevk commented Apr 11, 2017

I've split some of this out into #2520 and #2521.

To do StructuredCloneWithTransfer separately I think we'd have to tightly couple it with messageerror. Otherwise you end up changing the same algorithms everywhere (and in service workers) twice. I also think we'd have to do it before we land the SharedArrayBuffer bits. The alternative is coupling all these together. If you can make a suggestion I can work on it tomorrow, provided the other two PRs are in reasonable shape.

@inexorabletash
Copy link
Member

@binji pointed me here - just wondering if throwing when attempting to store a SAB to IndexedDB falls out of this PR or if we need to tackle that separately.

@annevk
Copy link
Member Author

annevk commented Apr 12, 2017

@inexorabletash no, we lost that property when we stopped requiring it to be a transferable. I suspect we might need StructuredSerializeWithoutSharedArrayBuffer, also for the Notifications API, which has similar constraints as Indexed DB (also keeps items for storage). @domenic what do you think? Requiring StructuredSerialize callers to traverse the record seems unnecessarily complicated.

@annevk
Copy link
Member Author

annevk commented Apr 12, 2017

The above is assuming btw that [AllowShared] would be implied for any. An alternative might be to require it even then, since it's a bit of a footgun.

@domenic
Copy link
Member

domenic commented Apr 12, 2017

So IMO attempting to send a SharedArrayBuffer to IDB should work fine. It should snapshot it at that time, as part of the IDB-specific records-in-memory-to-bytes-on-disk process. As a web developer I think that would be more expected than requiring me to create a copy of my object graph with each SAB replaced by an AB whenever I want to store something on disk.

But let's say we want to prohibit it, for the purpose of this discussion. In that case I think it's part of a more general problem, where certain things aren't storable on disk. The obvious upcoming example is a stream.

Again to me the best layering for this sort of thing is for it to be part of IDB's records-in-memory-to-bytes-on-disk process. That just makes the most sense from my perspective, as it's IDB-specific. Maybe we can help IDB out by defining some abstract op like "CanSerializationBeWrittenToDisk" that crawls the record graph for you?


The normative question at hand, BTW, is whether serializing the JS object: { get x() { ... }, y: unDiskable, get z() { ... } } invokes zs getter. If we do the validation as a second pass, z's getter will be invoked. If we do it interleaved with the StructuredSerialize operation, then z's getter will not be invoked.

@inexorabletash has previously expressed off-thread that it fits better with Blink's architecture to not invoke z's getter, i.e. to do the validation while the object graph is being crawled and converted into its serialized form. I'm OK with that if other implementers also agree. I think it's less elegant layering, as it requires creating some IDB-specific variant of StructuredSerialize and intertwingling the concerns of IDB's disk serialization with StructuredSerialize's realm-independent serialization. But I don't object if that's what the implementers want; the theoretical purity isn't worth much.

@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

Storage question also affects the Notifications API at least, which does some limited form of storage.

@domenic
Copy link
Member

domenic commented Apr 13, 2017

I guess I am coming around to StructuredSerializeWithoutSharedArrayBuffer, as long as it is generic (e.g. has a name like StructuredSerializeForStorage) and ideally is used by at least two consumers.

I don't quite understand why Notifications needs this as I thought it was all passing data around in memory. It certainly doesn't seem like Notifications are stored on disk. In particular the Notifications case seems pretty much identical to the pushState case. (Which is a good test to write for SABs... let me file a web platform tests)

@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

You can create a notification from a service worker with serializable data, then that agent cluster can be collected, and the browser could even be closed (and store the data to disk) and then when the user clicks the notification you'd need to deserialize it, maybe from disk. And even if the disk isn't involved it wouldn't have to be the same instance of the service worker, meaning it's a new agent cluster.

@domenic
Copy link
Member

domenic commented Apr 13, 2017

Hmm, I see! Thanks, I didn't understand how data really worked.

@domenic
Copy link
Member

domenic commented Apr 13, 2017

So I still think we need to decide, for both Notifications and IDB, whether it snapshots the data or errors at serialization time. Snapshotting makes a lot more sense to me, but nobody seems sure... I guess we could always be conservative and let it fail for now, and see if users get upset about having to crawl their object graph and create a copy with all SABs replaced by AB snapshots...

@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

The main reason no serialization error feels icky is because writing to disk is effectively crossing the Agent Cluster boundary (same if you pass shared memory to the network). And whenever you normally cross the Agent Cluster boundary we decided on failure. We could also do what some implementations do and hand out an AB when you try to cross a Agent Cluster boundary with an SAB, but we kinda already decided against that.

@domenic
Copy link
Member

domenic commented Apr 13, 2017

To me the disk is qualitatively different from the in-memory agent cluster boundaries. Asking to serialize something to IDB is pretty clearly asking for you to put the bytes on the disk. That's is qualitatively different from sending the memory to another window/worker. I doubt developers even know that they are both backed by similar spec/implementation infrastructure.

Putting it another way, when a developer says "put these bytes on disk" it's pretty surprising to be told "no, these are special shared bytes, we don't put those on the disk". For putting-on-disk purposes, they're just bytes, IMO!

The notifications case is less straightforward, as it certainly wasn't obvious to me that it was more like IDB than like postMessage. So my intuition doesn't help there.

@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

It seems very natural to me, but either way they will need a different code path. These are the options we have I think:

  1. Throw.
  2. Copy the internals into an AB.
  3. Copy the internals into a new SAB (and by this I mostly mean a new Shared Data Block).

@domenic
Copy link
Member

domenic commented Apr 13, 2017

OK, trying to pick between those made me side with (1). In addition to the argument-from-conservativism, the idea that Notifications and IDB should be consistent clinched it. Because neither (2) nor (3) is very good for Notifications. Both are surprising for developers, I would think.

FWIW @binji and @inexorabletash were leaning toward (1) from what I understand, and Firefox already implements (1), so I think we should be good to move forward with it.

Thanks again for walking me through all this.

@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

I decided to split out messageerror into its own PR: #2530. We can still land it together with this or shortly after each other, but it seems far better to get it reviewed separately.

@annevk annevk added the do not merge yet Pull request must not be merged per rationale in comment label Apr 13, 2017
@annevk
Copy link
Member Author

annevk commented Apr 13, 2017

I will rebase this PR once we've made some progress on the other PRs and maybe landed one or two. In particular this probably needs to be on top of the messageerror PR due to the addition of the new StructuredSerializeForStorage abstract operation, but it also needs to be on top of the other PRs and it doesn't quite seem worth the effort to start cherry picking to make all that possible as the foundations are not at the nit stage just yet.

@inexorabletash
Copy link
Member

Thank you both for your care and attention on this.

@domenic
Copy link
Member

domenic commented Apr 18, 2017

On today's WebAssembly call we briefly discussed the pushState/replaceState case and thought that should probably use StructuredSerializeForStorage as well, both to be conservative and because it seems at least feasible implementations could serialize history state to the disk during navigation.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

That sounds like a good plan.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

We do need to make sure sub-serialization passes through the forStorage boolean.

Wait, this already happens, but I guess you mean to the "serialization steps" as stated earlier.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

But also, if we change "serialization steps", everyone that has integrated with the current calling pattern will have to change again. Perhaps introducing "serialization for storage steps" isn't so bad?

@domenic
Copy link
Member

domenic commented Apr 18, 2017

Oh sorry, I hadn't read through the actual changes yet :-/.

I agree it's a bit confusing whether people who have integrated with the current pattern for declaring serialization steps need to declare the third argument if they don't use it. E.g. in JavaScript they would not, but in most strongly-typed languages they would.

Maybe "serialization for storage steps" is the easiest way out of that confusion.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

It also makes more sense as the second argument I thought. Hmm.

@annevk
Copy link
Member Author

annevk commented Apr 19, 2017

I decided to make forStorage a third argument to "serialization steps" after all and recommend it be omitted if it's not relevant to the algorithm at hand. This seemed better than requiring platform objects to define a set of steps labeled "for storage" while the dominant usage of such an algorithm would not in fact be "for storage".

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me a while to get around to reviewing this

@@ -7643,6 +7667,9 @@ interface <dfn>DOMStringList</dfn> {
<p>These steps may perform a <span>sub-serialization</span> to serialize nested data
structures. They should not call <span>StructuredSerialize</span> directly, as doing so will
omit the important <var>memory</var> argument.</p>

<p>The introduction of these steps should omit mention of the <var>forStorage</var> argument if
it is not relevant to the algorithm.</p>
</dd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a new idea. Why don't we say that platform objects may define themselves as "not serializable for storage", in addition to their serialization/deserialization steps? And explain when you might want to do that. Then we just throw a DataCloneError if that is set for a platform object.

That would help enforce consistent errors, for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a parameter, it would let WebAssembly.Module define the "throw if serializing for storage if not in a secure context" behavior we've discussed elsewhere (rather than doing that in IDB as a post-serialize pass). Although maybe we can do that with prose just as easily. "A module is not serializable for storage if the context is not a secure context."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; I am now leaning back toward a parameter. The prose version isn't great, because of the question "what is the context"? Which indeed is a good point we'll need to be sure to write tests on. Consider the case

// inside window1, so object literals/functions have a relevant realm of window1
window2Store.put({
  get x() { return new window3.WebAssembly.Module(...); }
});

which context(s) here need to be secure?

The only way "not serializable for storage" would work is if we decide window3 is the one that needs to be secure; then we could say "A module is not serializable for storage if its relevant settings object is not a secure context". For window2 we'll need an algorithm so we can correctly refer to "the current settings object".

(window1 isn't easy to get at in specs, and probably not in implementation either, so we can forget that.)

We could also require some combination of these, e.g. both window2 and window3. Again this is easier with algorithm steps.


Given this complexity, for now I am happy with either:

  • Abandoning revisions to serialization steps alone in this PR and working on a follow-up
  • Going with the PR as-is, with an optional forStorage parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current PR is fine. It's flexible which is what we need until we have patterns.

source Outdated
allocation failure.</p>
<li><p>Set <var>serialized</var> to { [[Type]]: "SharedArrayBuffer", [[ArrayBufferData]]:
<var>value</var>.[[ArrayBufferData]], [[ArrayBufferByteLength]]: <var>size</var>,
[[AgentCluster]]: <span>current Realm Record</span>'s corresponding <span>agent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the" current Realm Record's...

source Outdated
<li><p>Let <var>size</var> be <var>value</var>.[[ArrayBufferByteLength]].</p></li>

<li>
<p>Let <var>dataCopy</var> be ? <span>CreateByteDataBlock</span>(<var>size</var>).</p>
<p>If <span>IsSharedArrayBuffer</span>(<var>value</var>) is true, then:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want a ! here per your recent changes?

source Outdated
<li><p>Set <var>serialized</var> to { [[Type]]: "ArrayBuffer", [[ArrayBufferData]]:
<var>dataCopy</var>, [[ArrayBufferByteLength]]: <var>size</var> }.</p></li>
<ol>
<li><p>If <span>IsDetachedBuffer</span>(<var>value</var>) is true, then throw a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another missing !

data-dfn-type="abstract-op"><dfn>StructuredSerialize</dfn> ( <var>value</var> [ ,
<var>memory</var> ] )</h4>
<h4 id="structuredserializeinternal" data-noexport="" data-lt="StructuredSerializeInternal"
data-dfn-type="abstract-op"><dfn>StructuredSerializeInternal</dfn> ( <var>value</var>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could omit all the Bikeshed stuff here since we're not exporting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I'd leave it in for when we get that public/private thing fixed.

<li><p>Let <var>serialized</var> be ? <span>StructuredSerialize</span>(<var>input</var>,
<var>memory</var>).</p></li>
<li><p>Let <var>serialized</var> be ? <span>StructuredSerializeInternal</span>(<var>input</var>,
false, <var>memory</var>).</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just keep calling StructuredSerialize here; seems slightly nicer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How? StructuredSerialize no longer accepts memory.

@@ -8611,6 +8680,7 @@ o.myself = o;</pre>
</dd>

<dt><span>StructuredSerialize</span></dt>
<dt><span>StructuredSerializeForStorage</span></dt>
<dt><span>StructuredDeserialize</span></dt>
<dd>
<p>Creating a <span>JavaScript Realm</span>-independent snapshot of a given value which can be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a paragraph after this explaining the distinction. Something like

StructuredSerializeForStorage can be used for situations where the serialization is anticipated to be stored in a persistent manner, instead of passed between Realms. It throws when attempting to serialize SharedArrayBuffer objects, since storing shared memory doesn't make sense. Similarly, it throws when given any platform objects that are marked as [not serializable for storage]."

source Outdated
<span>StructuredCloneWithTransfer</span> on its arguments, but is careful to do so immediately,
inside the synchronous portion of its algorithm. Thus it is able to use the structured cloning
algorithms without needing to <span>prepare to run script</span> and <span>prepare to run a
<span>StructuredSerialize</span> on its arguments, but is careful to do so immediately, inside the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructuredSerializeWithTransfer

annevk added a commit to whatwg/notifications that referenced this pull request Apr 24, 2017
As defined by and discussed in whatwg/html#2518.
Additionally, define StructuredSerializeForStorage when for
serializable objects need to be stored more persistently.

Tests: web-platform-tests/wpt#5003.

Fixes #2260 (by being the last in a set of fixes). Closes
tc39/proposal-ecmascript-sharedmem#144 and closes
tc39/proposal-ecmascript-sharedmem#65.
@annevk
Copy link
Member Author

annevk commented Apr 27, 2017

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1360190. Maybe we should have a generic bug for each browser though to go through the new tests. Safari fails quite a few due to not supporting shared or service workers, or BroadcastChannel, and also seems to have some bugs with window-to-window sharing.

@annevk
Copy link
Member Author

annevk commented Apr 27, 2017

@domenic
Copy link
Member

domenic commented Apr 27, 2017

This LGTM and I'm ready to merge/publish the blog post when you are. I agree with a single "update SAB to pass the tests" bug for each browser.

annevk pushed a commit to web-platform-tests/wpt that referenced this pull request Apr 28, 2017
@annevk annevk removed the do not merge yet Pull request must not be merged per rationale in comment label Apr 28, 2017
@annevk annevk merged commit b3f8779 into master Apr 28, 2017
@annevk annevk deleted the annevk/sab branch April 28, 2017 07:39
@annevk
Copy link
Member Author

annevk commented Apr 28, 2017

Something weird happened. Because I did squash & merge so I could change the commit message, GitHub ended up changing the author of the commit. I haven't seen that happen before.

annevk added a commit to whatwg/notifications that referenced this pull request May 14, 2017
As defined by and discussed in whatwg/html#2518.
andreubotella pushed a commit to andreubotella/html that referenced this pull request Sep 16, 2021
domenic pushed a commit that referenced this pull request Sep 17, 2021
mfreed7 pushed a commit to mfreed7/html that referenced this pull request Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants