
Order dependence for equality #15

Closed
bakkot opened this issue Jun 6, 2019 · 116 comments · Fixed by #98

Comments

@bakkot
Copy link
Contributor

bakkot commented Jun 6, 2019

From the readme:

if they have the same values stored and inserted in the same order, they will be considered strictly equal

This is surprising to me. Contrast python dicts or Purescript records.

It's also just a weird path-dependence in values: I would not expect it to matter if I construct something by doing (const { a: 0 }) with .b = 1 or (const { b: 1 }) with .a = 0

@rricard
Copy link
Member

rricard commented Jun 6, 2019

I got some other negative feedback on the ordering. Basically it's there to be "mindful" of the engines, so we have something that remotely looks like the hidden-class implementation optimization. But I think that last consideration could be brought up later... I'm gonna make another pass on the proposal text at some point soon; I'll ping this issue when I do!

@tolmasky
Copy link

tolmasky commented Jun 6, 2019

Another interesting consideration is that I'd expect object with .key = object.key to be a no-op. However, if key ordering matters, then wouldn't object with .key push key's "order" to last, and thus (object with .key = object.key) !== object on many occasions?

@rricard
Copy link
Member

rricard commented Jun 6, 2019

If order matters then yes, and I agree it'll generate a lot of confusion. I was trying to be mindful of the engine before the user, it'll get updated!

@Jamesernator
Copy link

I think it would be better if, for const objects, enumeration order was instead alphabetical (or at least some consistent ordering), e.g. Object.keys(const { x: 10, y: 20 }) would be the same as Object.keys(const { y: 20, x: 10 }).

@ljharb
Copy link
Member

ljharb commented Jun 7, 2019

Sets, Maps, and objects (roughly) all follow insertion order for enumeration - that seems the precedent to follow.

Insertion order isn’t likely to matter when comparing two Sets in the Set method proposal - so I’d assume that’s the precedent to follow for comparison too.

@Jamesernator
Copy link

If we did have Object.keys returning different sequences for those two objects, but === still holding, do we think this is a bad thing?

There's a related (but inverse) example in the language already where NaN !== NaN despite the fact NaN will act identically in all cases.

For this case it'd be the opposite way, two objects could be === but not behave identically in all situations. In which case would we need Object.is(const { x: 10, y: 20 }, const { x: 20, y: 10 }) to return false?

@Jamesernator
Copy link

Jamesernator commented Jun 7, 2019

To clarify we currently have these rules that hold true:

  • === implies you can use either the LHS or RHS and the behaviour will be identical
    • But !== doesn't imply the reverse because of NaN
  • Object.is implies two values are the same in all circumstances including NaN and you can pick either and get the same behaviour in all cases

But making const { x: 10, y: 20 } === const { x: 20, y: 10 } would break the first implication. This might be okay but my feeling is we'd never want to break the Object.is implication as well though.
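
These invariants can be checked against today's semantics, with NaN and -0 as the known edge cases:

```javascript
// === treats NaN specially: NaN !== NaN, even though the two operands
// behave identically everywhere else.
console.log(NaN === NaN);         // false
console.log(Object.is(NaN, NaN)); // true

// Object.is also distinguishes -0 from +0, which === does not.
console.log(-0 === 0);            // true
console.log(Object.is(-0, 0));    // false
```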

@dead-claudia
Copy link
Contributor

dead-claudia commented Jun 7, 2019

Order dependence can be removed if you're willing to allow some extra complexity: O(n) best case, somewhere between O(n log n) and O(n^2) worst case (depending on implementation), with the average case somewhere in the middle. It's worth noting that Clojure, which uses immutable hash maps for nearly everything, also uses an order-independent equality algorithm for its maps and sets and does not have any significant performance issues from doing so. (I don't think it even offers a native order-dependent equality algorithm for its hash map data type.) They do not guarantee order, but I'm not sure how much of an effect that has on their algorithm.

@bakkot
Copy link
Contributor Author

bakkot commented Jun 7, 2019

@ljharb, you can't both have insertion order for iteration and make insertion order not matter for equality comparison. I'm with @Jamesernator - alphabetic order, possibly with nonnegative integer keys first, is the way to go.

(The set methods proposal isn't adding an "equals" method, so I'm not sure what you're referring to there, but in any case that would be a very different thing from two things being ===. The horrible edge case that is -0 aside, === implies that the left and right are the same value, which means they should not be distinguishable under any circumstances.)
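
For reference, plain objects already use a composite rule today: integer-like keys enumerate first in ascending numeric order, and the remaining string keys follow in insertion order. A sorted record order would be a rule of the same flavour:

```javascript
// Current JS property order for ordinary objects:
// integer-like keys ascending first, then string keys in insertion order.
const obj = { b: 1, 2: 'two', a: 3, 0: 'zero' };
console.log(Object.keys(obj)); // ['0', '2', 'b', 'a']
```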

@dead-claudia
Copy link
Contributor

dead-claudia commented Jun 7, 2019

@bakkot You can't avoid being able to distinguish them as long as you can iterate them - even if the iteration order is undefined, you'll still be able to tell many === records apart by iteration order. And === doesn't always imply "same value":

  • NaN !== NaN, one of my biggest criticisms of IEEE-754.
  • An embedder might choose to expose a pointer to a JS object handle to JS code as a number, and if this handle happens to be a string, two pointers to a === string might still be !==. (It's a similar story with bigints.)

As for the stated semantics, it's intended to express non-coercing equality, not necessarily complete indistinguishability. Checking that is the goal for Object.is, not ===.

Edit: Filed #20 to ask about this.

@bakkot
Copy link
Contributor Author

bakkot commented Jun 7, 2019

@isiahmeadows

You can't avoid being able to distinguish them as long as you can iterate them - even if the iteration order is undefined

Only if the iteration order is undefined, which it should not be. If it is defined purely in terms of which keys are present, then iterating would not allow you to distinguish them.

NaN !== NaN, one of my biggest criticisms of IEEE-754.

I'm familiar with equality semantics in JS, but "=== implies same value" does not mean "!== implies not same value".

An embedder might choose to expose a pointer to a JS object handle to JS code as a number

Technically this is not forbidden by the spec, but the spec should not be built on the assumption that engines are going to do so.

Checking that is the goal for Object.is, not ===.

The fact that === uses different semantics from Object.is is one of the worst parts of the language. We are not going to add any new values for which the two are not equivalent.

@rricard
Copy link
Member

rricard commented Jun 8, 2019

I think the way this is going to end up is the following:

  • Order will not matter for comparison
  • Enumeration is still in insert order

This might change or be reviewed later on as we get closer to an actual implementation (if this proposal ever goes that far)

@tolmasky
Copy link

tolmasky commented Jun 8, 2019

I feel that many of the listed goals for value types will not actually be upheld if that is the case. A lot of the stated goals I feel fundamentally rely on the idea that:

Given a pure function f, and value types A and B, if A === B, then f(A) === f(B)

However, if A and B can ===, but return different results for Object.keys() (another way of saying that their iteration order is different), then no guarantees can be made about f anymore. As such, many algorithms that want to rely on being able to "early return" when data is cheaply compared equal will not actually be able to rely on === to do so. As a trivial example, imagine a React view that creates a list from the keys and values of an object. If this iteration order is not guaranteed, then you can't simply say "I don't need to re-render if old-model === new-model".
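
The early-return pattern in question can be sketched like this (renderKeys is a hypothetical render step, not anything from the proposal); the shortcut is only sound if a === b guarantees identical observable behaviour, including key iteration order:

```javascript
// Hypothetical view update: skip re-rendering when the model compares
// equal to the previous one. If two === records could still differ in
// Object.keys order, this cache could return a stale key order.
let lastModel;
let lastOutput;

function renderKeys(model) {
  if (model === lastModel) return lastOutput; // early return on ===
  lastModel = model;
  lastOutput = Object.keys(model).join(', ');
  return lastOutput;
}
```

With mutable objects this is safe only because === means reference identity; the question in this thread is whether record === can keep the same guarantee.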

@rricard
Copy link
Member

rricard commented Jun 8, 2019

That's a good point: I removed all ordering considerations for now from the proposal and we can keep discussing in that issue

@rricard
Copy link
Member

rricard commented Jun 11, 2019

We decided to have a stable key ordering: @rickbutton is working on it now!

@rickbutton
Copy link
Member

speaking of! #25

@littledan
Copy link
Member

littledan commented Jun 25, 2019

What is the sort order used for property keys? Cc @syg

@ljharb
Copy link
Member

ljharb commented Jun 25, 2019

I believe that should be outlined here

@littledan
Copy link
Member

@ljharb The explainer says that record fields are sorted, and it sounds like this is not done by the order they are present in the literal.

@syg
Copy link

syg commented Jun 25, 2019

I'm still worried that with a sorted order, === won't be easy to make cheap and fast with existing object representations in engines, and people in this space seem to really want cheap and fast deep equality between objects. The short-term worry is that the slowness of === hurts adoption.

I can envision how it can be made fast with different object representations, but the current hidden class / shape "linear history of property insertion" representation is pretty pervasive. Speculative optimizations are often checked for validity with a single shape pointer, for example.

It is attractive to have these const value things be like extensible record types from other languages, but that's a big departure from objects.

@tolmasky
Copy link

Without sort order we'll have a worse problem that prevents adoption: the comparison may be fast but it isn't useful (or at least, isn't any more useful than current objects). Per my example above, if you can't rely on f(a) === f(b) if a === b, then the utility of these objects decreases significantly. You can no longer rely on early return optimizations when a === b, so it becomes unclear what the benefit of these objects really is outside of "engine optimization" (as opposed to algorithm optimization that the users can employ). That is to say, maybe using immutable objects gets you "under the hood" speedups, but they can't really be safely reasoned about from the outside as being any safer than normal objects. In fact, a frozen mutable object is arguably "safer" in this regard than these immutable objects, because if frozen(a) === frozen(b), then (as far as I know) you can safely early exit, while you wouldn't with these immutable objects.

@syg
Copy link

syg commented Jun 25, 2019

I hear you. I also certainly don't advocate for a world where Object.keys order can differ for two === objects. Let's see if there's any engine pushback. If not, then great!

@bakkot
Copy link
Contributor Author

bakkot commented Jun 25, 2019

I'm hopeful that engines would be able to implement this with just a tweak to how hidden class transitions work: the new hidden class can remain a function of the current hidden class and the new key, it would just be a different function.

=== would still be slower than it is for objects, because you have to check which values the keys have in addition to which keys are present, but I don't think the fact that keys are in sorted order should impose any additional slowness to === as long as you can make the hidden class transitions take order into account.

But this is definitely a question for implementors.

@syg
Copy link

syg commented Jun 25, 2019

@bakkot Agreed that the value checking isn't likely to matter here.

It also might just be a tweak that works out cleanly; that'd be great. To be concrete, it'd be nice if an initial implementation in current engines didn't incur a bunch of perf cliffs. For example, I'm worried about JIT code being invalidated due to "insertions" when creating new immutable objects, because their hidden class has a different transition than just "append".

@tolmasky
Copy link

tolmasky commented Jun 25, 2019

My apologies, and I do not mean to be argumentative, but I think our position vis-à-vis implementors should be that this is a deal breaker -- in other words, I think this feature isn't really worth it without sorted keys. My reasoning is that without sorted keys, the "best practices" around using immutable objects become incredibly sophisticated and subtle. In fact, I believe "immutable objects" becomes one big expert-level gotcha. As a framework writer, I know that I have to essentially ignore their existence for user-supplied objects for the reasons listed above, but as a library user, even for something as simple as a memoizer, I have to hope that the library author is very aware of these issues. It's already the case that many memoizers fail on existing gotchas (for example, many erroneously believe they already have memoized results for "hasOwnProperty" or other keys that exist on Object's prototype); this would add yet another layer of submarine bugs.

The simplest proof-of-concept case of course is just memoize(Object.keys). Although this memoized function does not have a tremendous amount of utility, it does highlight the difficulty of implementing such a memoizer and the contradictory nature of the situation: if insertion order matters, then the memoizer should return the insertion-order listing of the keys -- but it will very likely fail to do so for indistinguishable "different" objects.

As I mentioned earlier, without sorted keys I'd actually recommend frozen objects to users instead, where the expectations are more straightforward -- it is not a "content addressable" object, it is merely a non-changeable object, where === works exactly the same as in all other contexts (that is to say, two "structurally identical", but different, objects are always !==). I don't mean this as a cop-out, but rather to say that we have an existing feature that is easier to understand (and more consistent, in my opinion) for what this proposal would provide without sorted keys.
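
The memoizer gotcha mentioned above is easy to reproduce today: a cache backed by a plain object inherits from Object.prototype, so a lookup for the key "hasOwnProperty" appears to be a cache hit before anything has been stored:

```javascript
// A naive memoizer with a plain-object cache (illustrative, buggy).
function memoize(fn) {
  const cache = {};
  return (key) => {
    if (cache[key] !== undefined) return cache[key]; // buggy hit test
    return (cache[key] = fn(key));
  };
}

const upper = memoize((s) => s.toUpperCase());
console.log(upper('hello')); // 'HELLO'

// The inherited Object.prototype method is returned instead of the
// computed value -- a false cache hit.
console.log(typeof memoize((s) => s)('hasOwnProperty')); // 'function'
```

A correct version would use a Map (or a null-prototype object) for the cache; the point is that key-related submarine bugs of exactly this flavour already bite memoizer authors.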

@benjamingr
Copy link

benjamingr commented Jul 22, 2020

then you would have Object.is(r1, r2) === true. And that implies that Object.getOwnPropertySymbols(r1) should be the same as Object.getOwnPropertySymbols(r2)

Why? Why would Object.is returning true imply getOwnPropertySymbols must return the same value order?

@nicolo-ribaudo
Copy link
Member

It's because Object.is(a, b) means "a and b are the same thing". If you pass two times the same thing to getOwnPropertySymbols, you shouldn't get two different results.

@benjamingr
Copy link

If you pass two times the same thing to getOwnPropertySymbols, you shouldn't get two different results.

Is that intuition or a spec requirement? My intuition is that getOwnPropertySymbols should return all the symbols the object/record has in some order, I would not expect to rely on the order of said symbols. If the spec requires getOwnPropertySymbols to return in some particular order for objects - I understand the objection.

@benjamingr
Copy link

Also, if it's a requirement - can't the spec well.. require it? Can't we say "Symbol keys on records are required to always be iterated in the same implementation defined order such that if Object.is(recordA, recordB) is true then Object.getOwnPropertySymbols(recordA) contains the same elements as Object.getOwnPropertySymbols(recordB) and in the same order" (except in spec language)?

@bakkot
Copy link
Contributor Author

bakkot commented Jul 22, 2020

Is that intuition or a spec requirement?

If some functions treat two objects differently, those objects are not the same, and so should not be Object.is equivalent. This is just an intuition, but a very strong one.

If the spec requires getOwnPropertySymbols to return in some particular order for objects - I understand the objection.

The spec requires getOwnPropertySymbols to return symbols in the order in which they were added to the object.

Can't we say

In general JS has been moving away from leaving things implementation-defined, since in practice people tend to end up relying on a particular implementation and then all engines are forced to go with that strategy forever. We'd want to specify a particular order. But which order? The only obvious one is creation order, which, as mentioned above, is a side channel.
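
For context, the insertion-order requirement for symbol keys on ordinary objects is directly observable:

```javascript
const a = Symbol('a');
const b = Symbol('b');

const first = { [b]: 1, [a]: 2 };
const second = { [a]: 2, [b]: 1 };

// Same own symbol keys, but reported in per-object insertion order.
console.log(Object.getOwnPropertySymbols(first));  // [ Symbol(b), Symbol(a) ]
console.log(Object.getOwnPropertySymbols(second)); // [ Symbol(a), Symbol(b) ]
```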

@benjamingr
Copy link

benjamingr commented Jul 22, 2020

Thanks for bearing with me :] To be clear - I am not objecting one way or another I am just trying to understand. Intuitively it sounds like an unexpected limitation.

which, as mentioned above, is a side channel.

I'm not sure that I understand why that is a side channel or why it is any more of a side-channel than with objects?

The spec requires getOwnPropertySymbols to return symbols in the order in which they were added to the object.

That's for objects, though; records don't have to define the same order as objects, I think?

This is just an intuition, but a very strong one.

It sounds reasonable when presented this way.

@bakkot
Copy link
Contributor Author

bakkot commented Jul 22, 2020

I'm not sure that I understand why that is a side channel or why it is any more of a side-channel than with objects?

My understanding is that it means that passing to another party the two symbols a and b, which are not otherwise distinguished, allows that party to determine in which order those symbols were created (by sticking them into a record and iterating its keys). This is a surprising piece of information about the computation which created a and b which you probably did not intend to reveal just by handing over those symbols.

But I'll let @erights speak to that further; I don't want to misrepresent his position.

For objects, records don't have to define the same order as objects I think?

Yeah, didn't mean to imply otherwise. In fact strings in records will probably be in lexicographical rather than (as for objects) chronological order. But we do need to pick some order.
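
The contrast can be sketched with today's objects (the lexicographic order here is a plain code-unit sort, used only for illustration; the exact order records would use was still under discussion):

```javascript
// Chronological (object) order vs. a hypothetical lexicographic
// (record) order for the same string keys.
const insertionOrder = Object.keys({ y: 1, x: 2, z: 3 }); // ['y', 'x', 'z']
const sortedOrder = [...insertionOrder].sort();           // ['x', 'y', 'z']
console.log(insertionOrder, sortedOrder);
```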

@littledan
Copy link
Member

It's a bit of information, indeed. Personally, I'm not quite convinced that it's significant/important enough to constitute a meaningful information leak/security issue. I'm happy to ban Symbols as Record keys now, or even commit to that in the future, but I don't understand the scenario where this "leak" would lead to a bad outcome when composing programs.

@Jack-Works
Copy link
Member

Maybe the following. But I doubt it's really harmful.

// code in realm A
Symbol.for(secret_password)

// code in realm B, which is running bad code
function guess_password() {
    const symb = Symbol.for(Math.random().toString())
    for (const guess of possibilities) {
        for (const key in #{ [symb]: 0, [Symbol.for(guess)]: 0 }) {
            if (key !== symb) {
                // which means the symbol @@guess was used
                // before symb was created
                return key // same value as secret_password
            } else {
                // which means the symbol @@guess was created
                // after symb; since ordering the symbols requires
                // the globally shared registry, this round's
                // guess is wrong
            }
        }
    }
}

@benjamingr
Copy link

Maybe the following. But I doubt it's really harmful.

I am not sure that's really a side-channel, since the global symbol registry is, well... global (its whole goal is to leak this sort of info across realms, isn't it?).

@Jack-Works
Copy link
Member

IIRC, right now you cannot know the order of symbol creation. If symbols can be globally sorted, this can leak some information.

@ljharb
Copy link
Member

ljharb commented Jul 24, 2020

Even revealing the relative order in which two symbols were created is a concern, because it would help an attacker fingerprint the code they’re attacking. Any unintentionally revealed information of any kind is a risk.

@benjamingr
Copy link

It sounds like:

  • There is an issue about leaking unintentional information in cross-realm symbols. It appears like participants are not sure that's a big concern, but it's a legitimate concern.
  • There is a solution by enforcing a consistent ordering on symbol keys (right?) - for example by symbol creation time. This is presumably pretty easy for engines to do. It's hard to spec, since "symbol creation time" is a concept that has no representation in the spec at the moment.
  • There is another solution - making the spec require the invariant we care about due to the potential side-channel (symbol keys must always be iterated in the same order in records). However a concern here is that it would leave too much up to the implementer and engines could vary.

I honestly feel that language cohesiveness (objects and records allowing for different types of keys) was given up a bit too quickly. It's just a feeling and not criticism as I am pretty ignorant of the process that happened here and thankful for the discussion.

I think a reasonable path forward could be to write the potential side-channel attack clearly documenting what extra information is exposed across realms and why that is not desirable. Then maybe figure out if solutions #2 (spec and require object creation order or some other consistent order) or #3 (require the invariant without specifying the order) are viable.

@bakkot
Copy link
Contributor Author

bakkot commented Jul 24, 2020

in cross-realm symbols

The leak isn't just in cross-realm symbols, to be clear; it's just as much a problem within a single realm.

There is a solution by enforcing a consistent ordering on symbol keys (right?) - for example by symbol creation time

This isn't a solution to the side channel; it's the cause of it. The root problem is that, assuming keys for a given record are iterated in the same order each time, there must be an order across all symbols in order to preserve the fundamental property that if a and b are Object.is-equivalent then no function can treat a and b differently. The only obvious choice of order across all symbols is creation time, but that solution has a problem of its own, namely this side channel.

It would actually be straightforward to specify this order; we'd just say something along the lines of "in the order in which they were created", or have a global counter which the symbol constructor used to add an internal slot holding an ID into each symbol, or something. We already say "in ascending order of property creation time" for properties in OrdinaryOwnPropertyKeys.

There is another solution - making the spec require the invariant we care about due to the potential side-channel (symbol keys must always be iterated in the same order in records). However a concern here is that it would leave too much up to the implementer and engines could vary.

It's not just that engines could vary: I think that we should not require engines do a thing unless we have a good idea how they could do it in a way which would satisfy the constraints we care about. If we know how to specify an order over all symbols which wouldn't have this leak, we should just require that order explicitly; if we don't, I don't think we can reasonably require engines to come up with one on their own.

(Though I am also in favor of fully specifying things, as a rule.)

@tolmasky
Copy link

tolmasky commented Jul 24, 2020

For the sake of completeness since we are rehashing this, I believe there is another option, which is to only allow Symbols that have unique names. That is to say, similarly to how you can't have { "x": 1, "x": 2 } (the second overwrites the first), in records, if ToString(symbol1) === ToString(symbol2), then either symbol2 overwrites symbol1 or it throws an error. This would allow all the built-in symbols to always work (since IIRC they are the only ones where ToString resolves to something without quotes "Symbol(Symbol.iterator)" vs. "Symbol("my-symbol")"), and would only pop up if you use two unnamed symbols or two named symbols with the same name.

@ljharb
Copy link
Member

ljharb commented Jul 24, 2020

@tolmasky it's come up multiple times; anything that's not "allow no Symbols" or "allow all Symbols" will not be able to reach consensus. Also, String(Symbol('Symbol.iterator')) === String(Symbol.iterator).
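
That collision is checkable today: a symbol's string form reflects only its description, so a user-created symbol can be indistinguishable by ToString from a well-known one:

```javascript
// Two distinct symbols whose string forms are identical.
const fake = Symbol('Symbol.iterator');
console.log(String(fake));            // 'Symbol(Symbol.iterator)'
console.log(String(Symbol.iterator)); // 'Symbol(Symbol.iterator)'
console.log(String(fake) === String(Symbol.iterator)); // true
console.log(fake === Symbol.iterator);                 // false
```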

@tolmasky
Copy link

tolmasky commented Jul 24, 2020

What if we say insertion order matters toward equality iff two symbols have the same ToString()? That is to say, the ordering of the symbols is lexicographical first, insertion order second (with the added convenience that "mutation" methods keep the original record's insertion order, so OwnPropertySymbols(#{ [SymbolWithToStringA]: 5, [OtherSymbolWithToStringA]: 5 } with .[SymbolWithToStringA] = 8) === [SymbolWithToStringA, OtherSymbolWithToStringA]). This maintains "if a === b, then f(a) === f(b)" by saying a !== b when a and b both have multiple symbols with the same name inserted in different orders, while still allowing the convenience of updating records without futzing with their property ordering, which would usually be the main reason to avoid this.

I'm not sure if we're aligned on goals, but I see the main goals as being:

  1. Maintain that if a === b, f(a) === f(b)
  2. Strive to have Object.is(a, b) be true as often as possible when Set(entries(a)) equals Set(entries(b))
  3. Allow records to participate in many of the features accessible through symbols, such as Symbol.iterator and Symbol.toStringTag (and the util.inspect.custom symbol in node for debugging purposes).
  4. Avoid leaking information about the larger system.

I feel like lexicographical ordering, falling back to insertion order, and maintaining the "originating" record's insertion order strikes the best balance of the above four goals. It satisfies goals 1, 3, and 4(?) completely, at the expense of not having goal 2 be true as often as if we either disallowed symbols completely, or allowed them with a global counter to provide "identical ordering".

@ljharb
Copy link
Member

ljharb commented Jul 24, 2020

What would the third thing be? A sorting algorithm always has to have a tiebreaker, in this case when the Symbol descriptions are identical.

In particular, it's important that any records or tuples for which a === b also report indistinguishable results for both a and b from all the reflection methods.

If you maintain "original" insertion order, then you can potentially detect if a Symbol has been used in a Record or Tuple before, which is an information leak (violates goal 4, which is a nonstarter)

@tolmasky
Copy link

tolmasky commented Jul 24, 2020

Perhaps I didn't describe the algorithm clearly enough -- the insertion order is the tie breaker. You don't need a third tie breaker since they can't be inserted at the same time (unless I am missing something). So, the comparison is:

function compare(recordInQuestion, symbol1, symbol2)
{
    const comparison = ToString(symbol1).localeCompare(ToString(symbol2));

    if (comparison !== 0)
        return comparison;
   
    const insertionOrder1 = InsertionOrder(symbol1, recordInQuestion);
    const insertionOrder2 = InsertionOrder(symbol2, recordInQuestion);

    // Can't be 0.
    return insertionOrder1 < insertionOrder2 ? -1 : 1;
}

The change I am saying to make is that insertion order matters when determining whether a === b in the case of a and b both having symbols with identical names. In other words, I am maintaining if a === b then f(a) === f(b) by expanding the times where a !== b.

A few examples:

const SymbolA = Symbol();
const SymbolB = Symbol();

#{ [SymbolA]: 1, [SymbolB]: 2 } === #{ [SymbolA]: 1, [SymbolB]: 2 } // trivially equal (although name clashes, the insertion order is the same)
#{ [SymbolA]: 1, [Symbol.iterator]: f } === #{ [Symbol.iterator]: f, [SymbolA]: 1 } // equal since we never have to rely on order for non-name-clashing symbols
#{ [SymbolA]: 1, [SymbolB]: 2 , [Symbol.iterator]: f } === #{ [Symbol.iterator]: f, [SymbolA]: 1, [SymbolB]: 2  } // same as above
#{ [SymbolA]: 1, [SymbolB]: 2, [Symbol.iterator]: f } === #{ [SymbolA]: 1 , [Symbol.iterator]: f, [SymbolB]: 2 } // again, SymbolA and SymbolB in both instances are ordered the same

#{ [SymbolB]: 1, [SymbolA]: 2 } !== #{ [SymbolA]: 1, [SymbolB]: 2 } // not equal because we have to fall back to insertion order when sorting name-clash symbols
#{ [SymbolA]: 1, [SymbolB]: 2, [Symbol.iterator]: f } !== #{ [Symbol.iterator]: f, [SymbolB]: 2, [SymbolA]: 1  } // same as above

(#{ [SymbolA]: 1, [SymbolB]: 2 } with .[SymbolA] = 1) === #{ [SymbolA]: 1, [SymbolB]: 2 } // still equal because `with` maintains sort order of original record

So, if two objects have the same symbols, each with a unique name, then the ordering of insertion doesn't matter and the two objects are always equal. If however, the objects contain multiple symbols with the same name, then the insertion order of those symbols matters for equality.

@benjamingr
Copy link

If however, the objects contain multiple symbols with the same name, then the insertion order of those symbols matters for equality.

Hmm, so:

  • Symbols are allowed as keys.
  • Symbols are lexicographically sorted based on description, symbols without description and multiple symbols with the same description are sorted internally based on creation order.
  • No information is leaked about global symbols (Symbol.for) since those always have a description and a single "instance" for that description - making the side channel attack impossible.
  • For symbols not created with Symbol.for one would have to already hold a reference to the symbol in order to observe the difference in the order - mitigating the issue.

It sounds to me like (if I understand this correctly) @tolmasky's suggestion addresses the symbol key limitation concerns, no?
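
For reference, Symbol.for is backed by a single global registry keyed by description string, which is why there is exactly one registered symbol per key:

```javascript
// Symbol.for returns the same symbol for the same key across calls
// (and across realms); Symbol() never does.
console.log(Symbol.for('app.token') === Symbol.for('app.token')); // true
console.log(Symbol('app.token') === Symbol('app.token'));         // false

// Symbol.keyFor reports the registry key for registered symbols only.
console.log(Symbol.keyFor(Symbol.for('app.token'))); // 'app.token'
console.log(Symbol.keyFor(Symbol('app.token')));     // undefined
```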

@tolmasky
Copy link

Just a slight correction here (although either could work), the tie breaker isn't creation order but rather insertion order (similar to how in objects the order is entirely based on insertion order). The reasoning for this is that it avoids needing a global counter at all, and in theory provides no leak whatsoever with regard to symbol creation times.

@littledan
Copy link
Member

For symbols not created with Symbol.for one would have to already hold a reference to the symbol in order to observe the difference in the order - mitigating the issue.

If the tiebreaker is insertion order, then we risk allowing tuples to be unequal when we'd hope that they would be equal. If the tiebreaker is creation order, then we risk leaking information through a membrane since membranes can't intercept primitives, and symbols are primitives. (I don't see ToString as really functioning as a helpful sort order, given that it leaves us with ties.)

@tolmasky
Copy link

tolmasky commented Jul 24, 2020

@littledan - yes this was the conceit I put upfront: if we are willing to allow the rare cases of records that:

  1. Share the same symbols (and corresponding value)
  2. But some of those symbols have the same name
  3. AND those symbols were inserted in a different order

to not be equal, then that resolves this issue. There is no "fundamental" issue or inconsistency here, just a performance degradation for certain edge cases, since it's up to us to define what contributes to equality. This also solves the issue of iterating the symbols in a consistent way even without the comparison issue.

The reasoning here is that it's very likely that records of the same "shape" are coming from the same factory function, so it would be a rare occurrence in an immutable record feature to be comparing records of two completely different creation sources that have almost identical shapes except for the fact that two functions that created them chose to order their identically named symbols differently.

And again, this failure state is not even necessarily that bad in my opinion. For example, in the case of memoization, a !== b can of course still result in f(a) === f(b), just without caching. We're basically falling into the new String("a") !== new String("a") case with this, but again, triggered by a very unlikely set of events (which arguably people are already guarding against, due to the old v8 advice of always putting your keys in the same order to trigger "performance").

Furthermore, I think this is to a large degree "essentially" correct, in that it demonstrates the bounds of equality: Symbol() !== Symbol() due to nominal equality in JavaScript. In essence, we are trying to compute hash(record), but the only tool we have for distinguishing identically named symbols is the === operator (and !== of course). Since we can't represent two identically named symbols in a format that would distinguish them without creating further "hidden data", it stands to reason that we can no longer compare two records that contain them -- except with the only tool we have, direct comparison, which we can employ for determining insertion order.

@SCLeoX
Copy link

SCLeoX commented Jun 13, 2021

Sorry if this is already refuted, but an idea I have is that the spec mandates the engine should create a random number (in the same way as Math.random) for each new symbol. The symbol keys are then sorted according to this random number.

Also, I really don't think it is wise to suggest to developers that two pieces of code running in the same VM have any sort of "secure isolation" between them. After all, there are already tons of ways that "bad code" can do damage to, or leak information from, the shared environment. (For example, by changing {}.__proto__, or by manually parsing the content of the entire <script> tag/source file.)
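The random-number idea above can be sketched in userland (all names here are hypothetical; a real implementation would live inside the engine): tag each symbol with a random number at creation, then sort symbol keys by that tag, so the resulting order is stable per symbol but independent of insertion order.

```javascript
// Hypothetical userland sketch of engine-assigned random sort tags.
const sortTag = new Map(); // symbol -> random tag (engine-internal in reality)

function createSymbol(description) {
  const s = Symbol(description);
  sortTag.set(s, Math.random()); // assigned once, at creation time
  return s;
}

function sortSymbolKeys(symbols) {
  return [...symbols].sort((a, b) => sortTag.get(a) - sortTag.get(b));
}

const s1 = createSymbol("k");
const s2 = createSymbol("k");

// The same pair sorts identically regardless of the order it arrives in:
const fromOneOrder = sortSymbolKeys([s1, s2]);
const fromOtherOrder = sortSymbolKeys([s2, s1]);
console.log(fromOneOrder[0] === fromOtherOrder[0]); // true
```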

@dead-claudia
Copy link
Contributor

Sorry if this is already refuted, but an idea I have is that the spec mandates the engine should create a random number (in the same way as Math.random) for each new symbol. The symbol keys are then sorted according to this random number.

Not TC39, but the chances of you getting that through are extremely slim. Just FYI.

@SCLeoX
Copy link

SCLeoX commented Jun 14, 2021

Sorry if this is already refuted, but an idea I have is that the spec mandates the engine should create a random number (in the same way as Math.random) for each new symbol. The symbol keys are then sorted according to this random number.

Not TC39, but the chances of you getting that through are extremely slim. Just FYI.

Is there anything fundamentally flawed with this approach? I think it is a terrible idea to disallow different features of a language from working together nicely and consistently, especially only for nitpicking reasons like preventing information leaks to other code running in the same VM via the ordering of keys in a generally unordered data structure.

@ljharb
Copy link
Member

ljharb commented Jun 14, 2021

@SCLeoX objects and arrays, and records and tuples, are all ordered data structures. It’s important that the same code produce the same ordering in different environments.
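For string keys, that ordering is already fully specified: own-property enumeration lists integer-like keys in ascending numeric order first, then the remaining string keys in insertion order (symbol keys follow, also in insertion order).

```javascript
// Specified own-property order: integer-like keys ascending first,
// then other string keys in insertion order.
const o = { b: 1, 2: "two", a: 3, 1: "one" };
console.log(Object.keys(o)); // ["1", "2", "b", "a"]
```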

@dead-claudia
Copy link
Contributor

dead-claudia commented Jun 14, 2021 via email

@SCLeoX
Copy link

SCLeoX commented Jun 14, 2021

@ljharb @isiahmeadows Sigh, I really don't think it is a strong reason. Truth be told, yes, determinism is nice to have, but it really isn't that important. After all, JavaScript currently is by no means deterministic (Math.random, new Date, WeakRef, etc.). Other than that, none of the non-academic production languages I know of is deterministic or aims to be.

Deterministic execution, in my opinion, if really needed, should be a flag on the engine. When enabled, the engine would use a fixed seed for Math.random and for symbol-ordering generation (what I suggested) and disable Date-related APIs.

(Sigh again) If deterministic execution is really what TC39 is after, I really hope they realize that what they are trading away is literally the consistency and the symmetry of the language.
