Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

Open
markshannon opened this issue Feb 22, 2024 · 20 comments
Labels
docs Documentation in the Doc dir tests Tests in the Lib/test dir

Comments

@markshannon
Copy link
Member

markshannon commented Feb 22, 2024

The behavior of the symbolic values used in the tier 2 optimizer have a solid theoretical basis, but there is some subtlety involved.
The symbolic values and the lattice they form needs to be well documented and tested.

What are the symbolic values?

Each symbolic value represents not a real object, but knowledge about the set of objects that could be present at a given point in the code. This is powerful, but not intuitive.

Example:

def foo(x):               # line 0
    if x is None:         # line 1
       return "None"      # line 2
    return "ok"           # line 3

At line 3, the symbolic value for x is not None, whereas the symbolic value for x at line 0 is unknown.
However, any object referred to x is unchanged by foo. It is easy to think in terms of objects, but it is really a type system that we are representing.

Why we need a lattice

Example:

def foo(x):               # line 0
    if x is None:         # line 1
       return "None"      # line 2
    if x is not None:     # line 3
       return "not None"  # line 4
    return "???"          # line 5

and this trace:

 exit_if(x is not None)  # line 1
 exit_if(x is None)      # line 3
 return "???"            # line 5

The trace is a correct trace; it correctly represents one path through the function.
So what symbolic value does x have for line 5? It is both None and not None.
This is why the lattice needs a top value, even though it superficially makes no sense.
If a value is top for a location, then that location is unreachable.

Linked PRs

@markshannon markshannon added tests Tests in the Lib/test dir docs Documentation in the Doc dir labels Feb 22, 2024
@Fidget-Spinner
Copy link
Member

You might also want to add the docs to https://github.com/faster-cpython/ideas/blob/main/3.13/redundancy_eliminator.md.

@markshannon
Copy link
Member Author

markshannon commented Feb 23, 2024

EDITED -- Everything is true for top, which is weird, but top is weird.

[NOTE -- top and bottom should be swapped, see https://github.com/faster-cpython/ideas/pull/658]

Regarding tests, we need to test that:

  • All sym_is_ functions return false for bottom
  • All sym_is_ functions return true for top
  • All sym_get_ functions return NULL for bottom
  • All sym_get_ functions return NULL for top.. There is no correct value for top, so we merely need to test that it doesn't crash.
  • sym_is_type and sym_get_type functions behave the same for constants, as for their type.
  • sym_get_const return NULL for anything other than a simple constant (or top).
  • Applying sym_set_null to any type or const produces top
  • Applying sym_set_type to null produces top
  • consts produce the correct type from sys_get_type
  • Setting a contradictory type produces top
  • Setting the same type has no effect
  • Setting a const of a contradictory type produces top
  • Setting a const of a contradictory value produces top
  • Setting a const of the same type promotes a type to a const
  • Setting bottom to anything returns the type/value set to
  • Setting top to anything leaves it as top
  • Setting a type when it's already a const of that type has no effect
  • Any other combinations I haven't thought of...

@gvanrossum
Copy link
Member

Before we can even start writing tests we'll need to agree on the proper API for the symbols.

@gvanrossum
Copy link
Member

If it were up to me, the API should be focused on how we use it:

  • A symbol can be in one of the following states:
    • Nothing is known (bottom, blank initial slate)
      • It's NULL
      • It's not NULL
        • Not NULL and we know its type
          • Not NULL, known type, and known value (constant)
    • Impossible (top)
  • Initializing a symbol in one of these states requires a few different APIs (Create unknown, create NULL, create non-NULL, create specific type, create specific constant)
  • As we gain more info about a symbol, it moves to another state, according to the lattice Mark posted
    • For example, if we know it's True and we learn it's False, it transitions to top (impossible)
  • We need various inquiries:
    • Is it NULL?
    • Is it non-NULL?
    • Is it a specific type?
    • Is it a constant, and if so, what is its value?
    • Is it top?

For construction, I'd propose:

  • sym_new_unknown(ctx)
  • sym_new_null(ctx)
  • sym_new_notnull(ctx)
  • sym_new_type(ctx, type)
  • sym_new_const(ctx, value)

For updates, we'd get:

  • sym_set_null(sym)
  • sym_set_notnull(sym)
  • sym_set_type(sym, type)
  • sym_set_const(sym, value)

Note that if a symbols is known to be NULL and then we learn it is NULL, it must become top.

For inquiries:

  • sym_is_null(sym)
  • sym_is_notnull(sym)
  • sym_has_type(sym, type)
  • sym_get_const(sym) -> NULL or value

An orthogonal question is whether a symbol owns the references to the type and constant value.

  • It doesn't need to own the type, since we only use well-known immortal built-in types
  • I believe it should own the constant, since the results of constant folding may not be owned by anything else

As a result the constants must be DECREF'ed when the abstract interpretation context is destroyed.

If symbols own their constant, do sym_new_const() and sym_set_const() "steal" a reference from the caller, or does the caller (if it owns a reference) have to DECREF the constant too? We must take into account the error case too. While sym_set_const() cannot fail, sym_new_const() can fail (it can run out of space for new symbols). Possible approaches:

(A) sym_new_const() only steals a reference when it succeeds

// value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
    DECREF(value);
    goto error;
}

(B) sym_new_const() always steals the reference (even when it fails)

// value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
    goto error;
}

(C) sym_new_const() never steals the reference

// value has a new reference
sym = sym_new_const(ctx, value);
Py_DECREF(value);
if (sym == NULL) {
    goto error;
}

By comparison, for sym_set_const(), the call can never fail, so the equivalent cases are

(A), (B) sym_set_const() always steals the reference

// value has a new reference
sym_set_const(sym, value);

(C) sym_set_const() never steals the reference

// value has a new reference
sym_set_const(sym, value);
Py_DECREF(value);

Of these, (C) is perhaps the least surprising API design (it's how most C APIs work) but (B) requires writing slightly less code. (A) feels the most error-prone. I think it's important for sym_new_const() and sym_set_const() to be consistent, which also argues against (A). (It seems we currently don't need sym_set_const(), so maybe this argument is moot -- though ISTM that _GUARD_IS_TRUE_POP would need to call sym_set_const(sym, Py_True).

@gvanrossum
Copy link
Member

PS. Mark's lattice diagram has a node representing "not None". This might require a separate set of APIs (we currently apparently don't need to test for "not None" yet).

@gvanrossum
Copy link
Member

gvanrossum commented Feb 25, 2024

We should ensure that if we test for a property that's represented lower in the lattice (e.g. "not NULL") and the symbol has any value higher up in the lattice (e.g. bool or False), the tested property is still considered true (since bool and False imply "not NULL").

But what to do with top? Is top considered to have all properties, or none? It is highest in the diagram, so that would imply that it should have all properties. But isn't that confusing?

Also, according to Wikipedia, in type theory, Bottom (⊥) is the type that has no values, whereas Top (⊤) is the type that encompasses all values. Maybe we should switch terminology?

@gvanrossum
Copy link
Member

From Mark's list of tests I deduce that he wants the symbol representing the type without values to return false/NULL for all inquiries. IIRC I ran into this in the context of static type checking too. This feels pragmatic, as it is likely to cause the optimizer to make a conservative decision. It probably doesn't matter too much: if we ever encounter such a symbol we've reached an error condition or unreachable code, so the optimizer might as well insert _EXIT_TRACE and give up at this point.

@gvanrossum
Copy link
Member

So I've been thinking about how to export the symbols API in a way that we can easily write tests in Python.

My idea is to export _Py_UOpsAbstractInterpContext as a Python object (it already has a PyObject_HEAD, although it's curently not used) and make all the sym_* functions methods on that. I propose that the Python-level interface should represent symbols as their index in the t_arena.arena array -- the conversion back and forth is simple (check the range on int -> sym), and we don't have to make symbols themselves Python objects.

A test I don't see in Mark's list:

  • Setting a type when it's already a const of that type has no effect

@markshannon
Copy link
Member Author

If it were up to me, the API should be focused on how we use it

I agree. That should be true of all APIs.

But what to do with top? Is top considered to have all properties, or none? It is highest in the #114058 (comment), so that would imply that it should have all properties. But isn't that confusing?

Yes, it is confusing, which is why we need clear docs.
If top is present then the code must be unreachable, so any transformation we make is correct. Guaranteed bug free, but completely useless.
The optimizer could check for top and insert an unconditional exit However, if top is very rare, it is probably better to carry on regardless and occasionally emit unreachable code.

@markshannon
Copy link
Member Author

My idea is to export _Py_UOpsAbstractInterpContext as a Python object

We will want to reuse a memory buffer (probably per-thread) when optimizing, so the optimizer will need to initialize the context from a buffer. The test can take a bytesarray to use as a buffer, so we don't need to expose the context.

@markshannon
Copy link
Member Author

Regarding sym_new_const and sym_set_const they should both borrow references, but unlike most functions they should be allowed to store the borrowed reference.
The reason that this makes sense because the optimized trace will not be able to access any constants that are not guaranteed to outlive it.
If the constants must outlive the optimized code, they must outlive the optimizer pass.
This will need to be clearly documented.

@markshannon
Copy link
Member Author

markshannon commented Feb 26, 2024

Further thoughts on top, having added some basic tests:

It is useful to be able to query a symbol, extract a value and get a meaningful result.
In other words, if (sym_is_const(sym)) { PyObject *k = sym_get_const(sym) should put something meaningful in k.
But if sym_is_const(top) is true, that is impossible.
We probably need to choose names that exclude top for these cases.
E.g. sym_is_unique_type() and sym_is_unique_const().

@gvanrossum
Copy link
Member

gvanrossum commented Feb 26, 2024

@markshannon

  • Did you see my comment suggesting that we might have Top and Bottom mixed up?
  • I'm confused about the borrowing for constants. If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?
  • I'm okay if sym_is_const() and sym_is_type() return false for both top and bottom.
  • I feel names like sym_is_unique_type() are unnecessary long.

@markshannon
Copy link
Member Author

Did you see my comment suggesting that we might have Top and Bottom mixed up?

Yes. Let's use "unknown" for "top" and "contradiction" for "bottom". It will hopefully save some confusion.

I'm confused about the borrowing for constants. If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?

Whoever calculates it. Although they could transfer ownership to the executor if we want to add constants to executors.

I'm okay if sym_is_const() and sym_is_type() return false for both top and bottom.

It would be surprising if a value that was a constant is no longer a constant when we add more information. Which means that bottom contradiction is a constant (as well as being everything else).

I feel names like sym_is_unique_type() are unnecessary long.

Maybe, but it is shorter than sys_is_not_contradiction() && sym_is_type()

@gvanrossum
Copy link
Member

If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?

Whoever calculates it. Although they could transfer ownership to the executor if we want to add constants to executors.

Ah, so the current code in _BINARY_OP_ADD_INT (etc.) would be incorrect (it currently works because symbols currently do keep a strong reference to their constant). It calculates a new value in temp and stores that in a new symbol, which is pushed onto the symbolic stack (as the output effect, res), but not otherwise made part of the executor -- in particular, it does not currently replace the uop with a _LOAD_CONST_INLINE uop. I'm guessing the proper fix is to replace the BINARY_OP_... uop with a _LOAD_CONST_INLINE (or _LOAD_CONST_INLINE_BORROW) uop.

(I'm okay with the rest of your response.)

@gvanrossum
Copy link
Member

Anyway, Mark designed it differently, writing the tests in C. That's fine too.

markshannon added a commit that referenced this issue Feb 27, 2024
…ability. (GH-115987)

* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.

* #define shortened form of _Py_uop_... names for improved readability.
@gvanrossum
Copy link
Member

Mark and I have discussed this some more. I am going to take over making all the tests pass.

Some things we discussed:

  • Bottom refers to the node in the type partial order (lattice) representing a contradiction
  • Bottom results: Whenever we update a symbol, there are three possibilities: no change, refinement, or contradiction
  • Bottom detection: We can auto-generate code when pushing results on the stack that checks for a contradiction
  • Symbol representation: we can probably use fewer bits (one for NULL, one for non-NULL, and nothing else -- if both are set it's Bottom)
  • Constant ownership:
    • Symbols own constants, the arena frees them at the end.
    • But we cannot use such constants in the executor, since there is currently no mechanism for an executor to own objects.
    • A possible approach would be to add an optional list of owned objects to the executor, to be DECREF'ed when the executor is freed.
    • An approach that avoids having a separate list would be to have a few special opcodes that own constants, but it's not clear how the JIT would deal with this.
    • Another approach is to have MAKE_INT and MAKE_FLOAT uops that construct new int/float objects from raw bits, but this is much more expensive than a LOAD_CONST[_INLINE], since it requires actually allocating an object.
    • Small ints (-5..255) are the only ones for which we can inline int ops using LOAD_CONST_INLINE_BORROW.
  • In the future, we should work on unboxed operations and deferred refcounts, but that's for 3.14.

@gvanrossum
Copy link
Member

Question: What's the better behavior for Bottom (Contradiction)? Should it have all properties (e.g. sym_is_null() and sym_is_not_null() both true) or none (both false)? Naive application of the rules would imply it has all properties (e.g. is null and not-null, matches any type, and is a constant), but that's not necessarily the most useful, and it breaks when we ask for its constant value. Right now I am coding things so that all properties are false for Bottom, it matches no types, and it has no constant.

@Fidget-Spinner
Copy link
Member

Right now I am coding things so that all properties are false for Bottom, it matches no types, and it has no constant.

That sounds right.

gvanrossum added a commit that referenced this issue Feb 28, 2024
- Any `sym_set_...` call that attempts to set conflicting information
  cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
@gvanrossum
Copy link
Member

We have tests now, but I'm leaving this open because the API isn't yet documented.

gvanrossum added a commit that referenced this issue Feb 29, 2024
…fix (#116077)

This was left behind by GH-115987. Basically a lot of diffs like this:
```
-            res = _Py_uop_sym_new_unknown(ctx);
+            res = sym_new_unknown(ctx);
```
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…intainability. (pythonGH-115987)

* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.

* #define shortened form of _Py_uop_... names for improved readability.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…6028)

- Any `sym_set_...` call that attempts to set conflicting information
  cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…op prefix (python#116077)

This was left behind by pythonGH-115987. Basically a lot of diffs like this:
```
-            res = _Py_uop_sym_new_unknown(ctx);
+            res = sym_new_unknown(ctx);
```
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…intainability. (pythonGH-115987)

* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.

* #define shortened form of _Py_uop_... names for improved readability.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…6028)

- Any `sym_set_...` call that attempts to set conflicting information
  cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…op prefix (python#116077)

This was left behind by pythonGH-115987. Basically a lot of diffs like this:
```
-            res = _Py_uop_sym_new_unknown(ctx);
+            res = sym_new_unknown(ctx);
```
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…intainability. (pythonGH-115987)

* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.

* #define shortened form of _Py_uop_... names for improved readability.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…6028)

- Any `sym_set_...` call that attempts to set conflicting information
  cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…op prefix (python#116077)

This was left behind by pythonGH-115987. Basically a lot of diffs like this:
```
-            res = _Py_uop_sym_new_unknown(ctx);
+            res = sym_new_unknown(ctx);
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir tests Tests in the Lib/test dir
Projects
None yet
Development

No branches or pull requests

3 participants