The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

markshannon · 2024-02-22T10:41:05Z

The behavior of the symbolic values used in the tier 2 optimizer have a solid theoretical basis, but there is some subtlety involved.
The symbolic values and the lattice they form needs to be well documented and tested.

What are the symbolic values?

Each symbolic value represents not a real object, but knowledge about the set of objects that could be present at a given point in the code. This is powerful, but not intuitive.

Example:

def foo(x):               # line 0
    if x is None:         # line 1
       return "None"      # line 2
    return "ok"           # line 3

At line 3, the symbolic value for x is not None, whereas the symbolic value for x at line 0 is unknown.
However, any object referred to x is unchanged by foo. It is easy to think in terms of objects, but it is really a type system that we are representing.

Why we need a lattice

Example:

def foo(x):               # line 0
    if x is None:         # line 1
       return "None"      # line 2
    if x is not None:     # line 3
       return "not None"  # line 4
    return "???"          # line 5

and this trace:

 exit_if(x is not None)  # line 1
 exit_if(x is None)      # line 3
 return "???"            # line 5

The trace is a correct trace; it correctly represents one path through the function.
So what symbolic value does x have for line 5? It is both None and not None.
This is why the lattice needs a top value, even though it superficially makes no sense.
If a value is top for a location, then that location is unreachable.

Linked PRs

The text was updated successfully, but these errors were encountered:

Fidget-Spinner · 2024-02-22T11:30:33Z

You might also want to add the docs to https://github.com/faster-cpython/ideas/blob/main/3.13/redundancy_eliminator.md.

markshannon · 2024-02-23T17:49:58Z

EDITED -- Everything is true for top, which is weird, but top is weird.

[NOTE -- top and bottom should be swapped, see https://github.com/faster-cpython/ideas/pull/658]

Regarding tests, we need to test that:

All sym_is_ functions return false for bottom
All sym_is_ functions return true for top
All sym_get_ functions return NULL for bottom
~~All sym_get_ functions return NULL for top.~~. There is no correct value for top, so we merely need to test that it doesn't crash.
sym_is_type and sym_get_type functions behave the same for constants, as for their type.
sym_get_const return NULL for anything other than a simple constant (or top).
Applying sym_set_null to any type or const produces top
Applying sym_set_type to null produces top
consts produce the correct type from sys_get_type
Setting a contradictory type produces top
Setting the same type has no effect
Setting a const of a contradictory type produces top
Setting a const of a contradictory value produces top
Setting a const of the same type promotes a type to a const
Setting bottom to anything returns the type/value set to
Setting top to anything leaves it as top
Setting a type when it's already a const of that type has no effect
Any other combinations I haven't thought of...

gvanrossum · 2024-02-23T23:17:24Z

Before we can even start writing tests we'll need to agree on the proper API for the symbols.

gvanrossum · 2024-02-25T05:41:03Z

If it were up to me, the API should be focused on how we use it:

A symbol can be in one of the following states:
- Nothing is known (bottom, blank initial slate)
  - It's NULL
  - It's not NULL
    - Not NULL and we know its type
      - Not NULL, known type, and known value (constant)
- Impossible (top)
Initializing a symbol in one of these states requires a few different APIs (Create unknown, create NULL, create non-NULL, create specific type, create specific constant)
As we gain more info about a symbol, it moves to another state, according to the lattice Mark posted
- For example, if we know it's True and we learn it's False, it transitions to top (impossible)
We need various inquiries:
- Is it NULL?
- Is it non-NULL?
- Is it a specific type?
- Is it a constant, and if so, what is its value?
- Is it top?

For construction, I'd propose:

sym_new_unknown(ctx)
sym_new_null(ctx)
sym_new_notnull(ctx)
sym_new_type(ctx, type)
sym_new_const(ctx, value)

For updates, we'd get:

sym_set_null(sym)
sym_set_notnull(sym)
sym_set_type(sym, type)
sym_set_const(sym, value)

Note that if a symbols is known to be NULL and then we learn it is NULL, it must become top.

For inquiries:

sym_is_null(sym)
sym_is_notnull(sym)
sym_has_type(sym, type)
sym_get_const(sym) -> NULL or value

An orthogonal question is whether a symbol owns the references to the type and constant value.

It doesn't need to own the type, since we only use well-known immortal built-in types
I believe it should own the constant, since the results of constant folding may not be owned by anything else

As a result the constants must be DECREF'ed when the abstract interpretation context is destroyed.

If symbols own their constant, do sym_new_const() and sym_set_const() "steal" a reference from the caller, or does the caller (if it owns a reference) have to DECREF the constant too? We must take into account the error case too. While sym_set_const() cannot fail, sym_new_const() can fail (it can run out of space for new symbols). Possible approaches:

(A) sym_new_const() only steals a reference when it succeeds

// value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
    DECREF(value);
    goto error;
}

(B) sym_new_const() always steals the reference (even when it fails)

// value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
    goto error;
}

(C) sym_new_const() never steals the reference

// value has a new reference
sym = sym_new_const(ctx, value);
Py_DECREF(value);
if (sym == NULL) {
    goto error;
}

By comparison, for sym_set_const(), the call can never fail, so the equivalent cases are

(A), (B) sym_set_const() always steals the reference

// value has a new reference
sym_set_const(sym, value);

(C) sym_set_const() never steals the reference

// value has a new reference
sym_set_const(sym, value);
Py_DECREF(value);

Of these, (C) is perhaps the least surprising API design (it's how most C APIs work) but (B) requires writing slightly less code. (A) feels the most error-prone. I think it's important for sym_new_const() and sym_set_const() to be consistent, which also argues against (A). (It seems we currently don't need sym_set_const(), so maybe this argument is moot -- though ISTM that _GUARD_IS_TRUE_POP would need to call sym_set_const(sym, Py_True).

gvanrossum · 2024-02-25T05:43:41Z

PS. Mark's lattice diagram has a node representing "not None". This might require a separate set of APIs (we currently apparently don't need to test for "not None" yet).

gvanrossum · 2024-02-25T16:52:35Z

We should ensure that if we test for a property that's represented lower in the lattice (e.g. "not NULL") and the symbol has any value higher up in the lattice (e.g. bool or False), the tested property is still considered true (since bool and False imply "not NULL").

But what to do with top? Is top considered to have all properties, or none? It is highest in the diagram, so that would imply that it should have all properties. But isn't that confusing?

Also, according to Wikipedia, in type theory, Bottom (⊥) is the type that has no values, whereas Top (⊤) is the type that encompasses all values. Maybe we should switch terminology?

gvanrossum · 2024-02-25T17:37:05Z

From Mark's list of tests I deduce that he wants the symbol representing the type without values to return false/NULL for all inquiries. IIRC I ran into this in the context of static type checking too. This feels pragmatic, as it is likely to cause the optimizer to make a conservative decision. It probably doesn't matter too much: if we ever encounter such a symbol we've reached an error condition or unreachable code, so the optimizer might as well insert _EXIT_TRACE and give up at this point.

gvanrossum · 2024-02-26T06:16:45Z

So I've been thinking about how to export the symbols API in a way that we can easily write tests in Python.

My idea is to export _Py_UOpsAbstractInterpContext as a Python object (it already has a PyObject_HEAD, although it's curently not used) and make all the sym_* functions methods on that. I propose that the Python-level interface should represent symbols as their index in the t_arena.arena array -- the conversion back and forth is simple (check the range on int -> sym), and we don't have to make symbols themselves Python objects.

A test I don't see in Mark's list:

Setting a type when it's already a const of that type has no effect

markshannon · 2024-02-26T09:31:20Z

If it were up to me, the API should be focused on how we use it

I agree. That should be true of all APIs.

But what to do with top? Is top considered to have all properties, or none? It is highest in the #114058 (comment), so that would imply that it should have all properties. But isn't that confusing?

Yes, it is confusing, which is why we need clear docs.
If top is present then the code must be unreachable, so any transformation we make is correct. Guaranteed bug free, but completely useless.
The optimizer could check for top and insert an unconditional exit However, if top is very rare, it is probably better to carry on regardless and occasionally emit unreachable code.

markshannon · 2024-02-26T09:34:16Z

My idea is to export _Py_UOpsAbstractInterpContext as a Python object

We will want to reuse a memory buffer (probably per-thread) when optimizing, so the optimizer will need to initialize the context from a buffer. The test can take a bytesarray to use as a buffer, so we don't need to expose the context.

markshannon · 2024-02-26T09:39:00Z

Regarding sym_new_const and sym_set_const they should both borrow references, but unlike most functions they should be allowed to store the borrowed reference.
The reason that this makes sense because the optimized trace will not be able to access any constants that are not guaranteed to outlive it.
If the constants must outlive the optimized code, they must outlive the optimizer pass.
This will need to be clearly documented.

markshannon · 2024-02-26T14:14:08Z

Further thoughts on top, having added some basic tests:

It is useful to be able to query a symbol, extract a value and get a meaningful result.
In other words, if (sym_is_const(sym)) { PyObject *k = sym_get_const(sym) should put something meaningful in k.
But if sym_is_const(top) is true, that is impossible.
We probably need to choose names that exclude top for these cases.
E.g. sym_is_unique_type() and sym_is_unique_const().

gvanrossum · 2024-02-26T15:14:55Z

@markshannon

Did you see my comment suggesting that we might have Top and Bottom mixed up?
I'm confused about the borrowing for constants. If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?
I'm okay if sym_is_const() and sym_is_type() return false for both top and bottom.
I feel names like sym_is_unique_type() are unnecessary long.

markshannon · 2024-02-26T18:19:06Z

Did you see my comment suggesting that we might have Top and Bottom mixed up?

Yes. Let's use "unknown" for "top" and "contradiction" for "bottom". It will hopefully save some confusion.

I'm confused about the borrowing for constants. If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?

Whoever calculates it. Although they could transfer ownership to the executor if we want to add constants to executors.

I'm okay if sym_is_const() and sym_is_type() return false for both top and bottom.

It would be surprising if a value that was a constant is no longer a constant when we add more information. Which means that ~~bottom~~ contradiction is a constant (as well as being everything else).

I feel names like sym_is_unique_type() are unnecessary long.

Maybe, but it is shorter than sys_is_not_contradiction() && sym_is_type()

gvanrossum · 2024-02-26T19:08:31Z

If we calculate the result of say 3.14 * 2.5 we have a new constant that is stored in the symbol. Who owns it?

Whoever calculates it. Although they could transfer ownership to the executor if we want to add constants to executors.

Ah, so the current code in _BINARY_OP_ADD_INT (etc.) would be incorrect (it currently works because symbols currently do keep a strong reference to their constant). It calculates a new value in temp and stores that in a new symbol, which is pushed onto the symbolic stack (as the output effect, res), but not otherwise made part of the executor -- in particular, it does not currently replace the uop with a _LOAD_CONST_INLINE uop. I'm guessing the proper fix is to replace the BINARY_OP_... uop with a _LOAD_CONST_INLINE (or _LOAD_CONST_INLINE_BORROW) uop.

(I'm okay with the rest of your response.)

gvanrossum · 2024-02-26T22:01:24Z

Anyway, Mark designed it differently, writing the tests in C. That's fine too.

GH-115953)

…ability. (GH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.

gvanrossum · 2024-02-27T18:43:53Z

Mark and I have discussed this some more. I am going to take over making all the tests pass.

Some things we discussed:

Bottom refers to the node in the type partial order (lattice) representing a contradiction
Bottom results: Whenever we update a symbol, there are three possibilities: no change, refinement, or contradiction
Bottom detection: We can auto-generate code when pushing results on the stack that checks for a contradiction
Symbol representation: we can probably use fewer bits (one for NULL, one for non-NULL, and nothing else -- if both are set it's Bottom)
Constant ownership:
- Symbols own constants, the arena frees them at the end.
- But we cannot use such constants in the executor, since there is currently no mechanism for an executor to own objects.
- A possible approach would be to add an optional list of owned objects to the executor, to be DECREF'ed when the executor is freed.
- An approach that avoids having a separate list would be to have a few special opcodes that own constants, but it's not clear how the JIT would deal with this.
- Another approach is to have MAKE_INT and MAKE_FLOAT uops that construct new int/float objects from raw bits, but this is much more expensive than a LOAD_CONST[_INLINE], since it requires actually allocating an object.
- Small ints (-5..255) are the only ones for which we can inline int ops using LOAD_CONST_INLINE_BORROW.
In the future, we should work on unboxed operations and deferred refcounts, but that's for 3.14.

gvanrossum · 2024-02-27T23:52:26Z

Question: What's the better behavior for Bottom (Contradiction)? Should it have all properties (e.g. sym_is_null() and sym_is_not_null() both true) or none (both false)? Naive application of the rules would imply it has all properties (e.g. is null and not-null, matches any type, and is a constant), but that's not necessarily the most useful, and it breaks when we ask for its constant value. Right now I am coding things so that all properties are false for Bottom, it matches no types, and it has no constant.

Fidget-Spinner · 2024-02-28T01:30:08Z

Right now I am coding things so that all properties are false for Bottom, it matches no types, and it has no constant.

That sounds right.

- Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.

gvanrossum · 2024-02-28T17:59:33Z

We have tests now, but I'm leaving this open because the API isn't yet documented.

…fix (#116077) This was left behind by GH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```

… tests. (pythonGH-115953)

…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.

…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.

…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```

… tests. (pythonGH-115953)

…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.

…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.

…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```

… tests. (pythonGH-115953)

…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.

…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.

…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```

markshannon added tests Tests in the Lib/test dir docs Documentation in the Doc dir labels Feb 22, 2024

markshannon mentioned this issue Feb 22, 2024

gh-115685: Type/values propagate for TO_BOOL in tier 2 #115686

Merged

bedevere-app bot mentioned this issue Feb 26, 2024

GH-115816: Make tier2 optimizer symbols testable, and add a few tests. #115953

Merged

markshannon added a commit that referenced this issue Feb 27, 2024

GH-115816: Make tier2 optimizer symbols testable, and add a few tests. (

10fbcd6

GH-115953)

bedevere-app bot mentioned this issue Feb 27, 2024

GH-115816: Assorted naming and formatting changes to improve maintainability. #115987

Merged

This was referenced Feb 28, 2024

gh-115816: Improve internal symbols API in optimizer #116011

Closed

gh-115816: Improve internal symbols API in optimizer #116028

Merged

bedevere-app bot mentioned this issue Feb 28, 2024

gh-115816: Generate calls to sym_new_const() etc. without _Py_uop prefix #116077

Merged

woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024

pythonGH-115816: Make tier2 optimizer symbols testable, and add a few…

a67fdc7

… tests. (pythonGH-115953)

adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024

pythonGH-115816: Make tier2 optimizer symbols testable, and add a few…

f84b6a0

… tests. (pythonGH-115953)

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythonGH-115816: Make tier2 optimizer symbols testable, and add a few…

164f567

… tests. (pythonGH-115953)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

markshannon commented Feb 22, 2024 •

edited by bedevere-app bot

Loading

Fidget-Spinner commented Feb 22, 2024

markshannon commented Feb 23, 2024 •

edited by gvanrossum

Loading

gvanrossum commented Feb 23, 2024

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 25, 2024 •

edited

Loading

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024 •

edited by gvanrossum

Loading

gvanrossum commented Feb 26, 2024 •

edited

Loading

markshannon commented Feb 26, 2024

gvanrossum commented Feb 26, 2024

gvanrossum commented Feb 26, 2024

gvanrossum commented Feb 27, 2024

gvanrossum commented Feb 27, 2024

Fidget-Spinner commented Feb 28, 2024

gvanrossum commented Feb 28, 2024

The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816

Comments

markshannon commented Feb 22, 2024 • edited by bedevere-app bot Loading

What are the symbolic values?

Why we need a lattice

Linked PRs

Fidget-Spinner commented Feb 22, 2024

markshannon commented Feb 23, 2024 • edited by gvanrossum Loading

gvanrossum commented Feb 23, 2024

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 25, 2024 • edited Loading

gvanrossum commented Feb 25, 2024

gvanrossum commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024

markshannon commented Feb 26, 2024 • edited by gvanrossum Loading

gvanrossum commented Feb 26, 2024 • edited Loading

markshannon commented Feb 26, 2024

gvanrossum commented Feb 26, 2024

gvanrossum commented Feb 26, 2024

gvanrossum commented Feb 27, 2024

gvanrossum commented Feb 27, 2024

Fidget-Spinner commented Feb 28, 2024

gvanrossum commented Feb 28, 2024

markshannon commented Feb 22, 2024 •

edited by bedevere-app bot

Loading

markshannon commented Feb 23, 2024 •

edited by gvanrossum

Loading

gvanrossum commented Feb 25, 2024 •

edited

Loading

markshannon commented Feb 26, 2024 •

edited by gvanrossum

Loading

gvanrossum commented Feb 26, 2024 •

edited

Loading