Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If threading is not imported from the main thread it sees the wrong thread as the main thread. #81597

Closed
fabioz mannequin opened this issue Jun 26, 2019 · 9 comments
Closed
Labels
3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@fabioz
Copy link
Mannequin

fabioz mannequin commented Jun 26, 2019

BPO 37416
Nosy @pitrou, @vstinner, @fabioz, @ericsnowcurrently, @aldwinaldwin, @int19h
Superseder
  • bpo-31517: MainThread association logic is fragile
  • Files
  • snippet.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-03.21:41:08.763>
    created_at = <Date 2019-06-26.18:04:42.605>
    labels = ['interpreter-core', 'type-bug', '3.8', '3.9']
    title = 'If threading is not imported from the main thread it sees the wrong thread as the main thread.'
    updated_at = <Date 2019-07-03.21:41:08.756>
    user = 'https://github.com/fabioz'

    bugs.python.org fields:

    activity = <Date 2019-07-03.21:41:08.756>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-07-03.21:41:08.763>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2019-06-26.18:04:42.605>
    creator = 'fabioz'
    dependencies = []
    files = ['48438']
    hgrepos = []
    issue_num = 37416
    keywords = []
    message_count = 9.0
    messages = ['346653', '346720', '346723', '346724', '346726', '346746', '346747', '346750', '347241']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'fabioz', 'eric.snow', 'aldwinaldwin', 'int19h']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '31517'
    type = 'behavior'
    url = 'https://bugs.python.org/issue37416'
    versions = ['Python 3.8', 'Python 3.9']

    @fabioz
    Copy link
    Mannequin Author

    fabioz mannequin commented Jun 26, 2019

    I'm attaching a snippet which shows the issue (i.e.: threading.main_thread() and threading.current_thread() should be the same and they aren't).

    What I'd see as a possible solution is that the initial thread ident would be stored when the interpreter is initialized and then when threading is imported the first time it would get that indent to initialize the main thread instead of calling threading._MainThread._set_ident in the wrong thread.

    I'm not sure if this is possible if CPython is embedded in some other C++ program, but it seems to be the correct approach when Python is called from the command line.

    As a note, I found this when doing an attach to pid for the pydevd debugger where a thread is created to initialize the debugger (the issue on the debugger is reported at: microsoft/ptvsd#1542).

    @fabioz fabioz mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jun 26, 2019
    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jun 27, 2019

    The second import actually doesn't happen. You need to reload it. This can be tested by putting print('loading threading') in threading.py.

    Your first method-thread will still think it's main thread. So no idea if this is a bug or wrong use. 'import threading' should be one of the first lines in your main code/thread?

    import _thread
    import time
    import importlib
    barrier = 0
    
    def method():
        import threading  # Will make threading the wrong thread.
        global barrier
        print(threading.main_thread())
        print(threading.current_thread())
        barrier = 1
    
    _thread.start_new_thread(method, ())
    
    while barrier != 1:
        time.sleep(.1)
    
    import threading
    importlib.reload(threading)
    print(threading.main_thread())
    print(threading.current_thread())

    @int19h
    Copy link
    Mannequin

    int19h mannequin commented Jun 27, 2019

    This is a bit tricky to explain... There's no easy way to achieve this effect "normally". It manifests due to the way some Python debuggers (specifically, pydevd and ptvsd - as used by PyCharm, PyDev, and VSCode) implement non-cooperative attaching to a running Python process by PID.

    A TL;DR take is that those debuggers have to inject a new thread into a running process from the outside, and then run some Python code on that thread. There are OS APIs for such thread injection - e.g. CreateRemoteThread on Windows. There are various tricks that they then have to use to safely acquire GIL and invoke PyEval_InitThreads, but ultimately it comes down to running Python code.

    That is the point where this can manifest. Basically, as soon as that injected code (i.e. the actual debugger) imports threading, things break. And advanced debuggers do need background threads for some functionality...

    Here are the technical details - i.e. how thread injection happens exactly, and what kind of code it might run - if you're interested.
    microsoft/ptvsd#1542

    I think that a similar problem can also occur in an embedded Python scenario with multithreading. Consider what happens if the hosted interpreter is initialized from the main thread of the host app - but some Python code is then run from the background thread, and that code happens to be the first in the process to import threading. Then that background thread becomes the "main thread" for threading, with the same results as described above.

    The high-level problem, I think, is that there's an inconsistency between what Python itself considers "main thread" (i.e. main_thread in ceval.c, as set by PyEval_InitThreads), and what threading module considers "main thread" (i.e. _main_thread in threading.py). Logically, these should be in sync.

    If PyEval_InitThreads is the source of truth, then the existing thread injection technique will "just work" as implemented already in all the aforementioned debuggers. It makes sure to invoke PyEval_InitThreads via Py_AddPendingCall, rather than directly from the background thread, precisely so that the interpreter doesn't get confused.

    Furthermore, on 3.7+, PyEval_InitThreads is already automatically invoked by Py_Initialize, and hence when used by python.exe, will mark the actual first thread in the process as the main thread. So, using it a the source of truth would guarantee that attach by thread injection works correctly in all non-embedded Python scenarios.

    Apps hosting Python would still need to ensure that they always call Py_Initialize on what they want to be the main thread, as they already have to do; but they wouldn't need to worry about "import threading" anymore.

    @int19h
    Copy link
    Mannequin

    int19h mannequin commented Jun 27, 2019

    It's also possible to hit if using some native code that starts a background thread without going via threading, and runs Python code on that background thread. In that case, if that Python code then does "import threading", and threading hasn't been imported yet, then you have this same problem.

    Here's a pure Python repro using ctypes on Win32:

    #--------------------------

    import sys, time
    from ctypes import *
    
    ThreadProc = WINFUNCTYPE(c_uint32, c_void_p)
    
    @ThreadProc
    def thread_proc(_):
        import threading
        print(threading.current_thread() is threading.main_thread())
        return 0

    assert "threading" not in sys.modules
    #import threading # uncomment to fix

    windll.kernel32.CreateThread(None, c_size_t(0), thread_proc, None, c_uint32(0), None)
    time.sleep(1)

    assert "threading" in sys.modules
    import threading
    print(threading.current_thread() is threading.main_thread())
    #--------------------------

    Here's the output:

    >py -3 main.py
    True
    False
    Exception ignored in: <module 'threading' from 'C:\\Python\\3.7-64\\lib\\threading.py'>
    Traceback (most recent call last):
      File "C:\Python\3.7-64\lib\threading.py", line 1276, in _shutdown
        assert tlock.locked()
    AssertionError:

    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jun 27, 2019

    Understood. Thank you for the extra info. I'll read up on all the suggestions.

    @ericsnowcurrently
    Copy link
    Member

    Note that in Python 3.7 we consolidated a bunch of the global runtime state. The "main_thread" static in ceval.c moved to the internal struct _PyRuntimestate and in 3.7 it is accessible as _Runtime.ceval.pending.main_thread (but only if you build with Py_BUILD_CORE, which extensions shouldn't). In 3.8 it was moved to the top of the struct as _PyRuntimeState.main_thread. All that said, that thread ID is still not exposed directly in Python.

    So perhaps it would make sense to add something like sys.get_main_thread_id, which Lib/threading.py could then use (rather than assuming the current (during import) thread is the main thread).

    Unfortunately, this wouldn't help with anything earlier than 3.9. Currently 3.8 is already in beta1, where we apply a feature freeze, so it's barely too late for 3.8. I suppose you could ask for a special exception from the release manager, given the change would be relatively small, with real benefits, but don't expect it. :)

    @ericsnowcurrently ericsnowcurrently added 3.8 only security fixes 3.9 only security fixes and removed 3.7 (EOL) end of life labels Jun 27, 2019
    @ericsnowcurrently
    Copy link
    Member

    FWIW, this might also have implications for thread shutdown during runtime/interpreter finalization.

    @int19h
    Copy link
    Mannequin

    int19h mannequin commented Jun 27, 2019

    Debuggers will have to work around this in past Python versions that they support (which still includes Python 2 for pretty much all of them), so this is solely about resolving the inconsistency for the future. No point rushing it for 3.8 specifically.

    (The most likely immediate workaround will be that, instead of invoking PyEval_InitThreads on the main thread via Py_AddPendingCall, we will simply use the same facility to exec "import threading" on the main thread.)

    @pitrou
    Copy link
    Member

    pitrou commented Jul 3, 2019

    This is a duplicate of bpo-31517.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants