
Onnxruntime-directml 1.18.0 broken multithreading inference session #20713

Open
Djdefrag opened this issue May 17, 2024 · 13 comments
Labels
ep:DML issues related to the DirectML execution provider platform:windows issues related to the Windows platform

Comments

@Djdefrag

Djdefrag commented May 17, 2024

Describe the issue

With the new version 1.18, it seems that when multiple threads each use their own InferenceSession on the same DirectML device, all threads stall without raising any exception or error.

To reproduce

Thread 1

import onnx
import onnxruntime

AI_model_loaded = onnx.load(AI_model_path)

AI_model = onnxruntime.InferenceSession(
    path_or_bytes=AI_model_loaded.SerializeToString(),
    providers=[('DmlExecutionProvider', {"device_id": "0"})]
)

onnx_input  = {AI_model.get_inputs()[0].name: image}
onnx_output = AI_model.run(None, onnx_input)[0]

Thread n (where n can be any number)

import onnx
import onnxruntime

AI_model_loaded = onnx.load(AI_model_path)

AI_model = onnxruntime.InferenceSession(
    path_or_bytes=AI_model_loaded.SerializeToString(),
    providers=[('DmlExecutionProvider', {"device_id": "0"})]
)

onnx_input  = {AI_model.get_inputs()[0].name: image}
onnx_output = AI_model.run(None, onnx_input)[0]
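A self-contained harness for this per-thread pattern might look like the sketch below. `worker` is a stand-in for the session creation and run calls above, since actually reproducing the hang requires a DirectML device and onnxruntime-directml 1.18.0:

```python
import threading

def run_in_threads(worker, n_threads):
    # Start one thread per worker invocation, mirroring the repro:
    # in the real report, each thread builds its own InferenceSession
    # on device 0 and calls run().
    results = [None] * n_threads

    def target(i):
        results[i] = worker(i)

    threads = [threading.Thread(target=target, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # in the reported repro, the threads stall here on 1.18.0
    return results
```

With a real worker that builds an InferenceSession and runs inference, this is the shape of code that hangs on 1.18.0 but completes on 1.17.3.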

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

1.18.0

@github-actions github-actions bot added ep:DML issues related to the DirectML execution provider platform:windows issues related to the Windows platform labels May 17, 2024
@sophies927
Contributor

Tagging @PatriceVignola @smk2007 @fdwr for visibility.

@saulthu

saulthu commented Jun 3, 2024

Same here on Windows: versions 1.16.0 to 1.17.3 work fine across multiple threads, but 1.18.0 gives "Windows fatal exception: access violation" with the following stack trace, produced by my own Windows SEH handler:

-----------
Caught unhandled exception...
-----------

Terminating from thread id 10152

Non-C++ exception:
  Error: EXCEPTION_ACCESS_VIOLATION
  Type: Read
  Addr: 0x0

Trace:
 40:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 39:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 38:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 37:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 36:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 35:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 34:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 33:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 32:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 31:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 30:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 29:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 28:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 27:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 26:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 25:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 24:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 23:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 22:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 21:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 20:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 19:  ?: PyInit_onnxruntime_pybind11_state  (onnxruntime_pybind11_state.pyd)
 18:  ?: pybind11::error_already_set::discard_as_unraisable  (onnxruntime_pybind11_state.pyd)
 17:  ?: PyObject_MakeTpCall  (python311.dll)
 16:  ?: PyObject_Vectorcall  (python311.dll)
 15:  ?: PyEval_EvalFrameDefault  (python311.dll)
 14:  ?: PyFunction_Vectorcall  (python311.dll)
 13:  ?: PyFunction_Vectorcall  (python311.dll)
 12:  ?: PyObject_CallObject  (python311.dll)
 11:  ?: PyEval_EvalFrameDefault  (python311.dll)
 10:  ?: PyFunction_Vectorcall  (python311.dll)
  9:  ?: PyObject_CallObject  (python311.dll)
  8:  ?: PyEval_EvalFrameDefault  (python311.dll)
  7:  ?: PyFunction_Vectorcall  (python311.dll)
  6:  ?: PyFunction_Vectorcall  (python311.dll)
  5:  ?: PyObject_Call  (python311.dll)
  4:  ?: PyInterpreterState_Delete  (python311.dll)
  3:  ?: PyInterpreterState_Delete  (python311.dll)
  2:  ?: recalloc  (ucrtbase.dll)
  1:  ?: BaseThreadInitThunk  (KERNEL32.DLL)
  0:  ?: RtlUserThreadStart  (ntdll.dll)

@liuyunms

liuyunms commented Jun 7, 2024

We’ve noted the issue with GPU resource contention caused by multiple threads. This usage pattern is not recommended, since every thread ends up requesting GPU resources at once, which can cause contention. Also, the allocator in the Python API (both CUDA and DML) is explicitly not thread-safe, because it is initialized as a global singleton that lives outside of the session.

We’re investigating the recent failure and will address it. Meanwhile, please avoid this pattern to prevent GPU contention.

@Djdefrag
Author

Djdefrag commented Jun 20, 2024

Hi @liuyunms

Sorry to bother you. I'm currently using one InferenceSession per thread, but you say it shouldn't be used this way.

4 threads -> 4 inference sessions on the same GPU

Do you mean using the same InferenceSession from multiple threads? Is that possible?

4 threads -> 1 inference session on the same GPU
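That second pattern could be sketched as below. `FakeSession` is a stand-in for a real onnxruntime.InferenceSession so the sketch runs anywhere; the point is that the session is built once and only run() is called from the worker threads:

```python
import threading

class FakeSession:
    """Stand-in for onnxruntime.InferenceSession, so this sketch needs no GPU."""
    def run(self, output_names, inputs):
        # A real session would execute the model; here we just double the input.
        return [inputs["x"] * 2]

# Build the session ONCE, outside the threads, then share it.
session = FakeSession()  # real code: onnxruntime.InferenceSession(..., providers=...)

def worker(i, results):
    # Only run() is called per thread; no per-thread session creation.
    results[i] = session.run(None, {"x": i})[0]

results = [None] * 4
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whether concurrent run() calls on one shared session avoid this particular 1.18.0 hang with DirectML is exactly the open question in this thread; the sketch only shows the "1 session, n threads" shape being asked about.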

@Djdefrag
Author

@PatriceVignola @smk2007 @fdwr

Hi, sorry to bother you, is there any news on this problem? I am currently testing 1.18.1 and the problem is still present :(

Thank you

@zhangxiang1993
Member

#21566
This PR might be related. @Djdefrag Could you help verify if the problem is fixed?

@saulthu

saulthu commented Aug 15, 2024

@zhangxiang1993 It still crashes when using multiple threads in my application. I just tried the nightly build of 1.19 from here (I used the Python 3.11 build for Windows). I've reverted back to 1.17.3, which still works.

@Djdefrag
Author

@zhangxiang1993

Hi, I can confirm that the problem is also present in the 1.19 nightly (Python 3.11):

  • ORT 1.19 [NOT working]
  • ORT 1.18.1 [NOT working]
  • ORT 1.18 [NOT working]
  • ORT 1.17.3 [working]

@henryruhs

henryruhs commented Aug 15, 2024

Not sure if this helps, but I have this method to work around it.

import threading
from contextlib import nullcontext
from typing import ContextManager, Union

THREAD_SEMAPHORE : threading.Semaphore = threading.Semaphore()
NULL_CONTEXT : ContextManager[None] = nullcontext()

def conditional_thread_semaphore() -> Union[threading.Semaphore, ContextManager[None]]:
    if has_execution_provider('directml') or has_execution_provider('rocm'):
        return THREAD_SEMAPHORE
    return NULL_CONTEXT

with conditional_thread_semaphore():
    session.run(None, onnx_input)

Sorry, but implement has_execution_provider yourself :)
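Since the helper is left to the reader, here is one minimal way it could be written, built on onnxruntime.get_available_providers(); the alias mapping is an assumption about the short names used in the snippet above, and the import is guarded so the helper degrades to False when onnxruntime is absent:

```python
def has_execution_provider(name: str) -> bool:
    # Map the short names used above to ONNX Runtime provider identifiers
    # (assumed mapping, extend as needed).
    aliases = {
        "directml": "DmlExecutionProvider",
        "rocm": "ROCMExecutionProvider",
        "cuda": "CUDAExecutionProvider",
    }
    try:
        import onnxruntime
        available = onnxruntime.get_available_providers()
    except ImportError:
        # No onnxruntime installed: report no providers rather than crash.
        available = []
    return aliases.get(name.lower(), name) in available
```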

@saulthu

saulthu commented Aug 15, 2024

@henryruhs

with conditional_thread_semaphore():
    session.run(None, onnx_input)

This works; however, it defeats the purpose of running in multiple threads. In older versions that do not crash I can keep the GPU running at 100%, but this workaround causes a very large performance hit.

@henryruhs

Yeah, the performance hit is something I am aware of.

@Djdefrag
Author

Hi @henryruhs

Thank you. I tried the Semaphore solution and it works, but performance is in line with using only one thread.

Hopefully they will fix the problem with the next release.

@linyu0219

This problem is significant, so most of us will remain on version 1.17.3. Please fix it.


7 participants