[TensorRT EP] OOM (RAM) when loading ONNX model #21219
Labels
ep:CUDA
issues related to the CUDA execution provider
ep:OpenVINO
issues related to OpenVINO execution provider
ep:TensorRT
issues related to TensorRT execution provider
Describe the issue
Out-of-memory (host RAM) when loading the model with the TensorRT EP: even 50 GiB is not enough.
To reproduce
The model is RTMDet from here.
The exported ONNX model has a TensorRT-specific NonMaximumSuppression node from here (probably slightly changed).
The mmdetection model without the TensorRT-specific NMS op runs without problems on the CPU, CUDA, and OpenVINO EPs (OpenVINO tested with the 1.17.0 PyPI package).
CUDA from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
CUDNN from
https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.2/local_installers/11.x/cudnn-linux-x86_64-8.9.2.26_cuda11-archive.tar.xz/
TensorRT from
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/tars/TensorRT-10.0.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
Code and model are included in the zip file, together with the compiled custom op (Ubuntu 24.04, GCC 11.4). Tested with Python 3.10.
Runnable from run.py with LD_LIBRARY_PATH set to the correct paths (the ld_library.py script can help). If needed, I could try to make a Docker image that reproduces the bug.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8.0, CUDNN 8.9.2.26, TensorRT 10.0.1.6