[TensorRT EP] OOM (RAM) when loading ONNX model #21219
Labels
ep:CUDA
issues related to the CUDA execution provider
ep:OpenVINO
issues related to OpenVINO execution provider
ep:TensorRT
issues related to TensorRT execution provider
Describe the issue
Out-of-memory (host RAM) when loading the model with the TensorRT EP: even 50 GiB is not enough.
To reproduce
The model is RTMDet from here.
The exported ONNX model has a TensorRT-specific NonMaximumSuppression node from here (probably slightly changed).
The mmdetection model without the TensorRT-specific NMS op runs without problems on the CPU, CUDA, and OpenVINO EPs (OpenVINO tested with the 1.17.0 PyPI package).
CUDA from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
CUDNN from
https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.2/local_installers/11.x/cudnn-linux-x86_64-8.9.2.26_cuda11-archive.tar.xz/
TensorRT from
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/tars/TensorRT-10.0.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
Code and model are included in the zip file, together with the compiled custom op (Ubuntu 24.04, GCC 11.4). Tested with Python 3.10.
Runnable from run.py with LD_LIBRARY_PATH set to the correct paths (the ld_library.py script can help). If needed, I could try to make a Docker image that reproduces the bug.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8.0, CUDNN 8.9.2.26, TensorRT 10.0.1.6