
[TensorRT EP] OOM (RAM) when loading ONNX model #21219

Open
MiroPsota opened this issue Jul 1, 2024 · 2 comments
Labels
ep:CUDA, ep:OpenVINO, ep:TensorRT

Comments

@MiroPsota

Describe the issue

OOM (RAM) when loading the model - 50 GiB of RAM is not enough.

To reproduce

The model is RTMDet from here.
The exported ONNX model has a TensorRT-specific NonMaximumSuppression node from here (probably slightly changed).

The mmdetection model without the TensorRT-specific NMS op runs without problems on the CPU EP, CUDA EP, and OpenVINO EP (OpenVINO tested with the 1.17.0 PyPI package).
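For reference, a minimal sketch (not from the original report) of how the three EPs can be tried; the model filename is a placeholder:

```python
import onnxruntime as ort

# Placeholder filename for the standard (no-NMS) exported model.
MODEL = "rtmdet_standard.onnx"

for providers in (["CPUExecutionProvider"],
                  ["CUDAExecutionProvider"],
                  ["OpenVINOExecutionProvider"]):
    sess = ort.InferenceSession(MODEL, providers=providers)
    print(providers[0], "->", sess.get_providers())
```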

CUDA from https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
cuDNN from https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.2/local_installers/11.x/cudnn-linux-x86_64-8.9.2.26_cuda11-archive.tar.xz/
TensorRT from https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/tars/TensorRT-10.0.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz

Code and model are included in the zip file, along with the compiled custom op (Ubuntu 24.04, GCC 11.4). Tested with Python 3.10.
Run run.py with LD_LIBRARY_PATH set to the correct paths (the ld_library.py script can help).
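As context, a rough sketch of the kind of session setup run.py performs (the library and model filenames here are placeholders, not the actual names from the zip):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Placeholder path for the compiled custom-op library shipped in the zip.
so.register_custom_ops_library("./libmmdeploy_trt_ops.so")

sess = ort.InferenceSession(
    "rtmdet_trt_nms.onnx",  # placeholder model filename
    sess_options=so,
    providers=[
        # trt_extra_plugin_lib_paths tells the TensorRT EP where to find
        # TensorRT plugins such as TRTBatchedNMS.
        ("TensorrtExecutionProvider",
         {"trt_extra_plugin_lib_paths": "./libmmdeploy_trt_ops.so"}),
        "CUDAExecutionProvider",
    ],
)
```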

If needed, I could try to make a docker image that reproduces the bug.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 24.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8.0, CUDNN 8.9.2.26, TensorRT 10.0.1.6

@github-actions bot added the ep:CUDA, ep:OpenVINO, and ep:TensorRT labels on Jul 1, 2024
@yf711
Contributor

yf711 commented Jul 1, 2024

Hi @MiroPsota, thanks for bringing up this issue!
Could the model with the NMS op run on a previous version of ONNX Runtime + TensorRT?
Also, could you share the standard model (without NMS) that you tested on the CPU/CUDA/OpenVINO EPs?

@MiroPsota
Author

Here is a zip with all the models and an updated run.py.

With 1.18.0 and 1.18.1, the mentioned problem occurs.

I used TensorRT 8.6.1.6 from here for the 1.17.1 tests (ORT GPU PyPI package).
The OOM doesn't occur, but a different error does (op not implemented), and unwanted copy operations to the host and back are inserted. See the log. I will investigate further.
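One way to confirm the inserted copies is ORT's built-in profiler; a minimal sketch, again with placeholder filenames:

```python
import json
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True  # emit a chrome-trace JSON for this session

sess = ort.InferenceSession("rtmdet_trt_nms.onnx",  # placeholder filename
                            sess_options=so,
                            providers=["CUDAExecutionProvider"])
# ... run sess.run(...) here ...
trace = sess.end_profiling()  # returns the path of the profile file

with open(trace) as f:
    events = json.load(f)
# Inserted device<->host copies show up as MemcpyToHost / MemcpyFromHost nodes.
print([e["name"] for e in events if "Memcpy" in e.get("name", "")])
```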

The ONNX model for TensorRT runs without problems with mmdeploy, which uses TensorRT directly. One difference is that TRTBatchedNMS originally had a different domain, which I changed to trt.plugins according to the docs so that it can run in ORT.
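A minimal sketch of that domain change using the onnx Python API (filenames are placeholders):

```python
import onnx
from onnx import helper

m = onnx.load("rtmdet_trt_nms_original.onnx")  # placeholder input filename

# Move the custom NMS node into the trt.plugins domain, which the
# TensorRT EP recognizes for TensorRT plugin ops.
for node in m.graph.node:
    if node.op_type == "TRTBatchedNMS":
        node.domain = "trt.plugins"

# Register the custom domain in the model's opset imports.
m.opset_import.extend([helper.make_opsetid("trt.plugins", 1)])
onnx.save(m, "rtmdet_trt_nms.onnx")  # placeholder output filename
```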
