
[TensorRT EP] Segmentation fault when concurrently loading model using TensorRT EP #20089

Closed
tanmayv25 opened this issue Mar 26, 2024 · 3 comments
Assignees
Labels
ep:TensorRT issues related to TensorRT execution provider

Comments

@tanmayv25

Describe the issue

There appears to be a regression in the ONNX Runtime library, observed in the ONNX Runtime backend for Triton Inference Server when using the TensorRT execution provider.

We started observing a segmentation fault, caused by memory corruption, when loading multiple sessions of the same model concurrently. The failing test is L0_onnx_optimization.

I have also written a small reproducer that uses the C API to load models the same way Triton's ONNX Runtime backend does.

ort_trt_test.cc

#include <assert.h>
#include <onnxruntime_c_api.h>

#include <iostream>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Global API handle used by THROW_ON_ERROR; initialized in run_ort_trt().
const OrtApi* ort_api;

#define THROW_ON_ERROR(S)                                                    \
  do {                                                                       \
    OrtStatus* status__ = (S);                                               \
    if (status__ != nullptr) {                                               \
      OrtErrorCode code = ort_api->GetErrorCode(status__);                   \
      std::string msg = std::string(ort_api->GetErrorMessage(status__));     \
      ort_api->ReleaseStatus(status__);                                      \
      throw std::invalid_argument((std::string("onnx runtime error ") +      \
                                        std::to_string(code) + ": " + msg)   \
                                           .c_str());                       \
    }                                                                        \
  } while (false)


void run_ort_trt(int thread_count, bool is_serial) {
  // Initialize the global handle (the original declared a shadowing local here).
  ort_api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  std::mutex serialized_mutex;

  OrtEnv* env;
  THROW_ON_ERROR(ort_api->CreateEnv(ORT_LOGGING_LEVEL_VERBOSE, "log", &env));

  OrtSessionOptions* session_options;
  THROW_ON_ERROR(ort_api->CreateSessionOptions(&session_options));

  const char* model_path = "model.onnx";

  std::vector<std::thread> threads;
  for (int i = 0; i < thread_count; ++i) {
    // Create a new thread using a lambda.
    threads.emplace_back([&]() {
      if (is_serial) {
        serialized_mutex.lock();
      }
      // Make a clone of the session options for this instance...
      OrtSessionOptions* soptions;
      THROW_ON_ERROR(
          ort_api->CloneSessionOptions(session_options, &soptions));

      OrtTensorRTProviderOptionsV2* tensorrt_options;
      THROW_ON_ERROR(
          ort_api->CreateTensorRTProviderOptions(&tensorrt_options));
      std::unique_ptr<OrtTensorRTProviderOptionsV2,
                      decltype(ort_api->ReleaseTensorRTProviderOptions)>
          rel_trt_options(tensorrt_options,
                          ort_api->ReleaseTensorRTProviderOptions);

      std::vector<std::string> keys, values;
      // keys.push_back("trt_engine_cache_enable");
      // values.push_back("1");
      // keys.push_back("trt_engine_cache_path");
      // values.push_back("/opt/tritonserver/qa/L0_onnx_optimization/trt_cache");

      std::vector<const char*> c_keys, c_values;
      if (!keys.empty() && !values.empty()) {
        for (size_t j = 0; j < keys.size(); ++j) {
          c_keys.push_back(keys[j].c_str());
          c_values.push_back(values[j].c_str());
        }
        THROW_ON_ERROR(ort_api->UpdateTensorRTProviderOptions(
            rel_trt_options.get(), c_keys.data(), c_values.data(),
            keys.size()));
      }

      THROW_ON_ERROR(ort_api->SessionOptionsAppendExecutionProvider_TensorRT_V2(
          soptions, rel_trt_options.get()));

      std::cout << "Running ORT TRT EP with default provider options"
                << std::endl;

      OrtSession* session;
      THROW_ON_ERROR(
          ort_api->CreateSession(env, model_path, soptions, &session));

      if (is_serial) {
        serialized_mutex.unlock();
      }

      ort_api->ReleaseSession(session);
      ort_api->ReleaseSessionOptions(soptions);
    });
  }

  for (auto& thread : threads) {
    thread.join();
  }

  // Note: it is not suggested to directly `new` an OrtTensorRTProviderOptionsV2
  // to get provider options. Use CreateTensorRTProviderOptions() instead,
  // since ORT takes care of validating the options for you.

  ort_api->ReleaseSessionOptions(session_options);
  ort_api->ReleaseEnv(env);
}



int main(int argc, char *argv[]) {
  int thread_count = 1;
  bool is_serial = false;
  if (argc > 1) {
    thread_count = std::stol(argv[1]);
  }
  if (argc > 2) {
    is_serial = (std::stol(argv[2]) > 0);
  }
  run_ort_trt(thread_count, is_serial);
  return 0;
}

Test Combinations and Results

The first argument of the binary sets how many ORT sessions will be loaded on the GPU.
The second argument sets whether the sessions are loaded concurrently: 0 means the sessions are loaded concurrently, while >0 means they are loaded one at a time.

CLI invocation        Result
./ort_trt_test 1 0    Passes
./ort_trt_test 10 1   Passes
./ort_trt_test 2 0    SegFault

Additionally, the backtrace of the segmentation fault is:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140735072432128) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140735072432128) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140735072432128, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff5792476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff57787f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff57d9676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff592bb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff57f0cfc in malloc_printerr (str=str@entry=0x7ffff592e790 "double free or corruption (out)") at ./malloc/malloc.c:5664
#7  0x00007ffff57f2e70 in _int_free (av=0x7ffff596ac80 <main_arena>, p=0x7ffee8027650, have_lock=<optimized out>) at ./malloc/malloc.c:4588
#8  0x00007ffff57f5453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007fffe20cba6b in __gnu_cxx::new_allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > >::deallocate (
    this=0x7fffe2167c70 <onnxruntime::CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*, std::allocator<OrtCustomOpDomain*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::created_custom_op_list>, __p=0x7ffee8027660, __t=18446744073667605074) at /usr/include/c++/11/ext/new_allocator.h:145
#10 0x00007fffe20cb186 in std::allocator_traits<std::allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > >::deallocate (__a=..., 
    __p=0x7ffee8027660, __n=18446744073667605074) at /usr/include/c++/11/bits/alloc_traits.h:496
#11 0x00007fffe20cab48 in std::_Vector_base<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> >, std::allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > >::_M_deallocate (
    this=0x7fffe2167c70 <onnxruntime::CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*, std::allocator<OrtCustomOpDomain*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::created_custom_op_list>, __p=0x7ffee8027660, __n=18446744073667605074) at /usr/include/c++/11/bits/stl_vector.h:354
#12 0x00007fffe20cb674 in std::vector<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> >, std::allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > >::_M_realloc_insert<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > (
    this=0x7fffe2167c70 <onnxruntime::CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*, std::allocator<OrtCustomOpDomain*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::created_custom_op_list>, __position=std::unique_ptr<onnxruntime::TensorRTCustomOp> = {get() = 0x810}) at /usr/include/c++/11/bits/vector.tcc:500
#13 0x00007fffe20cadce in std::vector<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> >, std::allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > >::emplace_back<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > (
    this=0x7fffe2167c70 <onnxruntime::CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*, std::allocator<OrtCustomOpDomain*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::created_custom_op_list>) at /usr/include/c++/11/bits/vector.tcc:121
#14 0x00007fffe20ca554 in std::vector<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> >, std::allocator<std::unique_ptr<onnxruntime::TensorRTCustomOp, std::default_delete<onnxruntime::TensorRTCustomOp> > > >::push_back (
    this=0x7fffe2167c70 <onnxruntime::CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*, std::allocator<OrtCustomOpDomain*> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::created_custom_op_list>, __x=...) at /usr/include/c++/11/bits/stl_vector.h:1204
#15 0x00007fffe20c9166 in onnxruntime::CreateTensorRTCustomOpDomainList (domain_list=std::vector of length 0, capacity 0, extra_plugin_lib_paths="")
    at /workspace/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_custom_ops.cc:76
#16 0x00007fffe20e4ea4 in onnxruntime::ProviderInfo_TensorRT_Impl::GetTensorRTCustomOpDomainList (this=0x7fffe2167038 <onnxruntime::g_info>, domain_list=std::vector of length 0, capacity 0, 
    extra_plugin_lib_paths="") at /workspace/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_provider_factory.cc:36
#17 0x00007fff95d27fac in AddTensorRTCustomOpDomainToSessionOption (options=0x7ffeb00137e0, extra_plugin_lib_paths="") at /workspace/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1694
#18 0x00007fff95d295ba in OrtApis::SessionOptionsAppendExecutionProvider_TensorRT_V2 (options=0x7ffeb00137e0, tensorrt_options=0x7ffeb0013aa0)
    at /workspace/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1899
#19 0x00007fffe217a6a6 in triton::backend::onnxruntime::ModelState::LoadModel (this=0x7fff9848bf80, artifact_name="model.onnx", instance_group_kind=TRITONSERVER_INSTANCEGROUPKIND_GPU, 
    instance_group_device_id=0, model_path=0x7ffeb0012ac0, session=0x7ffeb0012ae0, default_allocator=0x7ffeb0012ae8, stream=0x7ffeb0012f40) at /tmp/tritonbuild/onnxruntime/src/onnxruntime.cc:526
#20 0x00007fffe217f70a in triton::backend::onnxruntime::ModelInstanceState::ModelInstanceState (this=0x7ffeb0012a30, model_state=0x7fff9848bf80, triton_model_instance=0x7ffeb0012430)
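
Frames #12-#15 show one thread reallocating a function-local static std::vector inside CreateTensorRTCustomOpDomainList() while another thread is mutating the same vector. As a hypothetical minimal sketch of this bug class (not the actual ORT source; the names are illustrative), two threads calling push_back on a shared static vector race on its reallocation and can produce exactly this "double free or corruption" abort:

#include <thread>
#include <vector>

// Hypothetical sketch: a function-local static vector mutated from
// multiple threads without synchronization (undefined behavior).
void append_unsynchronized() {
  static std::vector<int> created_custom_op_list;  // shared by all callers
  // Concurrent push_back calls can trigger overlapping reallocations
  // of the same buffer, leading to "double free or corruption".
  created_custom_op_list.push_back(42);
}

int main() {
  std::thread t1(append_unsynchronized);
  std::thread t2(append_unsynchronized);
  t1.join();
  t2.join();
}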

To reproduce

Compile the reproducer described above (ort_trt_test) and execute it with the CLI options listed in the table.

Urgency

The regression is quite serious and impacts users in production environments.

Platform

Linux

OS Version

5.15.0-89-generic

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.2

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

8.6.3.1+cuda12.2.2.009

@github-actions github-actions bot added the ep:TensorRT issues related to TensorRT execution provider label Mar 26, 2024
@yf711 yf711 self-assigned this Mar 26, 2024
@chilo-ms
Contributor

chilo-ms commented Mar 26, 2024

@tanmayv25 Thanks for raising this issue.
Here is the PR that fixes this concurrency issue; it resolves the problem on my side.
Could you help double-check as well? Thank you.

chilo-ms added a commit that referenced this issue Mar 27, 2024
`CreateTensorRTCustomOpDomainList()` is not thread-safe due to its
static variables, `created_custom_op_list` and `custom_op_domain`.
This PR ensures synchronization using a mutex.

see issue: #20089
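
For context, here is a minimal sketch of the fix pattern the commit message describes (serializing access to the static state with a mutex); this is illustrative, not the exact PR diff:

#include <mutex>
#include <thread>
#include <vector>

// Illustrative fix: a static mutex serializes every access to the
// function-local static container, so concurrent callers can no
// longer race on its reallocation.
void append_synchronized() {
  static std::mutex m;
  static std::vector<int> created_custom_op_list;
  std::lock_guard<std::mutex> lock(m);  // one caller at a time
  created_custom_op_list.push_back(42);
}

int main() {
  std::thread t1(append_synchronized);
  std::thread t2(append_synchronized);
  t1.join();
  t2.join();
}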
YUNQIUGUO pushed a commit that referenced this issue Mar 27, 2024
`CreateTensorRTCustomOpDomainList()` is not thread-safe due to its
static variables, `created_custom_op_list` and `custom_op_domain`.
This PR ensures synchronization using a mutex.

see issue: #20089
@tanmayv25
Author

tanmayv25 commented Apr 2, 2024

@chilo-ms I can confirm that the linked PR has fixed the issue. Thanks a lot!

@chilo-ms
Contributor

chilo-ms commented Apr 3, 2024

@tanmayv25, thanks for verifying.
FYI, the fix will be in the ORT 1.17.3 patch release.

TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this issue May 7, 2024
…#20093)

`CreateTensorRTCustomOpDomainList()` is not thread-safe due to its
static variables, `created_custom_op_list` and `custom_op_domain`.
This PR ensures synchronization using a mutex.

see issue: microsoft#20089