Enable support of multi-level nested control flow ops model for TRT EP #12147

chilo-ms · 2022-07-11T23:12:28Z

One reason TRT EP can't run a model with multi-level nested control flow ops is that, after ORT partitioning, the subgraph of a control flow op might contain a fused TRT node. When that happens, the TRT parser complains about the unrecognized fused TRT node and fails. This PR excludes those control flow ops before calling the TRT parser.
Outer scope values also need to be handled in order to run a model with multi-level nested control flow ops.
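Some background on "outer scope values": an ONNX subgraph (for example a Loop body) may consume values that are defined only in an enclosing graph. The following is a minimal sketch of how such values can be identified. It assumes a hypothetical, simplified node type; `SimpleNode` and `FindOuterScopeValues` are illustrative names, not ORT APIs, and this is not necessarily how this PR computes them.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an ONNX node: only the names of the
// values it consumes and produces (not ORT's actual Node class).
struct SimpleNode {
  std::string op_type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Returns, in first-use order, the values a subgraph consumes that are neither
// produced by its own nodes nor listed in `local_defs` (the subgraph's declared
// inputs and initializers). In ONNX such names resolve to an enclosing (outer)
// scope, so they must be supplied explicitly when the subgraph is built as a
// standalone network.
std::vector<std::string> FindOuterScopeValues(const std::vector<SimpleNode>& nodes,
                                              const std::set<std::string>& local_defs) {
  std::set<std::string> defined = local_defs;
  for (const auto& node : nodes) {
    for (const auto& out : node.outputs) defined.insert(out);
  }

  std::set<std::string> seen;
  std::vector<std::string> outer;
  for (const auto& node : nodes) {
    for (const auto& in : node.inputs) {
      // Empty names denote omitted optional inputs in ONNX; skip them.
      if (!in.empty() && defined.count(in) == 0 && seen.insert(in).second) {
        outer.push_back(in);
      }
    }
  }
  return outer;
}
```

For a Loop body with declared inputs `iter_num`, `cond_in` and `x_in` whose nodes read `x_in` and `w`, only `w` is reported, because nothing inside the body defines it.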

Note: This is a workaround; a real fix will come in another PR.

jywu-msft · 2022-07-12T16:04:45Z

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc

+  const std::vector<NodeIndex>& node_index = graph.GetNodesInTopologicalOrder();
+
+  // We currently exclude "If" and "Loop" control flow ops from original node vector before calling TensorRT parser.
+  // The reason is, these control flow ops have subgraph which might contain TRT fused node after ORT partition.
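The exclusion described in this code comment can be sketched as a partition of the candidate node indices by op type. This is a simplified illustration assuming a hypothetical `NodeInfo` type and `SplitControlFlowNodes` helper rather than ORT's `Graph`/`Node` API; the set also contains `Scan`, which a later commit in this PR adds to the workaround.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical minimal view of a node: just its op type.
struct NodeInfo {
  std::string op_type;
};

// Splits candidate node indices (into `nodes`) into those handed to the
// TensorRT parser and those withheld because they are control flow ops whose
// subgraphs may already contain fused nodes after ORT partitioning.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>> SplitControlFlowNodes(
    const std::vector<NodeInfo>& nodes, const std::vector<std::size_t>& candidates) {
  static const std::unordered_set<std::string> kControlFlowOps = {"If", "Loop", "Scan"};
  std::vector<std::size_t> for_parser;
  std::vector<std::size_t> excluded;
  for (std::size_t index : candidates) {
    if (kControlFlowOps.count(nodes[index].op_type) > 0) {
      excluded.push_back(index);
    } else {
      for_parser.push_back(index);
    }
  }
  return {std::move(for_parser), std::move(excluded)};
}
```

Keeping the excluded indices separate, instead of silently dropping them, lets the caller still decide how those nodes are assigned.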


what about cases where the subgraph doesn't contain fused nodes?
i.e. presumably TRT can handle some Loop/If cases (those without multi-level nesting?), so we may lose perf in those cases?

re: TRT non-recognized fused node, it's because the TRT parser doesn't properly handle all cases of Loop/If (i.e. it says it can support the If/Loop even when there are portions of the body that it cannot handle).
Shouldn't we raise the root issue with NVIDIA?

chilo-ms · 2022-07-12

> what about cases where the subgraph don't contain fused nodes? i.e. presumably TRT can handle some loop/if cases (which don't have multi-level nesting?) we may lose perf in those cases?

In that case, TRT can handle the loop/if ops as well as their subgraphs. Yes, TRT might have better perf.

But ORT partitions the graph bottom-up: it first fuses the supported nodes in a subgraph into one "TRT fused" node and removes the original nodes. At that point it's hard for TRT EP to tell ORT that we don't want those nodes fused, especially when there are multiple levels of nested control flow ops.
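To illustrate why a parent control flow op ends up with a fused node in its body, here is a toy model of bottom-up partitioning. Everything in it (`ToyGraph`, `ToyNode`, `PartitionBottomUp`, the `TRTKernel_fused` placeholder name) is hypothetical, and fusing runs of consecutive supported nodes is a deliberate simplification of ORT's real capability-based partitioning.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct ToyGraph;

// Hypothetical toy IR: a node has an op type and, for control flow ops
// (If/Loop/Scan), one or more nested subgraphs.
struct ToyNode {
  std::string op_type;
  std::vector<std::shared_ptr<ToyGraph>> subgraphs;
};

struct ToyGraph {
  std::vector<ToyNode> nodes;
};

// Placeholder op type for a fused node; not ORT's actual naming scheme.
const std::string kFusedOpType = "TRTKernel_fused";

// Toy bottom-up partitioner: nested subgraphs are processed before the graph
// that owns them, and in each graph every run of consecutive supported nodes
// is replaced by a single fused node.
void PartitionBottomUp(ToyGraph& graph, const std::set<std::string>& supported) {
  // 1. Recurse first, so inner graphs are fused before their parent is visited.
  for (auto& node : graph.nodes) {
    for (auto& sub : node.subgraphs) PartitionBottomUp(*sub, supported);
  }
  // 2. Collapse runs of supported nodes in this graph into one fused node.
  std::vector<ToyNode> result;
  bool in_run = false;
  for (auto& node : graph.nodes) {
    if (supported.count(node.op_type) > 0) {
      if (!in_run) result.push_back(ToyNode{kFusedOpType, {}});
      in_run = true;
    } else {
      result.push_back(std::move(node));  // keeps its (already partitioned) subgraphs
      in_run = false;
    }
  }
  graph.nodes = std::move(result);
}
```

With `Add` and `Mul` supported but `If` not, the `If` node's body has already been reduced to a single fused node by the time the outer graph is visited, which is the kind of subgraph an ONNX parser cannot interpret.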

chilo-ms · 2022-07-12

> re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle) shouldn't we raise the root issue to nvidia?

It's because the fused node is created by ORT and is not a standard ONNX node, so the TRT parser doesn't recognize it. I don't think this is an issue on NVIDIA's side.

jywu-msft · 2022-07-12

I think I understand now. You're saying the parser doesn't recognize any ops outside the official ONNX namespace? But I thought we have been able to support MS domain ops and other custom CUDA ops along with TRT EP. Let's discuss more offline.

stevenlix · 2022-07-30T01:49:47Z

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc

+  // If this is the case, TensorRT parser will complain the non-recognized TRT fused node and fail.
+  for (const auto& index : nodes_vector) {
+    const auto& node = graph.GetNode(node_index[index]);
+    if (node->OpType() == "If" || node->OpType() == "Loop" || node->OpType() == "Scan") {


Not sure how long the real fix will take to work out, but ruling out some ops based on OpType seems like a nice option to have. Can we generalize this by getting the OpTypes from provider options?
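A minimal sketch of the suggested generalization, assuming the excluded op types arrive as a comma-separated provider option value such as `If,Loop,Scan`. The option format and the `ParseOpTypeList`/`IsExcludedOpType` helpers are hypothetical; no specific ORT option name is implied.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Parses a comma-separated list of op types (e.g. "If, Loop,Scan") into a set.
// Surrounding spaces/tabs are trimmed and empty entries are ignored.
// The input format is an assumption for illustration.
std::unordered_set<std::string> ParseOpTypeList(const std::string& option_value) {
  std::unordered_set<std::string> op_types;
  std::stringstream stream(option_value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    const auto begin = item.find_first_not_of(" \t");
    if (begin == std::string::npos) continue;  // blank entry
    const auto end = item.find_last_not_of(" \t");
    op_types.insert(item.substr(begin, end - begin + 1));
  }
  return op_types;
}

// Returns true if nodes of `op_type` should be withheld from the TensorRT parser.
bool IsExcludedOpType(const std::unordered_set<std::string>& excluded,
                      const std::string& op_type) {
  return excluded.count(op_type) > 0;
}
```

Parsing once into a set keeps the per-node check in the partitioning loop a constant-time lookup, and an empty option value simply excludes nothing.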


chilo-ms added 5 commits June 27, 2022 16:45

Make multiple-level nested control flow op model work

1376ea7

find correct input index

1cd7ef6

find correct input index (cont.)

bb2e833

enable nested layer unit tests for TRT EP

e766e91

add comment

b538524

chilo-ms requested review from stevenlix and jywu-msft July 11, 2022 23:12

jywu-msft reviewed Jul 12, 2022


add Scan op to current workaround support of control flow op

2c8d84e

stevenlix reviewed Jul 30, 2022


jywu-msft approved these changes Aug 1, 2022


jywu-msft added the release:1.12.1 label Aug 1, 2022

jywu-msft merged commit b39257a into master Aug 2, 2022

jywu-msft deleted the chi/trt_nested_control_flow_op branch August 2, 2022 06:57

