
Enable support of multi-level nested control flow ops model for TRT EP #12147

Merged: jywu-msft merged 6 commits into master from chi/trt_nested_control_flow_op on Aug 2, 2022


chilo-ms (Contributor) commented Jul 11, 2022

One reason the TRT EP can't run models with multi-level nested control flow ops is that, after ORT partitioning, the subgraph of a control flow op might contain a fused TRT node. When this happens, the TRT parser complains about the unrecognized fused TRT node and fails. This PR excludes those control flow ops before calling the TRT parser.
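The exclusion step can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `SketchNode` and `ExcludeControlFlowOps` are hypothetical stand-ins for ORT's graph and node types, and the excluded op types mirror the `If`/`Loop`/`Scan` check in the PR's diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an ORT graph node; only the op type matters here.
struct SketchNode {
  std::string op_type;
};

// Control flow ops whose subgraphs may already contain ORT-created fused TRT nodes.
const std::unordered_set<std::string> kExcludedOpTypes = {"If", "Loop", "Scan"};

// Returns the candidate node indices that remain after dropping control flow ops,
// i.e. the nodes that would still be handed to the TensorRT parser.
std::vector<std::size_t> ExcludeControlFlowOps(const std::vector<SketchNode>& nodes,
                                               const std::vector<std::size_t>& candidate_indices) {
  std::vector<std::size_t> kept;
  kept.reserve(candidate_indices.size());
  for (std::size_t index : candidate_indices) {
    if (kExcludedOpTypes.count(nodes[index].op_type) == 0) {
      kept.push_back(index);
    }
  }
  return kept;
}
```

For example, given nodes `Conv, If, Relu, Loop, Scan, Add`, only the indices of `Conv`, `Relu` and `Add` survive. Filtering by op type is coarse: it also drops control flow ops whose bodies TRT could have handled, which is the perf trade-off raised in review.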
Outer-scope values also need to be handled in order to run models with multi-level nested control flow ops.
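One common way to find the outer-scope values a control flow subgraph depends on is to collect every value its nodes consume that is not defined inside the subgraph. The sketch below is a generic illustration, not necessarily how this PR handles outer-scope values: it assumes simplified, hypothetical `Subgraph`/`SubgraphNode` types rather than ORT's `Graph` API, and follows the ONNX convention that an empty input name marks an omitted optional input. Values found this way would presumably need to be exposed as explicit inputs before the subgraph could be handed to TensorRT as a standalone graph.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified node: names of the values it reads and writes.
struct SubgraphNode {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Hypothetical, simplified subgraph (e.g. the body of an If/Loop/Scan).
struct Subgraph {
  std::vector<std::string> graph_inputs;  // explicit subgraph inputs
  std::vector<std::string> initializers;  // constants defined inside the subgraph
  std::vector<SubgraphNode> nodes;
};

// A value is "outer scope" if a node consumes it but it is not produced by any node
// in the subgraph, is not a subgraph input, and is not a subgraph initializer.
std::set<std::string> FindOuterScopeValues(const Subgraph& g) {
  std::set<std::string> defined(g.graph_inputs.begin(), g.graph_inputs.end());
  defined.insert(g.initializers.begin(), g.initializers.end());
  for (const auto& node : g.nodes) {
    defined.insert(node.outputs.begin(), node.outputs.end());
  }
  std::set<std::string> outer;
  for (const auto& node : g.nodes) {
    for (const auto& name : node.inputs) {
      // Skip omitted optional inputs (empty names) and locally defined values.
      if (!name.empty() && defined.count(name) == 0) {
        outer.insert(name);
      }
    }
  }
  return outer;
}
```

Applied recursively, the outer-scope values of an inner body that are not defined in the enclosing body become outer-scope values of the enclosing body as well, which is what makes multiple nesting levels harder to handle.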

Note: This is a workaround; a real fix will follow in another PR.

const std::vector<NodeIndex>& node_index = graph.GetNodesInTopologicalOrder();

// We currently exclude the "If", "Loop" and "Scan" control flow ops from the original node vector before calling the TensorRT parser.
// The reason is that these control flow ops have subgraphs which might contain a TRT fused node after ORT partitioning.
Member

What about cases where the subgraph doesn't contain fused nodes?
Presumably TRT can handle some Loop/If cases (those without multi-level nesting?), so we may lose perf in those cases?

Member

Re: the unrecognized TRT fused node: it's because the TRT parser doesn't properly handle all cases of Loop/If (i.e., it says it can support the If/Loop even when there are portions of the body that it cannot handle).
Shouldn't we raise the root issue with NVIDIA?

chilo-ms (Contributor, Author) Jul 12, 2022

> what about cases where the subgraph don't contain fused nodes? i.e. presumably TRT can handle some loop/if cases (which don't have multi-level nesting?) we may lose perf in those cases?

In that case, TRT can handle the loop/if ops as well as their subgraphs. Yes, TRT might have better perf.

However, ORT partitions the graph bottom-up: it first fuses the supported nodes in a subgraph into one "TRT fused" node and removes the original nodes. By that point, it's hard for the TRT EP to tell ORT not to fuse those nodes, especially when there are multiple levels of nested control flow ops.
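To illustrate why bottom-up partitioning leads to the parser failure, here is a toy model. `ToyGraph`, `ToyNode`, `ToySupported` and the fused op type name are hypothetical, not ORT's real data structures or naming: innermost subgraphs are partitioned first, so by the time the outer `If` is considered, its nested body already contains a fused node that a standard ONNX parser would not recognize.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy graph: each node may own nested subgraphs (If/Loop/Scan bodies).
struct ToyGraph;
struct ToyNode {
  std::string op_type;
  std::vector<std::shared_ptr<ToyGraph>> subgraphs;
};
struct ToyGraph {
  std::vector<ToyNode> nodes;
};

// Illustrative op type for a node produced by ORT fusion (assumed name, not ORT's).
const std::string kFusedOpType = "ORT_FusedTRTNode";

// Assumption: the toy EP supports every op except one named "Unsupported".
bool ToySupported(const ToyNode& n) { return n.op_type != "Unsupported"; }

// Bottom-up partitioning: recurse into subgraphs first, then replace each maximal run
// of supported nodes without subgraphs by a single fused node.
void PartitionBottomUp(ToyGraph& g) {
  for (auto& node : g.nodes)
    for (auto& sub : node.subgraphs) PartitionBottomUp(*sub);

  std::vector<ToyNode> result;
  bool in_run = false;
  for (const auto& node : g.nodes) {
    bool fusable = ToySupported(node) && node.subgraphs.empty();
    if (fusable) {
      if (!in_run) {
        result.push_back({kFusedOpType, {}});
        in_run = true;
      }
    } else {
      result.push_back(node);  // control flow node kept; its subgraphs are shared
      in_run = false;
    }
  }
  g.nodes = std::move(result);
}

// True if any (possibly deeply nested) subgraph of `node` contains a fused node,
// i.e. handing `node` to a standard ONNX parser would fail.
bool SubgraphHasFusedNode(const ToyNode& node) {
  for (const auto& sub : node.subgraphs)
    for (const auto& inner : sub->nodes)
      if (inner.op_type == kFusedOpType || SubgraphHasFusedNode(inner)) return true;
  return false;
}
```

Partitioning a `Conv` followed by an `If` whose body holds a `Loop` over `Add`/`Relu` collapses the innermost body to a single fused node while the `If` itself stays in place, so `SubgraphHasFusedNode` returns true for the `If`. That is the situation the workaround sidesteps by excluding such ops before parsing.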

chilo-ms (Contributor, Author) Jul 12, 2022

> re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle) shouldn't we raise the root issue to nvidia?

It's because the fused node is created by ORT and is not a standard ONNX node, so the TRT parser doesn't recognize it. I don't think this is an NVIDIA issue.

Member

> re: TRT non-recognized fused node, it's because the TRT parser is not properly handling all cases of Loop/If (i.e. it says it can support the If/Loop when there are portions of the body that it cannot handle) shouldn't we raise the root issue to nvidia?
>
> It's because the fused node is created by ORT and it's not the standard ONNX node, so TRT parser doesn't recognize. I think this is not an issue from Nvidia.

I think I understand now. You're saying the parser doesn't recognize any ops outside the official ONNX namespace? But I thought we had been able to support MS domain ops and other custom CUDA ops alongside the TRT EP. Let's discuss more offline.

// If this is the case, the TensorRT parser will complain about the unrecognized TRT fused node and fail.
for (const auto& index : nodes_vector) {
  const auto& node = graph.GetNode(node_index[index]);
  if (node->OpType() == "If" || node->OpType() == "Loop" || node->OpType() == "Scan") {
Contributor

Not sure how long the real fix will take, but ruling out some ops based on OpType seems like a nice option to have. Can we generalize this by reading the OpTypes to exclude from provider options?
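The reviewer's suggestion could look roughly like the following: read a comma-separated list of op types from a provider option and exclude any node whose op type appears in it. The option value format and the function names here are assumptions for illustration, not an ORT API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>

// Trims leading and trailing whitespace.
std::string Trim(const std::string& s) {
  std::size_t begin = 0, end = s.size();
  while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) ++begin;
  while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) --end;
  return s.substr(begin, end - begin);
}

// Parses a comma-separated op-type list (e.g. a hypothetical provider option value
// such as "If,Loop,Scan") into a set; empty entries are ignored.
std::unordered_set<std::string> ParseExcludedOpTypes(const std::string& option_value) {
  std::unordered_set<std::string> op_types;
  std::stringstream stream(option_value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    std::string trimmed = Trim(item);
    if (!trimmed.empty()) op_types.insert(trimmed);
  }
  return op_types;
}

// Returns true if nodes of this op type should be kept away from the TensorRT parser.
bool IsExcluded(const std::unordered_set<std::string>& excluded, const std::string& op_type) {
  return excluded.count(op_type) > 0;
}
```

With this shape, the hard-coded `If`/`Loop`/`Scan` check becomes a default value for the option, and users could widen or narrow the list per model without rebuilding.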

jywu-msft merged commit b39257a into master Aug 2, 2022
jywu-msft deleted the chi/trt_nested_control_flow_op branch August 2, 2022 06:57
RandySheriffH pushed a commit that referenced this pull request Aug 2, 2022
Enable support of multi-level nested control flow ops model for TRT EP (#12147)

* Make multiple-level nested control flow op model work

* find correct input index

* find correct input index (cont.)

* enable nested layer unit tests for TRT EP

* add comment

* add Scan op to current workaround support of control flow op
RandySheriffH added a commit that referenced this pull request Aug 3, 2022
* update package version

* Prevent unbounded growth of command allocator memory (#12114)

Prevent unbounded growth of command allocator memory

* Update supported ops md for NNAPI/CoreML EP (#12245)

* update supported ops md

* address pr comments

* address pr comments

* wording

* Change native folder name for java macos arm64 (#12335)

* Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280)

Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](caolan/async@v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [js/rn] upgrade dependencies for e2e test (#11863)

* [js/rn] upgrade dependencies for e2e test

* use JDK11 only for gradle

* expand variable

* [js/rn] upgrade package react-native@^0.69.1 (#12155)

* [js/rn] upgrade package react-native@^0.69.1

* upgrade compile sdk to v31

* update ios version requirement

* update pod path for onnxruntime-react-native

* add missing build_java in Android testing stage. (#12187)

add missing build_java in testing

* Use specific Android NDK version in CI builds. (#12350)

Current builds use whichever NDK version happens to be on the build machine. The build machine environment may change in ways that are outside of our control.
This change installs a specific version of NDK (the current LTS version 25.0.8775105) and uses it.

* Remove preview keyword from DirectML package (#12368)

Remove preview keyword

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>

* Scope CreateFileMapping2 to valid API partitions (#12374)

* Fix TRT custom op issue (#12283)

* Pass schema registry on CreateModel.

* Fix ORT_MINIMAL_BUILD.

* Fix build issue.

* Manually add optimization flag for Android Release builds. (#12390)

With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: android/ndk#1740

Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.

This change is a workaround to manually add `-O3` for "Release" Android builds.

* resolve conflicts in tensorRT related changes

* Enable support of multi-level nested control flow ops model for TRT EP (#12147)

* Make multiple-level nested control flow op model work

* find correct input index

* find correct input index (cont.)

* enable nested layer unit tests for TRT EP

* add comment

* add Scan op to current workaround support of control flow op

Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: sumitsays <sumitagarwal330@gmail.com>
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Justin Stoecker <justoeck@microsoft.com>
Co-authored-by: Yateng Hong <yatengh@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>

3 participants