
[DML EP] Add DML implementation for BiasGelu #13795

Merged: 2 commits into main from user/pavignol/add-dml-bias-gelu on Dec 1, 2022

Conversation

@PatriceVignola (Contributor)

Description

Add DML implementation for BiasGelu

@sumitsays (Contributor)

ADD1 only optimizes the Relu and PRelu activations at the shader level. For all other operators it dispatches a separate shader, which is equivalent to executing the decomposed form of BiasGelu. So I don't think it will add any performance benefit.

@PatriceVignola (Contributor, Author)

> ADD1 only optimizes the Relu and PRelu activations at the shader level. For all other operators it dispatches a separate shader, which is equivalent to executing the decomposed form of BiasGelu. So I don't think it will add any performance benefit.

Sure, but it still won't be worse than calling both operators separately. And the current behavior is that it falls back to the CPU, which is very bad. We could also optimize the Gelu activation into ADD1 if necessary in the future.
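For context on the decomposed form being discussed: the BiasGelu contrib op is an elementwise Add followed by the exact (erf-based) Gelu. A minimal NumPy sketch of that reference semantics, assuming the erf-based Gelu rather than the tanh approximation used by FastGelu (function and variable names here are illustrative, not the DML kernel itself):

```python
import numpy as np
from scipy.special import erf

def bias_gelu_reference(x: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Decomposed BiasGelu: elementwise Add, then exact (erf-based) Gelu."""
    y = x + bias                                      # the Add half
    return 0.5 * y * (1.0 + erf(y / np.sqrt(2.0)))    # Gelu(y)

# Example: a 1-D bias broadcast over the last axis of the input.
x = np.random.randn(2, 4).astype(np.float32)
bias = np.random.randn(4).astype(np.float32)
print(bias_gelu_reference(x, bias))
```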

@sumitsays (Contributor) previously approved these changes on Dec 1, 2022 and left a comment:


Thank you.

@fdwr (Contributor) left a comment:


Pat: Do you know why it now falls back to the CPU, given (I thought anyway, the last time we looked, Sumit) there was a functional decomposition to Add and Gelu, which called DML? Maybe I misremember, but this is a general concern: if DML is priority #1 in the execution provider list, any decomposable operators should go to it first, rather than to a fused CPU version. I had some emails with Scott McKay long ago about this that I'll dredge up. Maybe there's a bug elsewhere to fix, and we can delete this temporary kernel later.
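As an aside on the provider-list priority mentioned above: in the ONNX Runtime Python API, priority is simply the order of the providers list passed to the session, so DML is tried before the CPU fallback when it is listed first. A small sketch (the model path is hypothetical):

```python
import onnxruntime as ort

# Providers are tried in list order, so DML has priority over the CPU fallback here.
sess = ort.InferenceSession(
    "model_with_biasgelu.onnx",   # hypothetical model containing a BiasGelu node
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())       # providers actually enabled for this session
```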

@PatriceVignola (Contributor, Author)

> Pat: Do you know why it now falls back to the CPU, given (I thought anyway, the last time we looked, Sumit) there was a functional decomposition to Add and Gelu, which called DML? Maybe I misremember, but this is a general concern: if DML is priority #1 in the execution provider list, any decomposable operators should go to it first, rather than to a fused CPU version. I had some emails with Scott McKay long ago about this that I'll dredge up. Maybe there's a bug elsewhere to fix, and we can delete this temporary kernel later.

I'm not familiar with how op decomposition works. In this case, it's not even a "fusion": BiasGelu is hardcoded in the ONNX model. Does decomposition work in this case? And where would I find the logic in the code?

@PatriceVignola PatriceVignola merged commit e9b92fd into main Dec 1, 2022
@PatriceVignola PatriceVignola deleted the user/pavignol/add-dml-bias-gelu branch December 1, 2022 17:23
@sumitsays (Contributor)

> Pat: Do you know why it now falls back to the CPU, given (I thought anyway, the last time we looked, Sumit) there was a functional decomposition to Add and Gelu, which called DML? Maybe I misremember, but this is a general concern: if DML is priority #1 in the execution provider list, any decomposable operators should go to it first, rather than to a fused CPU version. I had some emails with Scott McKay long ago about this that I'll dredge up. Maybe there's a bug elsewhere to fix, and we can delete this temporary kernel later.

> I'm not familiar with how op decomposition works. In this case, it's not even a "fusion": BiasGelu is hardcoded in the ONNX model. Does decomposition work in this case? And where would I find the logic in the code?

No such decomposition happens. If `BiasGelu` is hardcoded, then we need a dedicated `BiasGelu` kernel registration to make sure it won't fall back to the CPU.
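One way to confirm whether the node still falls back is to run the model with verbose session logging and check which execution provider each node was assigned to during graph partitioning. A rough sketch, assuming a model that contains the BiasGelu node (the file name is hypothetical):

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0   # VERBOSE; partitioning logs include per-node provider assignment

# With a dedicated DML BiasGelu kernel registered, the node should stay on DML
# instead of being assigned to the CPU execution provider.
sess = ort.InferenceSession(
    "model_with_biasgelu.onnx",   # hypothetical
    sess_options=so,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```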

@fdwr (Contributor) commented on Dec 2, 2022:

> No such decomposition happens. If `BiasGelu` is hardcoded, then we need a dedicated `BiasGelu` kernel registration to make sure it won't fall back to the CPU.

Alrighty. I must have been thinking of one of the many other *elu's then, like Selu.

| Function | Since version | Function version |
| --- | --- | --- |
| Celu | 12 | 12 |
| Elu | 6, 1 | 18 |
| PRelu | 16, 9, 7, 6, 1 | 16 |
| Relu | 14, 13, 6, 1 | 18 |
| Selu | 6, 1 | 18 |

henrywu2019 pushed a commit to henrywu2019/onnxruntime that referenced this pull request Dec 26, 2022
### Description
Add DML implementation for BiasGelu