[DML EP] Add DML implementation for BiasGelu #13795
Conversation
ADD1 only optimizes Relu and PRelu activations at the shader level. For all other operators it dispatches a separate shader, which is equivalent to executing the decomposed form of BiasGelu. So I don't think it would add any performance benefit.
Sure, but it still won't be worse than calling both operators separately, and the current behavior is that it falls back to the CPU, which is very bad. We could also optimize the Gelu activation into ADD1 in the future if necessary.
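As a sanity check, the decomposition the thread refers to can be sketched in NumPy. This is purely illustrative (the function names here are hypothetical, not the actual EP code): `BiasGelu(x, bias)` computes the same values as running Add followed by Gelu; only the number of kernel dispatches differs.

```python
# Illustrative NumPy sketch (not the actual DML EP code): BiasGelu is
# numerically identical to Add followed by Gelu; fusing it only saves
# a kernel dispatch, not arithmetic.
import numpy as np
from math import erf, sqrt

def gelu(x):
    # Exact erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return np.vectorize(lambda v: 0.5 * v * (1.0 + erf(v / sqrt(2.0))))(x)

def bias_gelu(x, bias):
    # "Fused" form: conceptually one operator.
    return gelu(x + bias)

def add_then_gelu(x, bias):
    # Decomposed form: a separate Add, then Gelu -- what dispatching
    # two shaders (or falling back per-op) would compute.
    y = x + bias
    return gelu(y)
```

Under this sketch, `bias_gelu(x, b)` and `add_then_gelu(x, b)` agree elementwise for any broadcast-compatible `x` and `b`.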
Thank you.
Pat: Do you know why it now falls back to CPU, given (I thought anyway, the last time we looked, Sumit) there was a functional decomposition to Add and Gelu, which called DML? Maybe I misremember, but this is a general concern: if DML is priority #1 in the execution provider list, any decomposable operators should go to it first, rather than to a fused CPU version. I had some emails with Scott McKay long ago about this that I'll dredge up. Maybe there's a bug elsewhere to fix, and we can delete this temporary kernel later.
I'm not familiar with how op decomposition works. In this case, it's not even a "fusion":
No such decomposition happens. If `BiasGelu` is hardcoded, then we need a dedicated `BiasGelu` registration to make sure it won't fall back to the CPU.
Alrighty. I must have been thinking of another one of the many other *elu's then, like Selu.
### Description
Add DML implementation for BiasGelu