HVX F32-raddstoreexpminusmax microkernels for Softmax #6646
ejparkqc commented Jun 30, 2024
- Initial implementation and test added.
- xnnpack/intrinsics-polyfill.h has the horizontal sum code (Q6_f32_vrsum_Vsf) using vshuff and vadd.
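As a rough illustration of the horizontal sum mentioned above, the following is a scalar C model of a log-step reduction over one 32-float vector. It is only a sketch: `hsum32_model` and `LANES` are hypothetical names, the code does not use HVX intrinsics, and the lane pairing may differ from what the vshuff-based `Q6_f32_vrsum_Vsf` polyfill actually does.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 32 /* assumption: one 128-byte HVX vector holds 32 floats */

/* Scalar model of a log-step horizontal sum (hypothetical helper, not the
 * polyfill itself). Each pass plays the role of one shuffle + add pair. */
float hsum32_model(const float* v) {
  assert(v != NULL);
  float lanes[LANES];
  for (size_t i = 0; i < LANES; i++) {
    lanes[i] = v[i];
  }
  /* Fold the upper half onto the lower half: 32 -> 16 -> 8 -> 4 -> 2 -> 1. */
  for (size_t width = LANES / 2; width > 0; width /= 2) {
    for (size_t i = 0; i < width; i++) {
      lanes[i] += lanes[i + width];
    }
  }
  return lanes[0];
}
```

The reduction takes five passes for 32 lanes, which is why a shuffle-and-add sequence is attractive when no single-instruction horizontal add is available.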
ac94723 to 2d4f5c8
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree.

$assert BATCH_TILE % 4 == 0
BATCH_TILE should be a multiple of 32 floats:
$assert BATCH_TILE % 32 == 0
Thank you Frank, fixed it. I will also revisit the earlier kernels to check whether I put 4 instead of 32 there.
243cf7a to fb2bf91
F32-raddstoreexpminusmax microkernels for Softmax
2fd85ea to 92ee2dd
src/xnnpack/intrinsics-polyfill.h
Outdated
float* svin2 = (float *) &vin2;

for(int i = 0; i < 32; i++)
  svin1[i] = svin1[i] / svin2[i];
Suggest adding a TODO to improve on this. If div is not an option, perhaps make the calling code use an 'nr' (Newton-Raphson) reciprocal instead.
Quick question: does XNNPACK already have nr code implemented, or should we add it? I added a TODO for now, but I definitely want to improve this.
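For context, and assuming 'nr' in this thread refers to Newton-Raphson iteration, here is a minimal scalar C sketch of replacing a division with a refined reciprocal. All function names are hypothetical, this is not XNNPACK or HVX code, and it assumes a finite, strictly positive divisor (as a softmax denominator would be).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* One Newton-Raphson step for f(r) = 1/r - d, i.e. r' = r * (2 - d * r). */
static float nr_step(float d, float r) {
  return r * (2.0f - d * r);
}

/* Hypothetical helper: approximates 1/d for finite d > 0 without dividing
 * by d at run time. */
float nr_reciprocal(float d) {
  assert(d > 0.0f && isfinite(d));
  int e;
  float m = frexpf(d, &e); /* d = m * 2^e with m in [0.5, 1) */
  /* Linear initial estimate of 1/m on [0.5, 1); the constants fold to
   * compile-time values. */
  float r = 48.0f / 17.0f - (32.0f / 17.0f) * m;
  /* Each step roughly squares the relative error; three steps reach
   * approximately single precision from this starting estimate. */
  for (int i = 0; i < 3; i++) {
    r = nr_step(m, r);
  }
  return ldexpf(r, -e); /* undo the exponent scaling */
}

/* Element-wise a[i] / b[i] expressed as a multiply by the reciprocal. */
void div_by_nr(float* out, const float* a, const float* b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    out[i] = a[i] * nr_reciprocal(b[i]);
  }
}
```

On a vector unit the initial estimate would more likely come from a hardware reciprocal approximation or a bit-level trick, with the same refinement step applied to whole vectors; the scalar form above is only meant to show the iteration.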
92ee2dd to 458ec60
458ec60 to 73e810f
Thanks, importing.