
HVX F32-raddstoreexpminusmax microkernels for Softmax #6646

Merged (5 commits) on Jul 11, 2024


ejparkqc
Contributor

  • Added the initial implementation and tests.
  • xnnpack/intrinsics-polyfill.h contains the horizontal-sum helper (Q6_f32_vrsum_Vsf), implemented with vshuff and vadd.

// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree.

$assert BATCH_TILE % 4 == 0
Contributor

This should be a multiple of 32 floats:
$assert BATCH_TILE % 32 == 0
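The reason for 32 rather than 4: a 128-byte HVX vector holds 128 / sizeof(float) = 32 floats, so a tile that is a multiple of 32 always maps onto whole vectors. The sketch below states that constraint at compile time and shows a common microkernel loop shape (full tiles, then whole vectors, then a partial tail). The 128-byte vector size, the tile value of 64, and the `split_batch` helper are illustrative assumptions, not code from this PR.

```c
#include <stddef.h>

#define HVX_VECTOR_BYTES 128 /* assumption: 128-byte HVX mode */
#define FLOATS_PER_VECTOR (HVX_VECTOR_BYTES / sizeof(float)) /* 32 */
#define BATCH_TILE 64 /* hypothetical tile size */

/* A tile of e.g. 4 floats would not fill even one vector. */
_Static_assert(BATCH_TILE % FLOATS_PER_VECTOR == 0,
               "BATCH_TILE must be a multiple of 32 floats");

/* Hypothetical helper: how n floats would be consumed by the main
   tile loop, a one-vector-at-a-time loop, and a final partial tail
   (which a real kernel would handle with a partial load/store). */
typedef struct {
  size_t tiles;
  size_t vectors;
  size_t tail;
} batch_split;

static batch_split split_batch(size_t n) {
  batch_split s;
  s.tiles = n / BATCH_TILE;
  n -= s.tiles * BATCH_TILE;
  s.vectors = n / FLOATS_PER_VECTOR;
  s.tail = n % FLOATS_PER_VECTOR;
  return s;
}
```

For example, 100 floats split into one 64-float tile, one full 32-float vector, and a 4-float tail.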

Contributor Author

Thank you, Frank; fixed. I will also revisit the previous kernels in case I used 4 instead of 32 there.

ejparkqc force-pushed the f32-raddstoreexpminusmax branch 12 times, most recently from 243cf7a to fb2bf91 (July 3, 2024 17:56)
ejparkqc changed the title HVX F32-raddstoreexpminusmax for Softmax HVX F32-raddstoreexpminusmax microkernels for Softmax Jul 3, 2024
ejparkqc force-pushed the f32-raddstoreexpminusmax branch 4 times, most recently from 2fd85ea to 92ee2dd (July 9, 2024 18:02)
ejparkqc requested a review from fbarchard July 9, 2024 18:28
float* svin2 = (float *) &vin2;

// Scalar fallback: divide each of the 32 float lanes element-wise.
for (int i = 0; i < 32; i++) {
  svin1[i] = svin1[i] / svin2[i];
}
Contributor

Suggest adding a TODO to improve this. If a vector divide is not an option, perhaps make the calling code use a Newton-Raphson ('nr') reciprocal instead.

Contributor Author

Quick question: does XNNPACK already have NR code implemented, or should we add it? I added a TODO for now, but I definitely want to improve this.
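For reference, the Newton-Raphson reciprocal suggested above refines an estimate r of 1/d with r' = r * (2 - d * r). If r = (1 - e) / d, then r' = (1 - e^2) / d, so the relative error squares on every step: an estimate good to about 2^-8 reaches about 2^-16 after one step and is below float precision after two. The sketch below replaces the per-lane division with a multiply by the refined reciprocal. The initial estimate `r0` stands in for whatever reciprocal-estimate source the target provides; this sketch does not assume any particular HVX instruction, and `nr_reciprocal_step` / `div_via_nr` are hypothetical names.

```c
#include <stddef.h>

/* One Newton-Raphson step toward 1/d: the relative error squares. */
static float nr_reciprocal_step(float d, float r) {
  return r * (2.0f - d * r);
}

/* Hypothetical replacement for the element-wise division loop:
   a[i] = a[i] / d[i], computed as a[i] * (refined reciprocal of d[i]).
   r0[i] is an initial estimate of 1/d[i] with |1 - d[i]*r0[i]| < 1. */
static void div_via_nr(size_t n, float* a, const float* d,
                       const float* r0, int steps) {
  for (size_t i = 0; i < n; i++) {
    float r = r0[i];
    for (int k = 0; k < steps; k++) {
      r = nr_reciprocal_step(d[i], r);
    }
    a[i] *= r;
  }
}
```

In Softmax the divisor is a single scalar (the sum returned by this kernel), so the reciprocal would only need to be refined once and then broadcast across all lanes.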

alankelly
Collaborator

Thanks, importing.

copybara-service bot merged commit 02b300f into google:master on Jul 11, 2024
3 checks passed
3 participants