
src: enable avx512 optimization for base64 encoding #43717

Closed

lucshi wants to merge 2 commits

Conversation

@lucshi (Contributor) commented Jul 7, 2022

Optimized Base64 encoding using AVX512VL and AVX512VBMI instructions. The benchmark/buffers/buffer-base64-encoding measurement on an AWS EC2 m6i.large instance shows 2.4x performance (a +140% gain). This optimization currently applies only to Linux x86_64.

The algorithm is based on the open-source project https://github.com/WojciechMula/base64-avx512, with some modifications to fit the Node.js code base. I have included the original BSD-3 license in the patch.

@nodejs-github-bot added the buffer, c++, and needs-ci labels Jul 7, 2022
src/base64-inl.h Outdated
@@ -182,6 +211,96 @@ inline size_t base64_encode(const char* src,
return dlen;
}


#if (defined(__x86_64) || defined(__x86_64__)) && \
(defined(__linux) || defined(__linux__))
Member commented:

I don't see why this operation wouldn't apply outside of Linux. Should this be a check for gcc/clang instead?

@lucshi (Contributor, Author) replied:

The original code is for Linux GCC, and I do not have a Windows machine to test on. I will try removing line 216, adding a gcc/clang check instead, and letting the bot test it. Thanks!


#if (defined(__x86_64) || defined(__x86_64__)) && \
(defined(__linux) || defined(__linux__))
#pragma GCC target("avx512vl", "avx512vbmi")
Member commented:

Would it be a bit cleaner to use __attribute__((target(...))) on the function itself? Or is this necessary for the #include to work?

@lucshi (Contributor, Author) replied:

I chose #pragma so that I can explicitly reset it afterwards and avoid polluting other code.

@mscdex (Contributor) commented Jul 7, 2022

FWIW it might be better to go with something that is not so exclusive platform/CPU-feature-wise. A while back I submitted #39775, which uses https://github.com/aklomp/base64 for broader platform/CPU support. I'll try to work on bringing that PR closer to being landable.

Optimized Base64 encoding using AVX512VL and AVX512VBMI instructions. The
benchmark/buffers/buffer-base64-encoding measurement on an AWS EC2 m6i.large
instance shows 2.4x performance (+140% gain). This optimization currently
applies only to Linux x86_64.
@lucshi (Contributor, Author) commented Jul 8, 2022

> FWIW it might be better to go with something that is not so exclusive platform/CPU-feature-wise. A while back I submitted #39775, which uses https://github.com/aklomp/base64 for broader platform/CPU support. I'll try to work on bringing that PR closer to being landable.

The considerations were: AVX512VL is not a new technology (it has shipped since Cannon Lake in 2018), and the main compute-intensive AWS EC2 instances are based on Ice Lake. The users who care about the performance difference of Base64 encoding will mostly be working on newer architectures and want top Node.js performance. So the purpose of this optimization is to provide the fastest option for Base64 encoding, with the least code change, for those users' requirements.

AFAIK WojciechMula's AVX512VL solution is the fastest algorithm, faster than the AVX512F and AVX512VBMI variants. aklomp's solution is good at supporting a variety of platforms but lacks the top-performance option, so I did not borrow that solution for this optimization.

@lucshi closed this Jul 8, 2022
@lucshi (Contributor, Author) commented Jul 8, 2022

Resubmitted to amend the first commit message.

@lucshi deleted the my-branch branch July 8, 2022 09:40
Optimized Base64 encoding using AVX512VL instructions. The purpose of this
optimization is not to provide a generic solution for a variety of
platforms, but to target modern CPU architectures after Cannon Lake, which
have been widely deployed by AWS and other CSPs, and to serve users who
want the best known Node.js performance. The buffer-base64-encoding
measurement on an AWS EC2 m6i.large instance shows 2.4x performance
(+140% gain). This optimization currently applies only to x86_64 with
GNU compilers.