-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force inlining to improve AST performance #9530
Force inlining to improve AST performance #9530
Conversation
…f the expression_evaluator to improve null performance.
Out of morbid curiosity, I tested a serial compilation of libcudf without ccache.
After:
And here are benchmarks (note the differences for nullable data): BenchmarksBefore:
After
|
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9530 +/- ##
================================================
- Coverage 10.79% 10.67% -0.12%
================================================
Files 116 117 +1
Lines 18869 19884 +1015
================================================
+ Hits 2036 2123 +87
- Misses 16833 17761 +928
Continue to review full report at Codecov.
|
@jrhemstad @bdice do we want to keep this PR very limited in scope, or should I should modify all places in libcudf that use |
There are a decent handful of existing uses of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HDFI/DFI LGTM
@gpucibot merge |
This PR is a follow-up to #9530 to standardize the names of the macros used for the `__host__ __device__` attributes. Aliases for `__device__` and combinations with inlining have been removed, and the only remaining macro is `CUDF_HOST_DEVICE` which is `__host__ __device__` in device code and empty in host code. See #9530 (comment) for more discussion. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) URL: #9797
This PR is a minimal set of changes that appear to be required to maximize inlining of code involved in AST-based expression evaluation. Specifically, applying the
__forceinline__
qualifier to the type dispatcher results in a significant performance improvement when nullable data is passed through the AST evaluator. I observe roughly a 2x performance improvement with a negligible increase in compilation time. Note that the specific improvements appear to be heavily dependent on the architecture being tested on. Interestingly, removing the__forceinline__
on theevaluate
methods results in performance regressing back to its original values despite them being defined inline, an unexpected result.