Commit
Refactor and optimize Frame.where (NVIDIA#11168)
This PR is a substantial refactoring of `Frame.where`. It removes many dead code paths, eliminates numerous unnecessary copies, and simplifies and consolidates various parts of the logic. It also splits parts of the implementation into the specific Frame classes that use them. Prior to this PR, all the code lived in a single function with essentially independent code paths for DataFrame and SingleColumnFrame. Splitting these into methods of the appropriate classes also makes mypy much happier.

The resulting code is significantly faster. I'll post more benchmarks soon, but we see improvements ranging from 20% to 70%, even for reasonable data sizes (e.g. 1 million rows). You have to go past 10 million rows before the performance improvements are washed out by the sheer volume of computation time.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai/cudf#11168
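To illustrate the structural change described above, here is a minimal pure-Python sketch of moving type-specific input handling out of one shared function and into per-class methods. It is not the cudf implementation: the classes below are simplified stand-ins that store columns as plain lists, and the helper name `_normalize_where_inputs` and its signature are assumptions made for this example.

```python
from typing import Any, Dict, List


class Frame:
    """Hypothetical stand-in for a frame; columns are plain Python lists."""

    def __init__(self, data: Dict[str, List[Any]]):
        self._data = data

    def where(self, cond, other=None):
        # Shared logic only. Each subclass normalizes its own inputs, so this
        # method needs no branching on the concrete frame type.
        conds, others = self._normalize_where_inputs(cond, other)
        out = {
            name: [v if keep else others[name] for v, keep in zip(col, conds[name])]
            for name, col in self._data.items()
        }
        return type(self)(out)

    def _normalize_where_inputs(self, cond, other):
        # Hypothetical hook: returns (per-column masks, per-column fill values).
        raise NotImplementedError


class SingleColumnFrame(Frame):
    def _normalize_where_inputs(self, cond, other):
        (name,) = self._data  # exactly one column is expected
        return {name: list(cond)}, {name: other}


class DataFrame(Frame):
    def _normalize_where_inputs(self, cond, other):
        # Broadcast a scalar fill value to every column.
        if not isinstance(other, dict):
            other = {name: other for name in self._data}
        return {name: list(cond[name]) for name in self._data}, other


series_like = SingleColumnFrame({"a": [1, 2, 3]})
frame = DataFrame({"a": [1, 2], "b": [3, 4]})
masked_series = series_like.where([True, False, True], 0)
masked_frame = frame.where({"a": [True, False], "b": [False, True]}, {"a": -1, "b": -2})
```

Because each class owns its input handling, a type checker can verify each method against a single concrete type, which is one plausible reason such a split would reduce mypy complaints.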