Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding float to string kernel #1508

Merged
merged 40 commits into from
Dec 8, 2023

Conversation

thirtiseven
Copy link
Collaborator

@thirtiseven thirtiseven commented Oct 18, 2023

This PR adds a kernel for casting float to string, to make spark-rapids produce closer results when doing such casting.

Supporting this is a necessary part of format_number kernel, so I split it out as a 'subtask'.

This PR uses Ryū: fast float-to-string conversion (PLDI'18) as the solution for casting float/double to string. The results differ from the output of Spark's in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output (e.g., see ulfjack/ryu#83).

In most cases, the result will match Spark's results, and in the cases where it does not, the values will match when we cast them back to float. Tested in plugin PR below.

The logic part is based on ryu's C and Java implementation. I'm leaning towards keeping it consistent with ryu's codebase rather than making it more Cuda style to make it easier to apply upstream changes, but I'm not sure if that's a good practice.

Related plugin PR: NVIDIA/spark-rapids#9470

thirtiseven and others added 4 commits October 13, 2023 14:04
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Copy link
Collaborator

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Some minor comments.

// Range of numbers here is for normalizing the value.
// If the value is above or below the following limits, the output is converted to
// scientific notation in order to show (at most) the number of significant digits.
static constexpr double upper_limit = 10000000; // max is 1x10^7
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark's max?

src/main/cpp/src/cast_float_to_string.cu Outdated Show resolved Hide resolved
src/main/cpp/src/cast_float_to_string.cu Outdated Show resolved Hide resolved
thirtiseven and others added 3 commits October 19, 2023 15:57
Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven marked this pull request as ready for review November 6, 2023 10:29
@thirtiseven thirtiseven changed the title [WIP] Adding float to string kernel Adding float to string kernel Nov 6, 2023
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
thirdparty/cudf Outdated Show resolved Hide resolved
thirtiseven and others added 3 commits November 14, 2023 23:17
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven self-assigned this Nov 15, 2023
@thirtiseven thirtiseven changed the base branch from branch-23.12 to branch-24.02 November 22, 2023 13:39
thirtiseven and others added 4 commits November 23, 2023 16:23
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
thirtiseven and others added 5 commits December 4, 2023 17:07
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
ttnghia
ttnghia previously approved these changes Dec 4, 2023
Copy link
Collaborator

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now 👍 . Please run compute-sanitizer to make sure there is no hidden issue. And do so if there are Spark integration tests too. Thanks.

Copy link
Collaborator

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about layout and one include issue.

if (d_chars != nullptr) {
float_to_string(idx);
} else {
d_offsets[idx] = compute_output_size(d_floats.element<FloatType>(idx));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to pass the index into format_float and then pass the float into compute_output_size. Why not pass the index into both for some symmetry and add auto const value = d_floats.element(idx); to compute_output_size? This is a stylistic thing and has no bearing on the output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, done.

* limitations under the License.
*/

#include "cast_string.hpp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why cast_string.hpp is quoted here. There isn't a cast_string.hpp in this directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Copy link
Collaborator Author

build

@thirtiseven
Copy link
Collaborator Author

Tested with compute-sanitizer and plugin integration tests, merging it. Thanks all for review!

@thirtiseven thirtiseven merged commit 4c20e3a into NVIDIA:branch-24.02 Dec 8, 2023
3 checks passed
@sameerz sameerz added the bug Something isn't working label Dec 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants