[FEA] JSON reader should parse quoted floating point values as null #4647

Closed
andygrove opened this issue Jan 27, 2022 · 3 comments
Assignees
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request

Comments

@andygrove
Contributor

andygrove commented Jan 27, 2022

Is your feature request related to a problem? Please describe.
Given the following JSON file, Spark will produce the values 1.0 and null, but on GPU we would produce 1.0 for both rows (once #4637 is merged).

{ "number": 1.0 }
{ "number": "1.0" }
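
To make the expected result concrete, here is a minimal pure-Python sketch of the semantics described above: a bare JSON number is read as a double, while a quoted one becomes null. The helper name `read_double_like_spark` is hypothetical, and this only mimics the behavior reported in this issue; it is not Spark's or cuDF's actual parser and does not try to cover any special cases either of them may have.

```python
import json
from typing import Optional


def read_double_like_spark(line: str, field: str = "number") -> Optional[float]:
    """Return the field as a float, or None if it is not a bare JSON number.

    Hypothetical helper: it only illustrates the behavior described in this
    issue and is not how Spark or cuDF implement JSON parsing.
    """
    record = json.loads(line)
    value = record.get(field)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None  # quoted values such as "1.0" end up here
    return float(value)


lines = ['{ "number": 1.0 }', '{ "number": "1.0" }']
results = [read_double_like_spark(line) for line in lines]
print(results)  # [1.0, None]
```

The second row comes back as `None` because `json.loads` yields a `str` for the quoted value, which is the distinction the GPU reader would need to preserve.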

Describe the solution you'd like
We should have consistent behavior with Spark. We need to remove the XFAIL from the test in json_test.py that references this issue.

Describe alternatives you've considered
None

Additional context
None

@andygrove andygrove added the "feature request" (New feature or request) and "? - Needs Triage" (Need team to review and classify) labels Jan 27, 2022
@andygrove andygrove mentioned this issue Jan 28, 2022
62 tasks
@revans2
Collaborator

revans2 commented Jan 28, 2022

I don't see a JSON file attached or anywhere here.

Also, this is going to be impossible to do without help from CUDF. If we are really concerned about this, we have two options. Either CUDF parses all of the values the way Spark wants them to be parsed, which I don't think will ever happen, or we need a way to get a boolean back that says whether the value was quoted in the original data. If we want the latter, we are going to need to request it from CUDF as a part of their JSON rewrite.
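
As a rough illustration of the second option, the sketch below assumes the reader could hand back a per-row "was quoted" flag alongside the parsed values, and uses it to null out quoted entries afterwards. Both the `null_out_quoted` helper and the flag itself are hypothetical; nothing in this thread says cuDF exposes such a flag or what it would be called.

```python
from typing import List, Optional, Sequence


def null_out_quoted(
    values: Sequence[Optional[float]], was_quoted: Sequence[bool]
) -> List[Optional[float]]:
    """Replace each value whose source token was quoted with None.

    Hypothetical post-processing step: `was_quoted` stands in for a flag the
    JSON reader would have to provide; it is an assumption, not a real API.
    """
    if len(values) != len(was_quoted):
        raise ValueError("values and was_quoted must have the same length")
    return [None if quoted else value for value, quoted in zip(values, was_quoted)]


# Both rows parse to 1.0 on the GPU today; only the second was quoted.
parsed = [1.0, 1.0]
quoted = [False, True]
print(null_out_quoted(parsed, quoted))  # [1.0, None]
```

On the GPU the same idea would presumably be a column-wise mask operation rather than a Python loop; the list version here only shows the logic.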

@revans2 revans2 added the "cudf_dependency" (An issue or PR with this label depends on a new feature in cudf) label Jan 28, 2022
@sameerz sameerz removed the "? - Needs Triage" (Need team to review and classify) label Feb 1, 2022
@andygrove
Contributor Author

I filed a feature request against cuDF - rapidsai/cudf#10283

I also updated the issue description to include the sample JSON that I forgot to include when originally filing this issue.

@revans2
Collaborator

revans2 commented Mar 13, 2024

This was fixed as a part of #10542, but I didn't realize it because of all of the other JSON tests that xpass. So I just removed the xfail as a part of #10575.

@revans2 revans2 closed this as completed Mar 13, 2024

3 participants