[BUG] CSV parsing of malformed lines is empty string not null #2068

Open
revans2 opened this issue Apr 1, 2021 · 0 comments
Labels
bug Something isn't working


revans2 commented Apr 1, 2021

Describe the bug
A CSV file can contain a malformed line that ends early, leaving no data for the trailing columns. For example:

A,B,C
number,

In these cases Spark always inserts a null for the missing entry, whereas cudf treats it as an empty string and then applies the null-value rules to it. If the null value is the empty string, which is the default, the results match. If it is anything else, cudf returns an empty string where Spark returns a null.
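The difference can be illustrated with a small pure-Python model. This is only a sketch of the two behaviors described above, not the actual Spark or cudf code: the function names `parse_spark_like` and `parse_cudf_like` are hypothetical, the parser is a naive comma split, and the way both functions handle fields that are present but empty is a simplifying assumption. Only the handling of the missing trailing column is meant to mirror this report.

```python
def parse_spark_like(line, num_cols, null_value=""):
    """Model: a column missing from a short line is always null."""
    fields = line.split(",")
    row = []
    for i in range(num_cols):
        if i >= len(fields):
            # Column is absent from the line: null regardless of null_value.
            row.append(None)
        else:
            # Column is present: null only if it matches null_value (assumption).
            row.append(None if fields[i] == null_value else fields[i])
    return row


def parse_cudf_like(line, num_cols, null_value=""):
    """Model: a missing column becomes "" first, then null rules apply."""
    fields = line.split(",")
    # Pad the short line with empty strings before checking for nulls.
    fields += [""] * (num_cols - len(fields))
    return [None if f == null_value else f for f in fields[:num_cols]]


line = "number,"  # the malformed row from the example, for header A,B,C

# Default null value (""): the last column agrees in both models.
print(parse_spark_like(line, 3)[-1], parse_cudf_like(line, 3)[-1])

# Non-default null value: the last column differs (None vs "").
print(repr(parse_spark_like(line, 3, "null")[-1]),
      repr(parse_cudf_like(line, 3, "null")[-1]))
```

With the default null value both models return a null for column C, so the mismatch stays hidden. With any other null value the padded empty string no longer matches and survives as a value in the cudf-like model.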

Steps/Code to reproduce bug
We have an integration test for this: test_basic_read in the CSV tests, run against trucks-null.csv with nullValue set to null.

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 1, 2021
@revans2 revans2 mentioned this issue Apr 1, 2021
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Apr 6, 2021