[FEA] Fuzz testing of CSV #6926

Open
Tracked by #2063
revans2 opened this issue Oct 27, 2022 · 0 comments
Labels
feature request (New feature or request), test (Only impacts tests)

Comments

Collaborator

revans2 commented Oct 27, 2022

Is your feature request related to a problem? Please describe.
CSV is not a well-defined standard. There are many different options for parsing values, escaping characters, and configuring delimiters. Because of this complexity, we should develop a fuzz testing framework to verify that our code behaves the same as Spark on the CPU. To start, we should concentrate on the default settings:

format: UTF-8
delimiter: ,
quote: "
escape: \
lineSeparator: (not set so it is \r|\n|\r\n)
charToEscapeQuoteEscaping: not set
comment: \u0000 (aka not set)
ignoreLeadingWhiteSpace: false
ignoreTrailingWhiteSpace: false
emptyValue: (empty string)
unescapedQuoteHandling: STOP_AT_DELIMITER

And a schema is also provided.
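As a rough illustration of the idea, here is a minimal sketch of a seeded generator that biases input toward the characters that matter under the defaults above (delimiter, quote, escape, line separators, whitespace, and the NUL comment character), plus a small differential check. Everything here is hypothetical and not part of the project: the names `random_field`, `random_csv`, and `first_mismatch`, the 30% special-character rate, and the `reference` and `candidate` callables, which stand in for "parse with Spark on the CPU" and "parse with our code".

```python
import random
from typing import Callable, Iterable, Optional

# Characters with special meaning under the default settings listed above,
# plus whitespace and NUL (the "not set" comment character).
SPECIAL_CHARS = [",", '"', "\\", "\r", "\n", " ", "\t", "\u0000"]
# Ordinary characters, chosen so numeric-looking values can appear too.
PLAIN_CHARS = "abcXYZ0123456789-.eE+"


def random_field(rng: random.Random, max_len: int = 8) -> str:
    """Build one raw field of 0..max_len characters.

    Roughly 30% of characters are drawn from SPECIAL_CHARS (an arbitrary
    rate picked for this sketch); the rest come from PLAIN_CHARS.
    """
    chars = []
    for _ in range(rng.randint(0, max_len)):
        if rng.random() < 0.3:
            chars.append(rng.choice(SPECIAL_CHARS))
        else:
            chars.append(rng.choice(PLAIN_CHARS))
    return "".join(chars)


def random_csv(rng: random.Random, num_rows: int, num_cols: int,
               delimiter: str = ",") -> str:
    """Join random fields into CSV-like text.

    Fields are deliberately not quoted or escaped, so a parser may see a
    different number of rows and columns than were requested here.
    """
    lines = []
    for _ in range(num_rows):
        fields = [random_field(rng) for _ in range(num_cols)]
        lines.append(delimiter.join(fields))
    return "\n".join(lines) + "\n"


def first_mismatch(inputs: Iterable[str],
                   reference: Callable[[str], object],
                   candidate: Callable[[str], object]) -> Optional[str]:
    """Return the first input on which the two parsers disagree, else None."""
    for text in inputs:
        if reference(text) != candidate(text):
            return text
    return None
```

Seeding the generator, for example with `random.Random(6926)`, keeps any failing input reproducible. The delimiter is a parameter so the same generator can be reused once testing moves beyond the defaults.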

It would be great to expand this further in the future, but for now the default settings are the most important. The next thing to test would be changing the delimiter.

revans2 added the "feature request", "? - Needs Triage", and "test" labels Oct 27, 2022
revans2 mentioned this issue Oct 27, 2022
38 tasks
sameerz removed the "? - Needs Triage" label Nov 16, 2022