Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for reading decimal types from JSON and CSV #4825

Merged
merged 11 commits into from
Mar 7, 2022

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Feb 18, 2022

Signed-off-by: Andy Grove andygrove@nvidia.com

Closes #4824 and #4615 and #4646

The main change in this PR is adding support for reading decimal values from CSV and JSON by reading them as strings and then casting them to decimal.

There is also a fix for a bug in handling invalid JSON floating-point values such as .1 and 1.which was discovered during this work.

There are also some updates to the compatibility guide related to previous JSON and CSV PRs. I can PR those docs changes separately if that is preferred.

@andygrove andygrove added this to the Feb 14 - Feb 25 milestone Feb 18, 2022
@andygrove andygrove self-assigned this Feb 18, 2022
* [spark.rapids.sql.csv.read.float.enabled](configs.md#sql.csv.read.float.enabled)
* [spark.rapids.sql.csv.read.integer.enabled](configs.md#sql.csv.read.integer.enabled)
* [spark.rapids.sql.csv.read.long.enabled](configs.md#sql.csv.read.long.enabled)
* [spark.rapids.sql.csv.read.short.enabled](configs.md#sql.csv.read.short.enabled)
* [spark.rapids.sql.csvTimestamps.enabled](configs.md#sql.csvTimestamps.enabled)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are not specific to this PR and could be PR'd separately

revans2
revans2 previously approved these changes Feb 22, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

@andygrove andygrove changed the title WIP: Initial support for reading decimal types from JSON and CSV Initial support for reading decimal types from JSON and CSV Feb 22, 2022
@andygrove andygrove marked this pull request as ready for review February 22, 2022 20:10
@andygrove
Copy link
Contributor Author

build

@andygrove
Copy link
Contributor Author

@revans2 Could I get another review on this one? There were changes after your last approval, to reduce the amount of regexp in use.

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two nits related to performance that I can live without.

@andygrove andygrove merged commit 3014afc into NVIDIA:branch-22.04 Mar 7, 2022
@andygrove andygrove deleted the json-csv-decimal-support branch March 7, 2022 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
3 participants