Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve date support in JSON and CSV readers #4853

Merged
merged 26 commits into from
Mar 10, 2022

Conversation

andygrove
Copy link
Contributor

@andygrove andygrove commented Feb 23, 2022

Closes #4849 and #1091 and partially fixes #1111 and #123

Adds support for reading date columns from JSON and CSV, respecting timeParserPolicy.

Status:

  • Basic functionality in place
  • Improve test coverage
  • Update documentation

@andygrove andygrove linked an issue Feb 24, 2022 that may be closed by this pull request
Comment on lines -170 to -177
'spark.rapids.sql.csv.read.bool.enabled': 'true',
'spark.rapids.sql.csv.read.date.enabled': 'true',
'spark.rapids.sql.csv.read.byte.enabled': 'true',
'spark.rapids.sql.csv.read.short.enabled': 'true',
'spark.rapids.sql.csv.read.integer.enabled': 'true',
'spark.rapids.sql.csv.read.long.enabled': 'true',
'spark.rapids.sql.csv.read.float.enabled': 'true',
'spark.rapids.sql.csv.read.double.enabled': 'true',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these options were removed from the configs in earlier PRs related to JSON support

@andygrove andygrove changed the title WIP: Improve date support in JSON and CSV readers Improve date support in JSON and CSV readers Mar 7, 2022
@andygrove andygrove marked this pull request as ready for review March 7, 2022 15:47
@andygrove
Copy link
Contributor Author

build

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits outside of the documentation needing to be updated.

docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Show resolved Hide resolved
@@ -150,7 +150,6 @@ trait SparkQueryCompareTestSuite extends FunSuite with Arm {

def enableCsvConf(): SparkConf = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just delete this?

integration_tests/src/main/python/json_test.py Outdated Show resolved Hide resolved
@revans2
Copy link
Collaborator

revans2 commented Mar 7, 2022

It also looks like some versions of Spark do not include a dateFormat option.

revans2
revans2 previously approved these changes Mar 7, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nits I have left are small enough I am happy to merge this in as is.

@andygrove
Copy link
Contributor Author

build

@andygrove
Copy link
Contributor Author

build

revans2
revans2 previously approved these changes Mar 7, 2022
@andygrove
Copy link
Contributor Author

andygrove commented Mar 8, 2022

Build failed with Spark 3.3.0

GpuJsonScan.scala:414: value dateFormat is not a member of org.apache.spark.sql.catalyst.json.JSONOptions

I will need to do some shim work next

@andygrove
Copy link
Contributor Author

build

@andygrove
Copy link
Contributor Author

build

revans2
revans2 previously approved these changes Mar 8, 2022
@andygrove
Copy link
Contributor Author

build

@andygrove
Copy link
Contributor Author

@revans2 I fixed the issues with 3.3.0 and tests are now passing

@revans2 revans2 merged commit 2fca312 into NVIDIA:branch-22.04 Mar 10, 2022
@andygrove andygrove deleted the json-date branch March 30, 2022 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
3 participants