Update doc to note that single quoted json strings are not ok #2316

Merged

sameerz
Collaborator

@sameerz sameerz commented Apr 30, 2021

Update the doc to note that get_json_object on the GPU supports only double-quoted strings in JSON data, per the http://json.org/ spec. Spark appears to also support single-quoted strings in JSON, per https://github.com/apache/spark/blob/4e8701a77dff729c4e8e0ad39c16e2717c2c32fe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L108 . Single-quote support will be addressed in a future release.
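
As a rough illustration of the distinction described above, the sketch below uses Python's standard `json` module as a stand-in for a parser that follows the json.org grammar strictly. It is not the GPU or cudf implementation, and the helper name `parses_as_strict_json` is made up for this example.

```python
import json

# The same object written with double-quoted and single-quoted strings.
double_quoted = '{"name": "spark"}'
single_quoted = "{'name': 'spark'}"


def parses_as_strict_json(text):
    """Return True if `text` is accepted by a strict json.org-style parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_strict_json(double_quoted))  # True
print(parses_as_strict_json(single_quoted))  # False
```

A lenient parser, which is what Spark appears to use on the CPU, would accept both inputs, and that difference is what the doc change calls out.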

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
@sameerz sameerz self-assigned this Apr 30, 2021
@sameerz sameerz added the documentation Improvements or additions to documentation label Apr 30, 2021
@sameerz sameerz added this to the Apr 26 - May 7 milestone Apr 30, 2021
Member

@jlowe jlowe left a comment

Do we need to mention unescaped control characters as well? I assume the cudf json parser does this differently. cc: @nvdbaranec

Review thread on docs/compatibility.md (outdated, resolved)
Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
Review thread on docs/compatibility.md (outdated, resolved)
Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
@nvdbaranec
Collaborator

> Do we need to mention unescaped control characters as well? I assume the cudf json parser does this differently. cc: @nvdbaranec

We don't do any explicit filtering for them at the moment, so they "should" just come through.
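
For readers unfamiliar with the term: an unescaped control character is a raw character such as a tab sitting directly inside a JSON string. The sketch below uses Python's standard `json` module purely to show the two possible treatments (reject versus pass through). It makes no claim about what Spark or the cudf parser actually does, and `try_parse` is a hypothetical helper.

```python
import json

# A JSON string value containing a literal (unescaped) tab character.
raw_tab = '{"k": "a\tb"}'
# The same value with the tab written as the escape sequence backslash + t.
escaped_tab = '{"k": "a\\tb"}'


def try_parse(text, strict=True):
    """Parse `text`, returning None when the parser rejects it."""
    try:
        return json.loads(text, strict=strict)
    except json.JSONDecodeError:
        return None


print(try_parse(escaped_tab))            # {'k': 'a\tb'}
print(try_parse(raw_tab))                # None (strict mode rejects the raw tab)
print(try_parse(raw_tab, strict=False))  # {'k': 'a\tb'}
```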

Collaborator

@nvdbaranec nvdbaranec left a comment

Worth mentioning that this behavior will be fixed in the future?

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
@sameerz
Collaborator Author

sameerz commented Apr 30, 2021

> Worth mentioning that this behavior will be fixed in the future?

Updated to mention this will be addressed in the future.

@sameerz
Collaborator Author

sameerz commented Apr 30, 2021

build

@nvdbaranec
Collaborator

Actually, to add to the confusion a bit: the JSONPath string (the query itself) allows single quotes. In fact, the spec for it requires them. It's the input JSON data itself that requires double quotes. Might be worth a clarification.
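
To make the path-versus-data distinction concrete, here is a minimal sketch. It assumes a hypothetical helper, `get_by_bracket_path`, that handles only bracket-notation paths of the form `$['a']['b']` and parses the data with Python's strict `json` module. It is not how get_json_object is implemented on either the CPU or the GPU.

```python
import json
import re


def get_by_bracket_path(json_text, path):
    """Hypothetical helper: resolve paths of the form $['a']['b'] only."""
    # The path (the query) uses single quotes around each key.
    keys = re.findall(r"\['([^']*)'\]", path)
    # The data is parsed strictly, so its strings must be double-quoted.
    value = json.loads(json_text)
    for key in keys:
        value = value[key]
    return value


data = '{"owner": {"name": "amy"}}'  # double quotes in the JSON data
path = "$['owner']['name']"          # single quotes in the JSONPath query

print(get_by_bracket_path(data, path))  # amy
```

Passing single-quoted data such as `{'owner': {'name': 'amy'}}` to this helper raises `json.JSONDecodeError`, while the single-quoted path is accepted.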

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
@sameerz
Collaborator Author

sameerz commented Apr 30, 2021

> The JSONPath string (the query itself) allows single quotes. In fact the spec for it requires them. It's the input JSON data itself that requires the double quotes. Might be worth a clarification.

Updated to note that the double quote requirement is for strings in JSON data.

@sameerz
Collaborator Author

sameerz commented May 1, 2021

build

@sameerz sameerz merged commit 66eded7 into NVIDIA:branch-0.5 May 1, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…#2316)

* Update doc to note that single quoted json strings are not ok

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>

* Correct capitalization of PySpark

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>

* operator -> operation

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>

* Mention the behavior will be updated in the future.

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>

* Note that the double quote requirement is for strings in JSON data

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
@sameerz sameerz deleted the branch-0.5-get_json_object_compatibility branch June 11, 2021 17:19

4 participants