[PYTHON][SQL][WIP] repr(schema) and schema.toString produce runnable code #25495
@@ -49,7 +49,7 @@ case class StructField(
   }

   // override the default toString to be compatible with legacy parquet files.
-  override def toString: String = s"StructField($name,$dataType,$nullable)"
+  override def toString: String = s"""StructField("$name",$dataType,$nullable)"""
The Scala side doesn't have a repr-style contract that makes the output reconstructable, the way the Python side does.
Also, I think this can't handle a " character in the middle of the field name, for instance.
I tested the code to confirm this works. The """ allows for the embedded " to work correctly.
True, Scala doesn't have a repr contract for runnable code. However, having toString produce runnable code here has a real use-case for users. Users can inferSchema, get the generated schema code, tweak it as needed, and then provide the schema in the future. I've needed this many times in my projects. I'll make sure I add this to the PR comment. And also open a JIRA.
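To make the quoting concern concrete, here is a small Python analogue (not the Scala code from this PR). `Field`, `naive_repr`, `escaped_repr`, and `round_trips` are hypothetical names used only for illustration. The sketch assumes that wrapping a name in quotes without escaping behaves like plain string interpolation, and compares it with letting `repr()` quote the name.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Hypothetical stand-in for a schema field; not a Spark class.
    name: str


def naive_repr(field: Field) -> str:
    # Wraps the name in double quotes without escaping anything.
    return f'Field("{field.name}")'


def escaped_repr(field: Field) -> str:
    # !r applies repr() to the name, which chooses quotes and escapes as needed.
    return f"Field({field.name!r})"


def round_trips(text: str, original: Field) -> bool:
    # True when evaluating the generated code rebuilds an equal object.
    try:
        return eval(text) == original
    except SyntaxError:
        return False


plain = Field("f1")
tricky = Field('say "hi"')

print(round_trips(naive_repr(plain), plain))      # simple names work either way
print(round_trips(naive_repr(tricky), tricky))    # embedded quote breaks the naive form
print(round_trips(escaped_repr(tricky), tricky))  # escaped form survives
```

If the analogy holds, generated code is only runnable for every field name when the name is escaped, not merely wrapped in quotes.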
Can you file a JIRA please?
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
### What changes were proposed in this pull request?

These changes update the `__repr__` methods of type classes in `pyspark.sql.types` to print string representations which are `eval`-able. In other words, any instance of a `DataType` will produce a repr which can be passed to `eval()` to create an identical instance.

Similar changes previously submitted: #25495

### Why are the changes needed?

This [bug](https://issues.apache.org/jira/browse/SPARK-18621) has been around for a while. The current implementation returns a string representation which is valid in scala rather than python. These changes fix the repr to be valid with python.

The [motivation](https://docs.python.org/3/library/functions.html#repr) is "to return a string that would yield an object with the same value when passed to eval()".

### Does this PR introduce _any_ user-facing change?

Example:

Current implementation:

```python
from pyspark.sql.types import *

struct = StructType([StructField('f1', StringType(), True)])
repr(struct)
# StructType(List(StructField(f1,StringType,true)))
new_struct = eval(repr(struct))
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
#   File "<string>", line 1, in <module>
# NameError: name 'List' is not defined

struct_field = StructField('f1', StringType(), True)
repr(struct_field)
# StructField(f1,StringType,true)
new_struct_field = eval(repr(struct_field))
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
#   File "<string>", line 1, in <module>
# NameError: name 'f1' is not defined
```

With changes:

```python
from pyspark.sql.types import *

struct = StructType([StructField('f1', StringType(), True)])
repr(struct)
# StructType([StructField('f1', StringType(), True)])
new_struct = eval(repr(struct))
struct == new_struct
# True

struct_field = StructField('f1', StringType(), True)
repr(struct_field)
# StructField('f1', StringType(), True)
new_struct_field = eval(repr(struct_field))
struct_field == new_struct_field
# True
```

### How was this patch tested?

The changes include a test which asserts that an instance of each type is equal to the `eval` of its `repr`, as in the above example.

Closes #34320 from crflynn/sql-types-repr.

Lead-authored-by: flynn <crf204@gmail.com>
Co-authored-by: Flynn <crflynn@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit c5ebdc6)
Signed-off-by: Sean Owen <srowen@gmail.com>
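As a rough sketch of the `eval`-able repr idea described in the commit message, the classes below are minimal stand-ins that reuse the `pyspark.sql.types` names so the output looks familiar. They are assumptions written for illustration, not the actual PySpark implementation, and they run without Spark installed.

```python
class DataType:
    # Stand-in base class: the repr is a constructor call with no arguments.
    def __repr__(self):
        return f"{type(self).__name__}()"

    def __eq__(self, other):
        return type(self) is type(other) and self.__dict__ == other.__dict__


class StringType(DataType):
    pass


class IntegerType(DataType):
    pass


class StructField(DataType):
    def __init__(self, name, dataType, nullable=True):
        self.name = name
        self.dataType = dataType
        self.nullable = nullable

    def __repr__(self):
        # !r quotes the name and recurses into the nested type's repr.
        return f"StructField({self.name!r}, {self.dataType!r}, {self.nullable!r})"


class StructType(DataType):
    def __init__(self, fields=None):
        self.fields = list(fields or [])

    def __repr__(self):
        # A list's repr is already valid Python, so no Scala-style List(...) appears.
        return f"StructType({self.fields!r})"


struct = StructType([StructField('f1', StringType(), True)])
samples = [StringType(), IntegerType(), StructField('n', IntegerType(), False), struct]

# Round-trip check in the spirit of the test described above.
for sample in samples:
    assert eval(repr(sample)) == sample

print(repr(struct))  # StructType([StructField('f1', StringType(), True)])
```

The design choice worth noting is that each `__repr__` delegates to the repr of its parts, so nesting and string quoting are handled by Python itself and do not need to be assembled by hand.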
What changes were proposed in this pull request?
repr(schema) produces runnable Python code
schema.toString produces runnable Scala code
Why are the changes needed?
Previously, schema.toString produced Scala code that wasn't runnable because field names weren't quoted. Even worse, repr(schema) in Python produced the same non-runnable Scala code. This PR resolves both issues, so that schema.toString yields runnable Scala and repr(schema) yields runnable Python.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
pyspark/sql/tests/test_types.py now has test_repr()