-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-7137 [ML] : Update SchemaUtils checkInputColumn to print more info if needed #5992
Conversation
Pulling functionality from apache spark
@rekhajoshm This was moved to |
hi @mengxr hmm., moving getParam out of Params or to modify its visibility for ${getParam(colName)? Please can you also check with @josephbradley of SPARK-7137 on the PR requirement? |
@mengxr My motivation for creating SPARK-7137 was to print more help info when schema checks fail. Currently, we say something like:
It would be nice to print more info, particularly which PipelineStage caused the problem and which parameter that column is specified by:
|
After discussing with @mengxr it sounds like the best thing to do will be to add an extra argument to checkInputColumn Also, let's not yet modify the uses of checkInputColumn. The reason is that I want to create abstractions for input/output columns, and then those abstractions can handle the schema check internally. I'll create JIRAs for that, to be done after this PR. Thanks! |
thanks @jkbradley @mengxr for discussion and quick inputs.done.please review.thanks. |
@@ -34,10 +34,11 @@ object SchemaUtils { | |||
* @param colName column name | |||
* @param dataType required column data type | |||
*/ | |||
def checkColumnType(schema: StructType, colName: String, dataType: DataType): Unit = { | |||
def checkColumnType(schema: StructType, colName: String, dataType: DataType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Please make sure you follow the style guide linked from [https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]
If this fits on 1 line (<= 100 chars), then it should. If not, then there should be 1 param per line.
thanks @jkbradley for your review.done.please have a look.thanks. |
ok to test |
Test build #36414 has finished for PR 5992 at commit
|
@@ -34,10 +34,14 @@ object SchemaUtils { | |||
* @param colName column name | |||
* @param dataType required column data type | |||
*/ | |||
def checkColumnType(schema: StructType, colName: String, dataType: DataType): Unit = { | |||
def checkColumnType(schema: StructType, | |||
colName: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the Spark code style guide and use 4-space indentation for arguments: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
thanks @mengxr for your comment.done.please have a look.Also raised SPARK-8806, please check if that makes sense?thanks |
Test build #36453 has finished for PR 5992 at commit
|
LGTM merging with master |
thanks @jkbradley @mengxr for the reviews and merge. |
No description provided.