Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-7137 [ML] : Update SchemaUtils checkInputColumn to print more info if needed #5992

Closed
wants to merge 5 commits into from

Conversation

rekhajoshm
Copy link
Contributor

No description provided.

@mengxr
Copy link
Contributor

mengxr commented May 11, 2015

@rekhajoshm This was moved to ml.util.SchemaUtils. If you want to output more information, please modify it there.

@rekhajoshm
Copy link
Contributor Author

hi @mengxr hmm., moving getParam out of Params or to modify its visibility for ${getParam(colName)? Please can you also check with @josephbradley of SPARK-7137 on the PR requirement?

@jkbradley
Copy link
Member

@mengxr My motivation for creating SPARK-7137 was to print more help info when schema checks fail. Currently, we say something like:

Column features must be of type StringType but was actually DoubleType.

It would be nice to print more info, particularly which PipelineStage caused the problem and which parameter that column is specified by:

... This column is specified via parameter "inputCol" from "lr_32983" with description: ...

@jkbradley
Copy link
Member

After discussing with @mengxr it sounds like the best thing to do will be to add an extra argument to checkInputColumn msg: String, with default value empty string. Then we can pass the extra info as needed. Could you please modify this PR so it does not move the method, but instead adds the extra argument? Part of our motivation is that we eventually want these SchemaUtils to be part of DataFrames, rather than MLlib.

Also, let's not yet modify the uses of checkInputColumn. The reason is that I want to create abstractions for input/output columns, and then those abstractions can handle the schema check internally. I'll create JIRAs for that, to be done after this PR.

Thanks!

@rekhajoshm rekhajoshm changed the title SPARK-7137 [ML] : Add checkInputColumn back to Params and print more info SPARK-7137 [ML] : Update SchemaUtils checkInputColumn to print more info if needed Jul 1, 2015
@rekhajoshm
Copy link
Contributor Author

thanks @jkbradley @mengxr for discussion and quick inputs.done.please review.thanks.

@@ -34,10 +34,11 @@ object SchemaUtils {
* @param colName column name
* @param dataType required column data type
*/
def checkColumnType(schema: StructType, colName: String, dataType: DataType): Unit = {
def checkColumnType(schema: StructType, colName: String, dataType: DataType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Please make sure you follow the style guide linked from [https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]

If this fits on 1 line (<= 100 chars), then it should. If not, then there should be 1 param per line.

@rekhajoshm
Copy link
Contributor Author

thanks @jkbradley for your review.done.please have a look.thanks.

@mengxr
Copy link
Contributor

mengxr commented Jul 2, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Jul 2, 2015

Test build #36414 has finished for PR 5992 at commit 33ddd2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -34,10 +34,14 @@ object SchemaUtils {
* @param colName column name
* @param dataType required column data type
*/
def checkColumnType(schema: StructType, colName: String, dataType: DataType): Unit = {
def checkColumnType(schema: StructType,
colName: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the Spark code style guide and use 4-space indentation for arguments: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

@rekhajoshm
Copy link
Contributor Author

thanks @mengxr for your comment.done.please have a look.Also raised SPARK-8806, please check if that makes sense?thanks

@SparkQA
Copy link

SparkQA commented Jul 3, 2015

Test build #36453 has finished for PR 5992 at commit 8c42b57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM merging with master
Thanks!

@asfgit asfgit closed this in f9c448d Jul 5, 2015
@rekhajoshm
Copy link
Contributor Author

thanks @jkbradley @mengxr for the reviews and merge.

@rekhajoshm rekhajoshm deleted the fix/SPARK-7137 branch June 21, 2018 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants