Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8468][ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation #6905

Closed
wants to merge 5 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Jun 19, 2015

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35266 has finished for PR 6905 at commit b5f52c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jun 19, 2015

You would at least have to change the docs in Evaluator to say that evaluate no longer necessarily returns larger values for better evaluations.

I don't see that you changed RegressionEvaluator to flip the logic though?

There's more of a problem with that though. 3 of the 4 metrics in RegressionEvaluator are worse when larger: RMSE, MSE, MAE. But R^2 is not. So this would still have a similar bug.

Another possibility is to invert the result of RMSE, MSE, MAE. For eval purposes, their relative ranking is all that matters so returning 1/x as the evaluation criteria is fine, for example. That would let you fully fix this without any API change.

@viirya
Copy link
Member Author

viirya commented Jun 19, 2015

Thanks. I didn't notice there is R^2 that is needed to maximize. The simplest solution should be to invert the result of RMSE, MSE, MAE.

@viirya viirya changed the title [SPARK-8468][ML] Add param to CrossValidator for choosing whether to maximize evaulation value [SPARK-8468][ML] Invert some metrics of RegressionEvaluator to get correct cross validation Jun 19, 2015
@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35278 has finished for PR 6905 at commit c3dd8d9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Jun 19, 2015

@srowen @viirya I think using - would be better than 1 /, at least it is easier to see the absolute value.

@viirya
Copy link
Member Author

viirya commented Jun 19, 2015

@mengxr agreed. Updated.

@mengxr
Copy link
Contributor

mengxr commented Jun 19, 2015

LGTM pending Jenkins. Thanks for the quick fix and @chelseaz for reporting the bug!

@viirya viirya changed the title [SPARK-8468][ML] Invert some metrics of RegressionEvaluator to get correct cross validation [SPARK-8468][ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation Jun 19, 2015
@chelseaz
Copy link

Thanks for the fix @viirya!

@srowen
Copy link
Member

srowen commented Jun 19, 2015

Arguing with myself: I see the value of the eval is used in some log statements including its mean. This may be a little less than useful when it is some inverse RMSE but still probably a decent solution

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35285 has finished for PR 6905 at commit 16e3b2c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Jun 19, 2015

test this please

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35311 has finished for PR 6905 at commit 16e3b2c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Jun 19, 2015

@viirya Could you fix the python unit tests?

**********************************************************************
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/evaluation.py", line 162, in __main__.RegressionEvaluator
Failed example:
    evaluator.evaluate(dataset)
Expected:
    2.842...
Got:
    -2.8424854081906372
**********************************************************************
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/evaluation.py", line 166, in __main__.RegressionEvaluator
Failed example:
    evaluator.evaluate(dataset, {evaluator.metricName: "mae"})
Expected:
    2.649...
Got:
    -2.6496454299999996
**********************************************************************

@jkbradley
Copy link
Member

@viirya Could you also please document the meaning of the various metrics in RegressionEvaluator? Users will be surprised that those values are negated. The doc can go in the "metricName" Param Scala/Python doc. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35351 has finished for PR 6905 at commit 930d3db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StreamingKMeansModel(KMeansModel):
    • class StreamingKMeans(object):

@jkbradley
Copy link
Member

LGTM merging into branch-1.4 and master
Thanks!

asfgit pushed a commit that referenced this pull request Jun 20, 2015
…uator to get correct cross validation

JIRA: https://issues.apache.org/jira/browse/SPARK-8468

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6905 from viirya/cv_min and squashes the following commits:

930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
c3dd8d9 [Liang-Chi Hsieh] For comments.
b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.

(cherry picked from commit 0b89951)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 0b89951 Jun 20, 2015
asfgit pushed a commit that referenced this pull request Jun 22, 2015
…lidatorSuite

Ref. #6905
ping yhuai

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6929 from viirya/hot_fix_cv_test and squashes the following commits:

b1aec53 [Liang-Chi Hsieh] Hotfix branch-1.4 by removing avgMetrics in CrossValidatorSuite.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015
…uator to get correct cross validation

JIRA: https://issues.apache.org/jira/browse/SPARK-8468

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#6905 from viirya/cv_min and squashes the following commits:

930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
c3dd8d9 [Liang-Chi Hsieh] For comments.
b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.

(cherry picked from commit 0b89951)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015
…lidatorSuite

Ref. apache#6905
ping yhuai

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#6929 from viirya/hot_fix_cv_test and squashes the following commits:

b1aec53 [Liang-Chi Hsieh] Hotfix branch-1.4 by removing avgMetrics in CrossValidatorSuite.
@viirya viirya deleted the cv_min branch December 27, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants