
feat: Implement cross_validate functionality #443

Draft · wants to merge 42 commits into main from cross-validate-item

Conversation

@augustebaum (Contributor) commented Oct 4, 2024

Introduces a new item type, CrossValidateItem, as well as a top-level function skore.cross_validate, which detects the type of ML task based on the input estimator and target, adds some scorers, and runs scikit-learn's cross_validate with those extra arguments.
The results are stored in the Project given as input, always under the key cross_validation, along with an interactive summary plot.
The output of skore.cross_validate is as close as possible to scikit-learn's, except for some edge cases. Also, if the user passes scoring="my_metric", the output dict will contain the key test_my_metric as well as test_score, whereas in the scikit-learn version the dict only has test_score.

TL;DR:

  • Make Altair a mandatory dependency
  • Add CrossValidateItem
  • Add cross_validate function
  • Add plot_cross_validation function
  • Refactor tests (add a fixture in_memory_project and use it instead of project, for clarity -- it would have been great to do from conftest import in_memory_project as project, but pytest doesn't work that way; see the sketch just below)
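For reference, a minimal sketch of the fixture-naming point above (the body of in_memory_project is just a placeholder; the real fixture builds an in-memory skore Project):

```python
# conftest.py -- sketch only: pytest resolves fixtures by the name they are
# defined under, so `from conftest import in_memory_project as project` would not
# register a fixture named `project`. An explicit aliasing fixture would.
import pytest


@pytest.fixture
def in_memory_project():
    return {}  # placeholder standing in for an in-memory skore Project


@pytest.fixture
def project(in_memory_project):
    return in_memory_project
```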

How to test:

  • In a notebook or VSCode, with a scikit-learn estimator at hand, I run skore.cross_validate on it just as I would sklearn.model_selection.cross_validate, except I also pass the argument project=project to let skore know where to store things (if the argument is not passed, cross-validation still runs but nothing is saved); see the sketch after this list
  • A plot appears in the UI and in my notebook
  • I forgot a metric; I change the cell and re-run
  • A new plot is output and the one in the UI updates
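For reference, a minimal sketch of that workflow; the way the project handle is obtained is an assumption here (shown as a hypothetical skore.load), while the cross_validate call itself mirrors sklearn.model_selection.cross_validate:

```python
# Sketch of the intended usage from a notebook. `skore.load(...)` is a
# hypothetical placeholder for however a Project is opened or created.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

import skore

X, y = load_diabetes(return_X_y=True)
project = skore.load("project.skore")  # hypothetical: open the Project to store results in

# Same call shape as sklearn.model_selection.cross_validate, plus `project=`.
cv_results = skore.cross_validate(Lasso(), X, y, cv=5, project=project)
print(cv_results["test_score"])  # scikit-learn-like output dict
```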

Addresses the first part of #383


To-do:

  • Add Marie's tests as automated tests
  • Potentially use pytest-cases for cross-validation tests
  • Deal with multi-class classification (use ["roc_auc_ovr_weighted", "neg_log_loss", "recall_weighted", "precision_weighted"])
  • Deal with the case where X and y are not numpy arrays
    • At least set up the infrastructure to easily support pandas, numpy and lists; if only numpy works for now, that's okay
    • Look into struct.pack
  • Remove metrics depending on e.g. if estimator has predict_proba
    • Note that, for example, SVC has the method defined, but it raises if the model isn't initialized with probability=True, so a try-except will be more accurate than hasattr
  • Deal with the case where the estimator params are not serializable
    • Use repr for the time being
  • Make it so that when the user passes scoring="my_metric", the output contains test_my_metric as well as test_score
  • Ensure that the stored cv_results are serializable
    • Serialize the numpy arrays (see the sketch after this list)
    • Deal with the case where the user passes return_estimator=True (we can't pass that to CrossValidateItem because it's not serializable)
      • Asked Marie
    • Confirm that the get() output is appropriate
  • Fix plot not updating in the UI
  • Remove default metrics for clustering
  • Stop storing the plot directly in CrossValidateItem, instead store JSON
    • Rename the attribute to plot_serialized, add a property plot
  • Investigate Marie's issue of the empty plot
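On the serialization items above, a small sketch of the kind of conversion involved; the helper name is illustrative, not this PR's code:

```python
# Sketch: make a cv_results dict JSON-serializable by turning numpy arrays into
# lists and falling back to repr() for anything else (e.g. fitted estimators when
# return_estimator=True).
import json

import numpy as np


def _serialize_cv_results(cv_results):
    serialized = {}
    for key, value in cv_results.items():
        if isinstance(value, np.ndarray):
            serialized[key] = value.tolist()  # scores/timings become plain lists
        else:
            serialized[key] = repr(value)  # "use repr for the time being"
    return serialized


# json.dumps now succeeds on the converted dict.
json.dumps(_serialize_cv_results({"test_score": np.array([0.1, 0.2, 0.3])}))
```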

@augustebaum force-pushed the cross-validate-item branch 14 times, most recently from 1a1eeff to 32e26fd, on October 9, 2024 at 10:24
@augustebaum marked this pull request as ready for review on October 9, 2024 at 10:28
@MarieS-WiMLDS (Contributor) commented Oct 9, 2024

Hi!
When I ran make install-skore to test this PR, some files were created in the frontend folder, and Git wants to track them. Since they didn't exist in the folder before, I suppose they should be added to .gitignore?

// EDIT: might have been some old leftover files. Fixed.

@thomass-dev (Collaborator) left a comment


Solid work, but needs a few adjustments IMHO.
Please read all my comments at once. Thanks!

@MarieS-WiMLDS (Contributor) commented:

At first I tried this (cf. the screenshot); I understood afterwards that I had to do what the comment says.
[Screenshot: Capture d’écran du 2024-10-09 17-45-05]

Is there a technical constraint preventing us from having the cross_validate function at the root of the library? Would it be bad practice?

@augustebaum (Contributor, Author) commented Oct 9, 2024

No constraint; it's just a matter of design. I'll make the shortcut available now.
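Presumably something along these lines; the exact module layout is an assumption based on the files touched in this PR:

```python
# skore/src/skore/__init__.py -- sketch of the "shortcut": re-export the function
# at the package root so users can call skore.cross_validate(...) directly.
from skore.cross_validate import cross_validate

__all__ = ["cross_validate"]
```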

@MarieS-WiMLDS (Contributor) commented Oct 9, 2024

It's super cool to see this live 🤩

I tested:

  • creating an error in scikit-learn --> the error message is passed correctly!
  • a regression problem (with diabetes dataset & lasso)
  • a multi-class classification problem (with iris & random forest)
  • a binary classification problem
  • a clustering problem

Questions and remarks:

  1. Can we drop score_time? I don't find this metric very interesting, and I feel it clutters the options.
  2. What is the test_score?
  3. The cross_validate plot in the skore UI doesn't update.
  4. On multi-class classification: I hadn't thought of this use case before, actually. The default values for binary and multi-class classification shouldn't be the same.
    a. The defaults I chose aren't correct and they create an error. Let's create an additional elif and, instead of recall and precision, use recall_weighted and precision_weighted.
    b. Because the defaults aren't correct, there was an error in the process. Yet some of cross_validate on the scikit-learn side ran to the end, and we have some results. Is it normal not to have at least these results displayed? (cf. the screenshot below)
  5. Again, for clustering, the defaults are not correct on my side. The silhouette score exists in scikit-learn, but you have to pass it to cross_validate through a callable, not a string. I'm surprised, so I would tend to say we shouldn't use it as a default. @sylvaincom, what do you think about this? (I also asked in the OS channel on Slack.)

[Screenshot: Capture d’écran du 2024-10-09 19-36-18]

EDIT: based on Gaël's and Guillaume's feedback, let's remove all default scores for clustering, because there is no "usual" metric. cross_validate doesn't make much sense for clustering anyway.

@augustebaum (Contributor, Author) commented:

I tested:

  • creating an error in scikit-learn --> the error message is passed correctly!
  • a regression problem (with diabetes dataset & lasso)
  • a multi-class classification problem (with iris & random forest)
  • a binary classification problem
  • a clustering problem

That's really great, thanks! It would be really useful to add these use cases to our tests directly; let's sync to do this.

  • Can we drop score_time?

It depends on whether we want to uphold the requirement that our cross_validate returns the same thing as scikit-learn's. IMO it makes sense to follow the principle of least surprise, so let's keep it.

  • What is the test_score?

If you don't pass any scoring parameter (or if you pass a single metric, like scoring="r2"), that metric is the one shown under test_score. When you don't pass scoring, scikit-learn can in many cases guess an appropriate metric from the estimator.
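To illustrate with plain scikit-learn (not skore-specific):

```python
# How scikit-learn names the result keys depending on `scoring`.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)

# No scoring, or a single metric string: the score is reported under "test_score".
print(sorted(cross_validate(Lasso(), X, y, cv=3)))
# ['fit_time', 'score_time', 'test_score']
print(sorted(cross_validate(Lasso(), X, y, cv=3, scoring="r2")))
# ['fit_time', 'score_time', 'test_score']

# A list of metrics: the keys become "test_<metric>" and there is no "test_score".
print(sorted(cross_validate(Lasso(), X, y, cv=3, scoring=["r2", "neg_mean_squared_error"])))
# ['fit_time', 'score_time', 'test_neg_mean_squared_error', 'test_r2']
```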

  • The cross_validate plot in the skore UI doesn't update

Will investigate.

  • Because the defaults aren't correct, there was an error in the process (cf. the screenshot).

This is not normal; let's sync to reproduce this.

Let's remove all default scores for clustering

Got it!

@augustebaum (Contributor, Author) commented:

It looks like neg_brier_score requires the estimator to have a predict_proba method. I guess we can check the estimator beforehand and remove this scorer if it doesn't have that method.
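A rough sketch of that check (the helper name and scorer list are illustrative, not this PR's code):

```python
# Sketch: drop probability-based scorers when accessing predict_proba fails.
# With a default SVC (probability=False), the attribute access raises
# AttributeError, so those scorers are removed; with probability=True they stay.
from sklearn.svm import SVC

PROBABILITY_SCORERS = {"neg_brier_score", "neg_log_loss"}


def drop_unsupported_scorers(estimator, scorers):
    try:
        estimator.predict_proba  # attribute probe; raises AttributeError if unavailable
    except AttributeError:
        return [s for s in scorers if s not in PROBABILITY_SCORERS]
    return list(scorers)


print(drop_unsupported_scorers(SVC(), ["accuracy", "neg_brier_score"]))                  # ['accuracy']
print(drop_unsupported_scorers(SVC(probability=True), ["accuracy", "neg_brier_score"]))  # both kept
```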

@augustebaum (Contributor, Author) commented:

@MarieS-WiMLDS The scorer recall_weighted does not exist; did you mean recall_score(average="weighted")?

@augustebaum (Contributor, Author) commented:

@MarieS-WiMLDS What should the output of project.get("cross_validate") be? For now, it outputs something like what scikit-learn outputs ({"test_score": ..., ...}), but this doesn't work when the user e.g. sets return_estimator=True, because skore is not meant to store anything and everything. One option could be to return the Altair plot; another could be something like the scikit-learn output but without the non-serializable parts...

@MarieS-WiMLDS (Contributor) commented:

For multi-class classification, the following metrics should be used to be compliant:

  • roc_auc_ovr_weighted
  • recall_weighted
  • precision_weighted
  • neg_log_loss
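For reference, all four are valid scikit-learn scorer strings; a quick sanity check with plain scikit-learn on iris (not skore code):

```python
# Check that the proposed multi-class defaults work as scorer strings.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
scoring = ["roc_auc_ovr_weighted", "recall_weighted", "precision_weighted", "neg_log_loss"]

cv_results = cross_validate(RandomForestClassifier(random_state=0), X, y, cv=3, scoring=scoring)
print(sorted(key for key in cv_results if key.startswith("test_")))
# ['test_neg_log_loss', 'test_precision_weighted', 'test_recall_weighted', 'test_roc_auc_ovr_weighted']
```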

@thomass-dev (Collaborator) left a comment


Pair-reviewed with @rouk1.

Returns
-------
new_scorers : dict[str, str | None]
Collaborator: Please remove this type annotation, which doesn't seem valid.


def _add_scorers(estimator, y, scorers):
"""Expand `scorers` with other scorers, based on `estimator` and `y`.
Collaborator: Can you add a comment making explicit that the shape of scorers is important?

Collaborator: Please add unit tests for cross_validate_item too.


def test_cross_validate_2_extra_metrics(in_memory_project, lasso):
args = list(lasso)
kwargs = {"scoring": ["r2", "neg_mean_squared_error"], "cv": 3}
Collaborator: Please add a test for each type of scoring available. From the scikit-learn docs:

If scoring represents multiple scores, one can use:
  • a list or tuple of unique strings;
  • a callable returning a dictionary where the keys are the metric names and the values are the metric scores;
  • a dictionary with metric names as keys and callables as values.
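A sketch of what such tests could look like, written against plain scikit-learn here rather than this PR's fixtures:

```python
# One parametrized test covering the three scoring shapes quoted above.
import pytest
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import make_scorer, mean_squared_error, r2_score
from sklearn.model_selection import cross_validate


def scoring_callable(estimator, X, y):
    """A callable returning a dict of metric name -> score."""
    y_pred = estimator.predict(X)
    return {"r2": r2_score(y, y_pred), "mse": mean_squared_error(y, y_pred)}


@pytest.mark.parametrize(
    "scoring",
    [
        ["r2", "neg_mean_squared_error"],  # list of unique strings
        scoring_callable,  # callable returning a dictionary of scores
        {  # dictionary with metric names as keys and callables as values
            "r2": make_scorer(r2_score),
            "mse": make_scorer(mean_squared_error, greater_is_better=False),
        },
    ],
)
def test_cross_validate_scoring_shapes(scoring):
    X, y = load_diabetes(return_X_y=True)
    cv_results = cross_validate(Lasso(), X, y, cv=3, scoring=scoring)
    assert any(key.startswith("test_") for key in cv_results)
```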

return "classification"


def _add_scorers(estimator, y, scorers):
Collaborator: Please add a dedicated test.

Successfully merging this pull request may close these issues: skore.cross_validate

5 participants