
Feature: integration standard libraries #11

Open · lvwerra opened this issue Apr 6, 2022 · 6 comments

Labels: enhancement (New feature or request)

lvwerra (Member) commented Apr 6, 2022

Create a uniform integration for metrics libraries such as scikit-learn, keras.metrics, or torchmetrics, instead of integrating the metrics from these frameworks one by one. The goal of the integration is to be able to load the metrics from these frameworks with a syntax as follows:

accuracy = load_metric("scikit-learn/accuracy")

This way there is also more transparency about where the metrics come from, especially if they are just wrappers around existing libraries. It would also simplify maintenance, since we would not need to change every metric separately when something needs updating.
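
To make the proposed mechanism concrete, here is a minimal sketch of how such a loader could dispatch on the library prefix (all names and the mapping below are hypothetical, not the actual evaluate API):

# Hypothetical sketch of prefix-based dispatch; not the actual evaluate implementation.
import sklearn.metrics

def load_metric(path: str):
    """Load a metric such as "scikit-learn/accuracy" by dispatching on the prefix."""
    library, name = path.split("/", 1)
    if library == "scikit-learn":
        # Map the metric name to the corresponding scikit-learn function.
        fn = {"accuracy": sklearn.metrics.accuracy_score,
              "f1": sklearn.metrics.f1_score}[name]
        return lambda references, predictions, **kwargs: fn(references, predictions, **kwargs)
    raise ValueError(f"Unknown metric library: {library}")

accuracy = load_metric("scikit-learn/accuracy")
print(accuracy(references=[0, 1, 1], predictions=[0, 1, 0]))  # 0.666...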

Popular metrics libraries:

Others (maybe a bit more niche):

If all metrics (incl. the "canonical" ones) are going to live on the Hub, such an integration could just be a script that automatically creates (or updates) the necessary repositories on the Hub.
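
A minimal sketch of such a script, assuming huggingface_hub's HfApi (the repo namespace, metric list, and file layout below are assumptions for illustration):

# Illustrative sketch: create or update one Hub repository per metric.
from huggingface_hub import HfApi

METRICS = ["accuracy", "f1", "recall"]  # hypothetical list of metrics to publish

api = HfApi()
for name in METRICS:
    repo_id = f"evaluate-metric/{name}"  # hypothetical namespace
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_file(
        path_or_fileobj=f"metrics/{name}/{name}.py",  # local metric script
        path_in_repo=f"{name}.py",
        repo_id=repo_id,
    )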

@lvwerra added the enhancement (New feature or request) label and removed the documentation label on Apr 6, 2022
sashavor (Contributor) commented:

We could also connect to GEM, as per @yjernite's proposal in another thread.

sashavor (Contributor) commented Apr 19, 2022

And to Skimage

sashavor (Contributor) commented:

And to Torch Fidelity, for generative metrics.

thomwolf (Member) commented Apr 26, 2022

I would rather reimplement these metrics than depend on other libs.

My heuristic:

  • If the metric is something stable that we don't expect to change (e.g. the Jaccard score), I would reimplement it. Note that if the license of the other lib is permissive (Apache 2 or MIT), you can copy the code with a copyright citation and it's fine.
  • If the metric is expected to change, e.g. BERTScore or other metrics that are either very recent or based on a pretrained neural net, I would rather depend on an external lib to avoid having to keep our metric in sync.

lvwerra (Member, Author) commented Apr 26, 2022

So this would be quite a deviation from the current state of the metrics: most of them are wrappers around existing libraries. Just to name a few (see the sketch below the list):

  • "accuracy", "recall", "f1" are all wrappers around scikit-learn
  • "meteor" uses nltk
  • "seqeval" is a wrapper for seqeval
  • "spearmanr" comes from scipy

Would you reimplement all of them? As an example, there are O(100) metrics in scikit-learn and we only have a handful of them inside evaluate at the moment. Also, scikit-learn has a BSD license, which would mean we would need a BSD license as well if we copy-paste the code, right?

My main concern is that we can't move very fast and cover a wide range of modalities if we implement most of the metrics ourselves. I feel like one criterion for adoption is how quickly we cover a wide range of metrics, so that users don't have to leave the library because one is missing. On the other hand, with community metrics, users would probably add missing metrics quickly by creating wrappers, since that's the fastest way (which is probably more or less what happened in datasets/metrics).

Regarding dependencies: the idea was that each metric's repository has a requirements.txt file, so we don't need to add the dependencies to evaluate but can still easily guide the user to update their installation.
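
For example, a metric script could guard its import and point the user at the metric's own requirements, along these lines (hypothetical wording, just to illustrate the idea):

# Hypothetical sketch: fail with an actionable message instead of adding a hard dependency to evaluate.
try:
    import seqeval.metrics
except ImportError as err:
    raise ImportError(
        'To use the "seqeval" metric, install its requirements: pip install seqeval'
    ) from err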

What do you think about the following:

  • Phase 1: allow 3rd-party integrations and slowly migrate the wrappers out of evaluate/metrics
  • Phase 2: add proper implementations to evaluate/metrics where there is a need

johnnv1 commented Jun 16, 2022

Just giving my half-cent opinion here… if you are going to rewrite the metrics, it could be done in a way where the metrics are computed from a confusion matrix. I know this works for the metrics used in computer vision; I don't know whether the same would be true for NLP.

But as already noted, this would probably take more time than third-party integrations 😅
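
A small sketch of that idea in plain NumPy, just to illustrate (not a proposal for the actual implementation):

# Illustrative only: derive accuracy and per-class recall from a confusion matrix.
import numpy as np

def confusion_matrix(references, predictions, num_classes):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(references, predictions):
        cm[t, p] += 1
    return cm

cm = confusion_matrix([0, 1, 1, 2], [0, 1, 0, 2], num_classes=3)
accuracy = np.trace(cm) / cm.sum()                    # correct / total
recall = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)  # per-class recall, guarding empty classes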
