Add support for other languages for rouge #108

Open
alexyalunin opened this issue Nov 15, 2020 · 2 comments

Comments

alexyalunin commented Nov 15, 2020

I compute ROUGE with:

from datasets import load_metric
rouge = load_metric("rouge")
rouge_output = rouge.compute(
    predictions=["тест тест привет"],
    references=["тест тест пока"],
    rouge_types=["rouge2"],
)["rouge2"].mid
print(rouge_output)

the result is
Score(precision=0.0, recall=0.0, fmeasure=0.0)
It seems that the rouge_score library this metric uses strips every character that is not a Latin letter or a digit:
rouge_scorer/tokenize.py runs text = re.sub(r"[^a-z0-9]+", " ", six.ensure_str(text)), so non-Latin text is left with no tokens to compare.
Please add support for other languages.
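
For readers who want to see the effect in isolation, here is a minimal sketch that applies only the substitution quoted above to the two strings from the report. The function name latin_only_tokenize is hypothetical, and the sketch is a simplification: it assumes the text is lowercased first and then split on whitespace, and it skips anything else the library may do (such as stemming).

```python
import re


def latin_only_tokenize(text):
    """Simplified stand-in for the quoted cleanup step (not the library's full tokenizer)."""
    # Replace every run of characters outside a-z and 0-9 with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Splitting on whitespace drops the padding and leaves only surviving tokens.
    return cleaned.split()


prediction_tokens = latin_only_tokenize("тест тест привет")
reference_tokens = latin_only_tokenize("тест тест пока")
english_tokens = latin_only_tokenize("Test test hello")

print(prediction_tokens)  # []
print(reference_tokens)   # []
print(english_tokens)     # ['test', 'test', 'hello']
```

With both token lists empty there are no bigrams to match, which is consistent with the all-zero score shown above.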

@m3hrdadfi
@alexyalunin

I did something similar for other languages.

Repo: rouge-metric

@mariosasko mariosasko transferred this issue from huggingface/datasets Jun 2, 2022

lvwerra commented Jun 14, 2022

We could update ROUGE similarly to BLEU (#19), where a tokenizer can optionally be passed. The RougeScorer accepts a tokenizer keyword argument.
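
A rough sketch of what that could look like when calling rouge_score directly, not a proposal for the final metric API. The class UnicodeWordTokenizer is hypothetical; the sketch assumes an installed rouge_score version whose RougeScorer accepts the tokenizer keyword and calls a tokenize(text) method on the object it receives. The import is guarded so the tokenizer part runs even without the package.

```python
import re


class UnicodeWordTokenizer:
    """Hypothetical tokenizer: lowercases, then keeps runs of Unicode word characters."""

    def tokenize(self, text):
        # In Python 3, \w on str patterns matches letters and digits from any script
        # (plus underscore).
        return re.findall(r"\w+", text.lower())


tokenizer = UnicodeWordTokenizer()
pred_tokens = tokenizer.tokenize("тест тест привет")
ref_tokens = tokenizer.tokenize("тест тест пока")
print(pred_tokens)  # ['тест', 'тест', 'привет']
print(ref_tokens)   # ['тест', 'тест', 'пока']

try:
    from rouge_score import rouge_scorer
except ImportError:
    rouge_scorer = None  # Package not installed; skip the scoring step.

if rouge_scorer is not None:
    # Assumption: this rouge_score version supports the tokenizer keyword.
    scorer = rouge_scorer.RougeScorer(["rouge2"], tokenizer=tokenizer)
    # score() takes the reference (target) first, then the prediction.
    print(scorer.score("тест тест пока", "тест тест привет")["rouge2"])
```

Splitting on \w+ is only a reasonable default for whitespace-delimited languages; languages such as Chinese or Japanese would need a dedicated word segmenter passed in the same way.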

cc @sashavor
