
Rouge score accuracy #2

Closed
pltrdy opened this issue Mar 22, 2017 · 13 comments

Comments

pltrdy (Owner) commented Mar 22, 2017

The results are known to be quite different from the official ROUGE scoring script.

It has been discussed here:
google/seq2seq#89

pltrdy (Owner, Author) commented Feb 16, 2018

It has been improved with #6

I compared the two scorers on multi-sentence files with 10,397 lines and 508,630 words, and I get:

  • Official ROUGE (using files2rouge), took 111 seconds:
---------------------------------------------
1 ROUGE-1 Average_R: 0.34882 (95%-conf.int. 0.34632 - 0.35132)
1 ROUGE-1 Average_P: 0.40104 (95%-conf.int. 0.39803 - 0.40391)
1 ROUGE-1 Average_F: 0.36161 (95%-conf.int. 0.35934 - 0.36383)
---------------------------------------------
1 ROUGE-2 Average_R: 0.13938 (95%-conf.int. 0.13718 - 0.14151)
1 ROUGE-2 Average_P: 0.16228 (95%-conf.int. 0.15968 - 0.16490)
1 ROUGE-2 Average_F: 0.14511 (95%-conf.int. 0.14293 - 0.14729)
---------------------------------------------
1 ROUGE-L Average_R: 0.32234 (95%-conf.int. 0.31998 - 0.32478)
1 ROUGE-L Average_P: 0.37093 (95%-conf.int. 0.36804 - 0.37374)
1 ROUGE-L Average_F: 0.33429 (95%-conf.int. 0.33208 - 0.33647)
  • this code, took 20 seconds:
{
  "rouge-1": {
    "f": 0.3672435871687543,
    "p": 0.40349020487306564,
    "r": 0.3527286721707171
  },
  "rouge-2": {
    "f": 0.14396864450679678,
    "p": 0.16098625779779233,
    "r": 0.13821563233163145
  },
  "rouge-l": {
    "f": 0.32548307280858685,
    "p": 0.3741943564047806,
    "r": 0.32687448001488595
  }
}
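For context, here is a minimal sketch of the per-pair ROUGE-1 computation that both scorers implement variants of. The function name and plain whitespace tokenization are illustrative only; neither tool uses exactly this code, and the averaging and tokenization details are precisely where they diverge.

```python
from collections import Counter

def rouge_1(hyp_tokens, ref_tokens):
    # Clipped unigram overlap, as in ROUGE-1 (illustrative sketch,
    # not the exact code of either scorer).
    hyp_counts, ref_counts = Counter(hyp_tokens), Counter(ref_tokens)
    overlap = sum(min(hyp_counts[w], ref_counts[w]) for w in hyp_counts)
    p = overlap / max(len(hyp_tokens), 1)   # precision
    r = overlap / max(len(ref_tokens), 1)   # recall
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = rouge_1("the cat sat on the mat".split(),
                  "the cat lay on the mat".split())
```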

shijx12 commented Jul 4, 2018

Maybe the difference is caused by

hyp = [" ".join(_.split()) for _ in hyp.split(".") if len(_) > 0]

Splitting on '.' removes all '.' characters from hyp and ref.
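A minimal reproduction of that behaviour, using the comprehension quoted above on a made-up hypothesis string: every '.' is consumed by split("."), so the resulting "sentences" carry no periods, and anything containing an internal period (e.g. abbreviations) is cut into fragments.

```python
# The quoted comprehension applied to a toy hypothesis.
hyp = "See e.g. the results. It works."
sents = [" ".join(s.split()) for s in hyp.split(".") if len(s) > 0]
print(sents)  # ['See e', 'g', 'the results', 'It works']
```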

pltrdy (Owner, Author) commented Jul 4, 2018

@shijx12 It's not the only reason, but you've got a good point: that code does not make sense.

I'm editing it and evaluating the impact. Thanks for pointing this out.

Diego999 commented Aug 7, 2018

Hi @pltrdy ,

Could you run some evaluation to compare the differences between the Perl script and yours? How much do they differ? I would love to get rid of the Perl script! https://github.com/RxNLP/ROUGE-2.0 seems to produce identical scores (apart from a +1 smoothing term they did not implement, because no indication of it was present in the official ROUGE script).

pltrdy (Owner, Author) commented Aug 7, 2018

@Diego999 that's precisely what I did here: #2 (comment).
In addition, results may differ slightly because of how ends of sentences are handled, as suggested in #2 (comment).

Diego999 commented Aug 7, 2018

@pltrdy yes, but that was in February; some modifications have been made since ;) Especially the remark in #2 (comment). Have you re-run the experiments since then?

pltrdy (Owner, Author) commented Aug 7, 2018

It should be similar, if not exactly the same. I'm not sure how punctuation is handled in the official script. I've attempted some fixes, which seem to make things worse. Punctuation may simply be ignored, in which case the naïve implementation may be the right one.
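To illustrate why punctuation handling matters, here is a toy sketch with made-up strings (this is not the official script's tokenizer): whether sentence-final punctuation is kept as tokens or stripped before counting changes the unigram scores.

```python
import re
from collections import Counter

def rouge_1_f(hyp, ref):
    # ROUGE-1 F-score on whitespace tokens (sketch).
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum(min(h[w], r[w]) for w in h)
    p = overlap / max(sum(h.values()), 1)
    rec = overlap / max(sum(r.values()), 1)
    return 2 * p * rec / (p + rec) if p + rec else 0.0

def strip_punct(s):
    # Replace non-word, non-space characters with spaces.
    return re.sub(r"[^\w\s]", " ", s)

hyp, ref = "results improved .", "results improved !"
raw = rouge_1_f(hyp, ref)                              # '.' vs '!' never match
clean = rouge_1_f(strip_punct(hyp), strip_punct(ref))  # punctuation ignored
```

With punctuation kept, the mismatched final tokens drag the score down; with punctuation stripped, the same pair scores a perfect 1.0.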

Diego999 commented Aug 7, 2018

Ok, thank you for your answer!

AlJohri commented Aug 9, 2018

@Diego999

seems to have identical scores

Is it documented somewhere that ROUGE-2.0 produces identical scores?

Diego999 commented Aug 9, 2018

@AlJohri Yes, see the last paragraph of their paper.

Diego999 commented

By the way, I solved this problem here: https://github.com/Diego999/py-rouge. Have a look at the README to understand why the results sometimes differ by ~4e-5.

AlJohri commented Sep 19, 2018

That's great to hear @Diego999! Are you planning to release this as an independent package, or to merge it back into pltrdy/rouge?

Diego999 commented Sep 20, 2018 via email
