
CER Normalization #161

Open
Coliverfelt opened this issue Jun 24, 2022 · 5 comments

@Coliverfelt

I can't find a method in the documentation to normalize the CER metric. Is this implemented?

cer.features
OUTPUT:
{'predictions': Value(dtype='string', id='sequence'),
 'references': Value(dtype='string', id='sequence')}

df[df['cer'] == 1.25]
OUTPUT:
GT | OCR_Prediction | wer | cer
2.49 | 749.00 | 1.0 | 1.25
2.49 | 749.00 | 1.0 | 1.25
2.50 | 1950.00 | 1.0 | 1.25
@lvwerra
Member

lvwerra commented Jun 24, 2022

Hi @Coliverfelt, not sure what you mean. You can find the implementation details here:
https://huggingface.co/spaces/evaluate-metric/cer/blob/main/cer.py

@Coliverfelt
Author

Hi @lvwerra! My point is that when the CER metric is above 1.0, as in this example:

reference: 2.49
prediction: 749.00
cer: 1.25

substitutions = 1
deletions = 1
insertions = 3
hits = 2
reference length = 4

The CER formula is (substitutions + deletions + insertions) / reference length, so the result is (1 + 1 + 3) / 4 = 1.25.

But in a case like this, where the CER is greater than 1.00, we should apply normalization, so the normalized result should be:

The normalized CER formula is (substitutions + deletions + insertions) / (substitutions + deletions + insertions + hits), so the result is 5 / 7 = 0.7142857142857143.
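
For concreteness, a small plain-Python sketch (not the metric's code; the variable names are just illustrative) that plugs the counts above into both formulas:

# Edit-operation counts for reference "2.49" vs. prediction "749.00", as listed above.
substitutions, deletions, insertions, hits = 1, 1, 3, 2
reference_length = 4  # equals hits + substitutions + deletions

errors = substitutions + deletions + insertions  # 5
cer = errors / reference_length                  # 5 / 4 = 1.25
normalized_cer = errors / (errors + hits)        # 5 / 7 = 0.7142857142857143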

And when I'm running:

cer.compute(predictions=['749.00'], references=['2.49'])

the result isn't normalized.

Did I make myself clearer?

@lvwerra
Member

lvwerra commented Jul 7, 2022

I couldn't find any reference for normalized CER. Do you have a link? We could add this as an optional config, e.g.

cer.compute(references=references, predictions=predictions, normalized=True)

What do you think?
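
A rough sketch of how such an option could switch the denominator, given the substitution/deletion/insertion/hit counts the metric already obtains from jiwer (illustrative only, not the actual cer.py code; the helper name is hypothetical):

def cer_from_counts(substitutions, deletions, insertions, hits, normalized=False):
    # Hypothetical helper, for illustration only.
    errors = substitutions + deletions + insertions
    if normalized:
        # Normalized CER: errors relative to the full alignment length, bounded to [0, 1].
        return errors / (errors + hits)
    # Classic CER: errors relative to the reference length (hits + substitutions + deletions),
    # which can exceed 1.0 when there are many insertions.
    return errors / (hits + substitutions + deletions)

# cer_from_counts(1, 1, 3, 2) -> 1.25
# cer_from_counts(1, 1, 3, 2, normalized=True) -> 0.7142857142857143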

@Coliverfelt
Author

I think this could work well!
I'm forwarding the link to the CER normalization content and an image with its explanation.
https://towardsdatascience.com/evaluating-ocr-output-quality-with-character-error-rate-cer-and-word-error-rate-wer-853175297510#5aec

[image: explanation of the normalized CER formula from the linked article]

@lvwerra
Member

lvwerra commented Aug 3, 2022

Sounds good, do you want to take a stab at it?
