Understanding ROUGE vs BLEU

I am looking into metrics for measuring the quality of text summaries. In the process, I came across the following statement:

Bleu measures precision: how much the words (and/or n-grams) in the machine generated summaries appeared in the human reference summaries.

Rouge measures recall: how much the words (and/or n-grams) in the human reference summaries appeared in the machine generated summaries.
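To make that distinction concrete for myself, I wrote a minimal sketch (my own toy sentences and plain unigram counting with clipping, not taken from either source or from any library) that computes the overlap in both directions:

```python
from collections import Counter

def ngrams(tokens, n=1):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_overlap(candidate, reference, n=1):
    """Number of candidate n-grams that also appear in the reference (clipped counts)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())

generated = "the cat sat on the mat".split()
reference = "the cat lay on the red mat".split()

matches = clipped_overlap(generated, reference, n=1)
precision = matches / sum(ngrams(generated).values())  # BLEU-style: divide by generated-side count
recall = matches / sum(ngrams(reference).values())      # ROUGE-style: divide by reference-side count

print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # precision = 0.83, recall = 0.71
```

The only difference between the two numbers is the denominator: the same overlap divided by the generated summary's n-gram count gives a precision, divided by the reference's n-gram count gives a recall.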

However, in this answer on SE, I found:

ROUGE-n recall=40% means that 40% of the n-grams in the reference summary are also present in the generated summary.

ROUGE-n precision=40% means that 40% of the n-grams in the generated summary are also present in the reference summary.

ROUGE-n F1-score=40% is more difficult to interpret, like any F1-score.
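Plugging hypothetical counts into those definitions, this is how I understand the numbers (the counts below are made up purely for illustration):

```python
# Hypothetical counts, only to illustrate the definitions quoted above.
ngrams_in_generated = 10   # n-grams in the generated summary
ngrams_in_reference = 20   # n-grams in the reference summary
overlapping = 4            # n-grams present in both

precision = overlapping / ngrams_in_generated        # 4 / 10 = 0.40
recall = overlapping / ngrams_in_reference           # 4 / 20 = 0.20
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean ≈ 0.267

print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.3f}")
```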

This seems contradictory. It sounds as if ROUGE precision in the SE answer is what the first statement calls BLEU, and ROUGE recall is what it calls ROUGE. So is ROUGE precision the same as BLEU?

The paper also states:

It is clear that ROUGE-N is a recall-related measure because the denominator of the equation is the total sum of the number of n-grams occurring at the reference summary side. A closely related measure, BLEU, used in automatic evaluation of machine translation, is a precision-based measure.

I don't understand this, because ROUGE (at least) returns both a precision value and a recall value. Can someone clear this up? Thanks!
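Edit: for reference, this is how I read the two denominators the paper is talking about, written out for the single-reference case (my own transcription of the formulas, so it may be simplified):

```latex
% ROUGE-N (single reference): the denominator counts n-grams on the reference side -> recall-oriented
\mathrm{ROUGE\text{-}N} =
  \frac{\sum_{\mathrm{gram}_n \in \mathrm{Reference}} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}
       {\sum_{\mathrm{gram}_n \in \mathrm{Reference}} \mathrm{Count}(\mathrm{gram}_n)}

% BLEU modified n-gram precision: the denominator counts n-grams on the candidate side -> precision-oriented
p_n =
  \frac{\sum_{\mathrm{gram}_n \in \mathrm{Candidate}} \mathrm{Count}_{\mathrm{clip}}(\mathrm{gram}_n)}
       {\sum_{\mathrm{gram}_n \in \mathrm{Candidate}} \mathrm{Count}(\mathrm{gram}_n)}
```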

