Understanding ROUGE vs BLEU

I am looking into metrics for measuring the quality of text summaries. In the process, I came across the following statement:

Bleu measures precision: how much the words (and/or n-grams) in the machine generated summaries appeared in the human reference summaries.

Rouge measures recall: how much the words (and/or n-grams) in the human reference summaries appeared in the machine generated summaries.
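To make that distinction concrete for myself, I wrote a minimal sketch (my own toy sentences and plain unigram counting with clipping, not taken from either source or from any library) that computes the overlap in both directions:

```python
from collections import Counter

def ngrams(tokens, n=1):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_overlap(candidate, reference, n=1):
    """Number of candidate n-grams that also appear in the reference (clipped counts)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())

generated = "the cat sat on the mat".split()
reference = "the cat lay on the red mat".split()

matches = clipped_overlap(generated, reference, n=1)
precision = matches / sum(ngrams(generated).values())  # BLEU-style: divide by generated-side count
recall = matches / sum(ngrams(reference).values())      # ROUGE-style: divide by reference-side count

print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # precision = 0.83, recall = 0.71
```

The only difference between the two numbers is the denominator: the same overlap divided by the generated summary's n-gram count gives a precision, divided by the reference's n-gram count gives a recall.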

However, in this answer on SE, I found:

ROUGE-n recall=40% means that 40% of the n-grams in the reference summary are also present in the generated summary.

ROUGE-n precision=40% means that 40% of the n-grams in the generated summary are also present in the reference summary.

ROUGE-n F1-score=40% is more difficult to interpret, like any F1-score.
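Plugging hypothetical counts into those definitions, this is how I understand the numbers (the counts below are made up purely for illustration):

```python
# Hypothetical counts, only to illustrate the definitions quoted above.
ngrams_in_generated = 10   # n-grams in the generated summary
ngrams_in_reference = 20   # n-grams in the reference summary
overlapping = 4            # n-grams present in both

precision = overlapping / ngrams_in_generated        # 4 / 10 = 0.40
recall = overlapping / ngrams_in_reference           # 4 / 20 = 0.20
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean ≈ 0.267

print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.3f}")
```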

This seems contradictory. It sounds as if ROUGE precision in the SE answer is what the first statement calls BLEU, and ROUGE recall is what it calls ROUGE. So is ROUGE precision the same as BLEU?

The paper also states:

It is clear that ROUGE-N is a recall-related measure because the denominator of the equation is the total sum of the number of n-grams occurring at the reference summary side. A closely related measure, BLEU, used in automatic evaluation of machine translation, is a precision-based measure.

I don't understand this, because ROUGE (at least) returns both a precision value and a recall value. Can someone clear this up? Thanks!
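Edit: for reference, this is how I read the two denominators the paper is talking about, written out for the single-reference case (my own transcription of the formulas, so it may be simplified):

```latex
% ROUGE-N (single reference): the denominator counts n-grams on the reference side -> recall-oriented
\mathrm{ROUGE\text{-}N} =
  \frac{\sum_{\mathrm{gram}_n \in \mathrm{Reference}} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}
       {\sum_{\mathrm{gram}_n \in \mathrm{Reference}} \mathrm{Count}(\mathrm{gram}_n)}

% BLEU modified n-gram precision: the denominator counts n-grams on the candidate side -> precision-oriented
p_n =
  \frac{\sum_{\mathrm{gram}_n \in \mathrm{Candidate}} \mathrm{Count}_{\mathrm{clip}}(\mathrm{gram}_n)}
       {\sum_{\mathrm{gram}_n \in \mathrm{Candidate}} \mathrm{Count}(\mathrm{gram}_n)}
```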

