I'm looking into metrics for measuring the quality of text summarization. While doing so, I came across this statement:
Bleu measures precision: how much the words (and/or n-grams) in the machine generated summaries appeared in the human reference summaries.
Rouge measures recall: how much the words (and/or n-grams) in the human reference summaries appeared in the machine generated summaries.
However, in this answer on SE, I found:
ROUGE-n recall=40% means that 40% of the n-grams in the reference summary are also present in the generated summary.
ROUGE-n precision=40% means that 40% of the n-grams in the generated summary are also present in the reference summary.
ROUGE-n F1-score=40% is more difficult to interpret, like any F1-score.
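This seems contradictory. Going by the SE answer, it sounds like ROUGE precision corresponds to what the first statement calls BLEU, and ROUGE recall corresponds to what it calls ROUGE. Is ROUGE precision the same as BLEU?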
The paper also states:
It is clear that ROUGE-N is a recall-related measure because the denominator of the equation is the total sum of the number of n-grams occurring at the reference summary side. A closely related measure, BLEU, used in automatic evaluation of machine translation, is a precision-based measure.
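For reference, the equation that quote refers to can be written out as below. This is a transcription from memory of the standard ROUGE-N definition and of BLEU's modified n-gram precision, so the notation may differ slightly from the paper's; the symbols $\mathrm{Count}_{\mathrm{match}}$ and $\mathrm{Count}_{\mathrm{clip}}$ both denote the n-gram count clipped to the maximum number of times it co-occurs in candidate and reference.

```latex
\[
\mathrm{ROUGE\text{-}N} =
\frac{\sum_{S \in \mathrm{References}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}
     {\sum_{S \in \mathrm{References}} \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)}
\qquad
p_n =
\frac{\sum_{C \in \mathrm{Candidates}} \sum_{\mathrm{gram}_n \in C} \mathrm{Count}_{\mathrm{clip}}(\mathrm{gram}_n)}
     {\sum_{C \in \mathrm{Candidates}} \sum_{\mathrm{gram}_n \in C} \mathrm{Count}(\mathrm{gram}_n)}
\]
```

The numerators count the same kind of overlap; only the denominator changes side (reference n-grams for ROUGE-N, candidate n-grams for BLEU's $p_n$). Note that the full BLEU score also combines $p_n$ over several values of $n$ and applies a brevity penalty, so it is not just a single $p_n$.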
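I don't understand this claim from the paper, because ROUGE (at least in the implementations I have seen) returns both a precision value and a recall value. Could someone clear this up? Thanks, everyone!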
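To make the definitions quoted from the SE answer concrete, here is a minimal sketch of how ROUGE-n recall, precision, and F1 can all be derived from one n-gram overlap count. It assumes whitespace tokenization, a single reference, and no stemming or stopword handling; the function names `ngrams` and `rouge_n` and the example sentences are made up for illustration and are not taken from any ROUGE package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Hypothetical helper: ROUGE-n recall, precision and F1 for one reference."""
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    # Same numerator, different denominators.
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_n("the cat sat on the mat",
                 "the cat lay on the mat today",
                 n=1)
print(scores)
```

In this example 5 unigrams overlap, so recall is 5/6 (reference side) and precision is 5/7 (candidate side). With a single reference, that precision is the same quantity as the clipped n-gram precision used inside BLEU for one value of n, which may be where the two quoted descriptions meet.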