NLTK confusion matrix

Posted 2024-04-26 11:47:41


I am trying to build a confusion matrix with NLTK. I tried the example below, and it runs fine.

>>> import nltk
>>> from nltk.metrics import *
>>> from nltk.corpus import brown
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> def tag_list(tagged_sents):
...     # flatten the tagged sentences into one flat list of tags
...     return [tag for sent in tagged_sents for (word, tag) in sent]
...
>>> tagger = nltk.UnigramTagger(brown_a)
>>> gold = tag_list(brown_a)
>>> def apply_tagger(tagger, corpus):
...     # strip the gold tags, then re-tag each sentence
...     return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]
...
>>> test = tag_list(apply_tagger(tagger, brown_a))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> print(cm.pretty_format(show_percents=False, values_in_chart=True, truncate=5, sort_by_count=True))

But if I give the tagger different input:

[code snippet missing from the source]

it raises an error:

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.

Even when I use a slice with the same number of sentences:

>>> test1=nltk.corpus.brown.tagged_sents()[700:1000]
>>> test = tag_list(apply_tagger(tagger, test1))
>>> cm = nltk.ConfusionMatrix(gold, test)

it gives me the same error.

Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.
>>>

If anyone is willing to help: how should I interpret this?


2 answers

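In both failing examples, the error says exactly what is wrong: the lengths do not match.

  • Example 1: len(test) = 2459, len(gold) = 6642
  • Example 2: len(test) = 6261, len(gold) = 6642

You could trim gold like this: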

gold_full = tag_list(brown_a)
gold = gold_full[:len(test)]

This assumes gold is always longer than test; otherwise you could add a condition.
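Trimming `gold` silences the error, but the matrix compares the two lists position by position, so the counts only mean something when both lists come from the same sentences. A minimal sketch of that idea follows. It assumes a hypothetical `ToyTagger` stand-in that exposes the same `.tag(words)` interface as an NLTK tagger, and the sentences are made up for illustration; neither comes from the question.

```python
# Sketch only: ToyTagger and the sentences below are hypothetical stand-ins.

def tag_list(tagged_sents):
    """Flatten tagged sentences into one flat list of tags."""
    return [tag for sent in tagged_sents for (word, tag) in sent]


def apply_tagger(tagger, corpus):
    """Drop the gold tags from each sentence and re-tag the bare words."""
    return [tagger.tag([word for (word, _) in sent]) for sent in corpus]


class ToyTagger(object):
    """Looks each word up in a fixed dict; unknown words get None."""

    def __init__(self, lexicon):
        self.lexicon = lexicon

    def tag(self, words):
        return [(w, self.lexicon.get(w)) for w in words]


eval_sents = [
    [("the", "AT"), ("cat", "NN")],
    [("dog", "NN"), ("sleeps", "VBZ"), ("the", "AT")],
]
tagger = ToyTagger({"the": "AT", "dog": "NN", "barks": "VBZ"})

# Build gold and test from the SAME sentences so every position lines up.
gold = tag_list(eval_sents)
test = tag_list(apply_tagger(tagger, eval_sents))
print(len(gold), len(test))
```

Applied to the question's own names, the same pattern would pair `test = tag_list(apply_tagger(tagger, test1))` with `gold = tag_list(test1)` rather than with `tag_list(brown_a)`.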

Look at the source of ConfusionMatrix:

def __init__(self, reference, test, sort_by_count=False):
    """
    Construct a new confusion matrix from a list of reference
    values and a corresponding list of test values.

    :type reference: list
    :param reference: An ordered list of reference values.
    :type test: list
    :param test: A list of values to compare against the
        corresponding reference values.
    :raise ValueError: If ``reference`` and ``length`` do not have
        the same length.
    """
    if len(reference) != len(test):
        raise ValueError('Lists must have the same length.')

http://www.nltk.org/_modules/nltk/metrics/confusionmatrix.html
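The length check exists because a confusion matrix counts (reference, test) pairs position by position, so every reference value needs exactly one partner. The sketch below illustrates that pairing with a plain `Counter`. It is not NLTK's implementation, and `confusion_counts` and the tag lists are hypothetical names and data used only for illustration.

```python
from collections import Counter


def confusion_counts(reference, test):
    """Count (reference_tag, test_tag) pairs position by position."""
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    return Counter(zip(reference, test))


# Hypothetical tags, for illustration only.
reference = ["NN", "VB", "NN", "AT"]
predicted = ["NN", "NN", "NN", "AT"]

counts = confusion_counts(reference, predicted)
# How often a reference VB was predicted as NN.
print(counts[("VB", "NN")])
```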

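I'm not going to go through your code, since it has been a while since I last used NLTK, but just try printing your gold-standard and predicted arrays and make sure they have the same length.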
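A quick check along those lines might look like the sketch below. `check_aligned` is a hypothetical helper, not part of NLTK, and the placeholder lists are only sized to match Example 2 from the first answer.

```python
def check_aligned(gold, test):
    """Return a short report comparing the lengths of the two lists."""
    if len(gold) == len(test):
        return "OK: both lists have %d items" % len(gold)
    return "Mismatch: len(gold)=%d, len(test)=%d (difference %d)" % (
        len(gold), len(test), abs(len(gold) - len(test)))


# Placeholder lists with the lengths reported for Example 2.
gold = ["NN"] * 6642
test = ["NN"] * 6261
print(check_aligned(gold, test))
```

Running this before calling `nltk.ConfusionMatrix(gold, test)` shows the mismatch without triggering the `ValueError`.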
