NLTK confusion matrix

>>> import nltk
>>> from nltk.metrics import *
>>> from nltk.corpus import brown
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> def tag_list(tagged_sents):
...     return [tag for sent in tagged_sents for (word, tag) in sent]
...
>>> tagger = nltk.UnigramTagger(brown_a)
>>> gold = tag_list(brown_a)
>>> def apply_tagger(tagger, corpus):
...     return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]
...
>>> test = tag_list(apply_tagger(tagger, brown_a))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> print cm.pretty_format(show_percents=False,values_in_chart=True,truncate=5,sort_by_count=True)

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.

Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.

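2 answers

User

#1 · edited 2024-04-26 11:47:41

In both examples that raise the error, the cause is a length mismatch:

Example 1: len(test) = 2459, len(gold) = 6642
Example 2: len(test) = 6261, len(gold) = 6642

You may be able to trim gold like this: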

gold_full = tag_list(brown_a)
gold = gold_full[:len(test)]

This assumes gold is always longer than test; otherwise you could add a condition.
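
One way to handle that condition is to trim both lists to the length of the shorter one. This is a minimal sketch on plain Python lists of tags; `trim_to_common_length` is a hypothetical helper name, not part of NLTK, and the toy tag lists stand in for the real `gold` and `test`. Trimming only makes the constructor accept the lists: if the two lists were built from different tokens, the pairs will not line up and the resulting matrix will not mean much.

```python
def trim_to_common_length(gold, test):
    """Truncate both tag lists to the length of the shorter one."""
    n = min(len(gold), len(test))
    return gold[:n], test[:n]


# Toy tag lists standing in for the real gold and test lists (assumed data).
gold_full = ["AT", "NN", "VBD", "IN", "NN"]
test_full = ["AT", "NN", "VBD"]

gold, test = trim_to_common_length(gold_full, test_full)
print(len(gold), len(test))
```

Because the helper takes the minimum of the two lengths, it works whichever list is longer.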

User

#2 · edited 2024-04-26 11:47:41

Take a look at the source of ConfusionMatrix:

def __init__(self, reference, test, sort_by_count=False):
    """
    Construct a new confusion matrix from a list of reference
    values and a corresponding list of test values.

    :type reference: list
    :param reference: An ordered list of reference values.
    :type test: list
    :param test: A list of values to compare against the
        corresponding reference values.
    :raise ValueError: If ``reference`` and ``length`` do not have
        the same length.
    """
    if len(reference) != len(test):
        raise ValueError('Lists must have the same length.')

http://www.nltk.org/_modules/nltk/metrics/confusionmatrix.html

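I'm not going to go through your code, since it has been a while since I last used NLTK, but just try printing your gold-standard and prediction arrays and make sure they have the same length.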
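
That check can also be done sentence by sentence, which shows where the two lists start to diverge. The sketch below assumes both inputs are lists of tagged sentences, that is, lists of (word, tag) pairs like those returned by `tagged_sents()` and `tagger.tag()`. `find_length_mismatches` is a hypothetical helper, and the toy sentences stand in for the real corpus.

```python
def find_length_mismatches(gold_sents, test_sents):
    """Return (index, gold_len, test_len) for each sentence pair whose lengths differ.

    A difference in the number of sentences is reported first, with index None.
    """
    problems = []
    if len(gold_sents) != len(test_sents):
        problems.append((None, len(gold_sents), len(test_sents)))
    for i, (g, t) in enumerate(zip(gold_sents, test_sents)):
        if len(g) != len(t):
            problems.append((i, len(g), len(t)))
    return problems


# Toy data (assumed): the second test sentence is missing one token.
gold_sents = [[("The", "AT"), ("dog", "NN")], [("It", "PPS"), ("ran", "VBD")]]
test_sents = [[("The", "AT"), ("dog", "NN")], [("It", "PPS")]]

mismatches = find_length_mismatches(gold_sents, test_sents)
for index, gold_len, test_len in mismatches:
    print(index, gold_len, test_len)
```

An empty result means every sentence pair has the same number of tokens, so the flattened tag lists will have equal length as well.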
