I am trying to build a confusion matrix with NLTK. I tried the following example, and it works fine.
>>> import nltk
>>> from nltk.metrics import *
>>> from nltk.corpus import brown
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> def tag_list(tagged_sents):
...     return [tag for sent in tagged_sents for (word, tag) in sent]
...
>>> tagger = nltk.UnigramTagger(brown_a)
>>> gold = tag_list(brown_a)
>>> def apply_tagger(tagger, corpus):
...     return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]
...
>>> test = tag_list(apply_tagger(tagger, brown_a))
>>> cm = nltk.ConfusionMatrix(gold, test)
>>> print(cm.pretty_format(show_percents=False, values_in_chart=True, truncate=5, sort_by_count=True))
But if I give the tagger a different input [...], it produces an error:
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.
Even when I try a test slice with the same number of sentences:
>>> test1=nltk.corpus.brown.tagged_sents()[700:1000]
>>> test = tag_list(apply_tagger(tagger, test1))
>>> cm = nltk.ConfusionMatrix(gold, test)
it gives me the same error.
Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    cm = nltk.ConfusionMatrix(gold, test)
  File "C:\Python27\lib\site-packages\nltk\metrics\confusionmatrix.py", line 46, in __init__
    raise ValueError('Lists must have the same length.')
ValueError: Lists must have the same length.
>>>
If anyone could kindly help: how should I interpret this error?
For both of the failing examples, the error says that the lengths of the two lists do not match.
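A quick way to confirm this is to compare the lengths of the two flattened tag lists before building the matrix. The sketch below uses small hand-made tagged sentences in place of the Brown corpus; the names `gold_sents` and `test_sents` and the data are illustrative assumptions, while `tag_list` is the helper from the question.

```python
def tag_list(tagged_sents):
    """Flatten a list of tagged sentences into a single list of tags."""
    return [tag for sent in tagged_sents for (word, tag) in sent]

# Hypothetical stand-ins for two different corpus slices.
gold_sents = [[("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
              [("it", "PPS"), ("runs", "VBZ")]]
test_sents = [[("a", "AT"), ("cat", "NN")],
              [("she", "PPS"), ("sleeps", "VBZ")]]

gold = tag_list(gold_sents)
test = tag_list(test_sents)

# Both slices hold the same number of sentences...
same_sentence_count = len(gold_sents) == len(test_sents)
# ...but the flattened tag lists differ, because the sentences
# contain different numbers of tokens.
same_tag_count = len(gold) == len(test)

print("gold: %d tags, test: %d tags" % (len(gold), len(test)))
```

This is why slicing 300 sentences for both lists is not enough: the matrix compares tags one by one, so it is the number of tokens that has to agree, not the number of sentences.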
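You may be able to get around it by trimming gold to the length of test,
assuming the gold standard is always longer than test; otherwise, could you add a condition?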
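A minimal sketch of the trimming idea, assuming plain Python lists of tags. The helper name `trim_to_shorter` is hypothetical; using `min` covers the case where either list may be the longer one, so no separate condition is needed.

```python
def trim_to_shorter(gold, test):
    """Cut both tag lists down to the length of the shorter one."""
    n = min(len(gold), len(test))
    return gold[:n], test[:n]

# Illustrative tag lists of unequal length (made-up data).
gold = ["AT", "NN", "VBZ", "PPS", "VBZ"]
test = ["AT", "NN", "PPS", "VBZ"]

gold_trimmed, test_trimmed = trim_to_shorter(gold, test)
# Both lists now have the same length, so the length check passes.
```

Note that trimming only silences the error. The matrix pairs tags by position, so if gold and test come from different sentences, the pairs no longer describe the same tokens and the counts are not meaningful. In the second example from the question, building the reference from the same slice that was tagged, for instance `gold = tag_list(test1)`, would likely be the more useful fix, since the tagger returns one tag per input token.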
Take a look at the source of ConfusionMatrix:
http://www.nltk.org/_modules/nltk/metrics/confusionmatrix.html
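To illustrate why the two lists need equal length, here is a simplified pure-Python sketch of pairwise counting. It is not NLTK's implementation: `count_confusions` is a hypothetical helper, and only the length check mirrors the line shown in the traceback.

```python
from collections import Counter

def count_confusions(reference, test):
    """Count (reference_tag, test_tag) pairs, position by position."""
    # Each reference tag is paired with the test tag at the same index,
    # so the pairing is only defined when the lengths agree.
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    return Counter(zip(reference, test))

# Made-up example: the third token is mistagged as NN instead of VBZ.
counts = count_confusions(["AT", "NN", "VBZ"], ["AT", "NN", "NN"])
```

Here `counts[("VBZ", "NN")]` records one case where the reference tag was `VBZ` and the predicted tag was `NN`; the diagonal entries such as `("NN", "NN")` are the correct predictions.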
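I'm not going to go through your code, since it has been a while since I used NLTK, but just try printing your gold-standard and prediction arrays and make sure they have the same length.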