HMM tagging in NLTK is inaccurate

Posted 2024-05-16 07:18:30


I have been trying to implement a simple POS tagger using an HMM and came up with the following code.

import nltk
from nltk.corpus import treebank

# Train on the first 3000 tagged sentences of the treebank sample.
train_data = treebank.tagged_sents()[:3000]

print(train_data[0])
# [(u'Pierre', u'NNP'), (u'Vinken', u'NNP'), (u',', u','), (u'61', u'CD'), (u'years', u'NNS'), (u'old', u'JJ'), (u',', u','), ... ]

from nltk.tag import hmm

# Supervised training with the trainer's default settings.
trainer = hmm.HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(train_data)

print(tagger)

print(tagger.tag("Alex was born in Connecticut .".split()))
# [('Alex', u'NNP'), ('was', u'NNP'), ('born', u'NNP'), ('in', u'NNP'), ('Connecticut', u'NNP'), ('.', u'NNP')]

print(tagger.tag("Joe met Joanne in Delhi .".split()))
# [('Joe', u'NNP'), ('met', u'VBD'), ('Joanne', u'NNP'), ('in', u'IN'), ('Delhi', u'NNP'), ('.', u'NNP')]

print(tagger.tag("Chicago is the birthplace of Ginny".split()))
# [('Chicago', u'NNP'), ('is', u'VBZ'), ('the', u'DT'), ('birthplace', u'NNP'), ('of', u'NNP'), ('Ginny', u'NNP')]

As you can see, many of the tags are way off. Why is that? I thought the training set was large enough :|

Also, when I run tagger.evaluate(treebank.tagged_sents()[3000:]), the score against the gold standard is only 0.3.

Also posted here:
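One possible explanation, offered as an assumption rather than a confirmed diagnosis: if emission probabilities are estimated by plain maximum likelihood, any word missing from the training data ("Alex", "Connecticut", "birthplace") gets probability zero under every tag, so the decoder has nothing to choose between and the tags after such a word can degrade. The sketch below is pure Python on a tiny made-up corpus (the data and helper names are hypothetical, not NLTK code) and only illustrates how Lidstone smoothing changes those estimates.

```python
from collections import Counter

# Toy tagged corpus (hypothetical data, not taken from treebank).
tagged_words = [
    ("Pierre", "NNP"), ("Vinken", "NNP"), ("will", "MD"),
    ("join", "VB"), ("the", "DT"), ("board", "NN"),
]


def emission_counts(pairs):
    """Count how often each word appears with each tag."""
    counts = {}
    for word, tag in pairs:
        counts.setdefault(tag, Counter())[word] += 1
    return counts


def mle_prob(counts, tag, word):
    """Maximum-likelihood estimate: count(tag, word) / count(tag)."""
    total = sum(counts[tag].values())
    return counts[tag][word] / float(total)


def lidstone_prob(counts, tag, word, gamma, bins):
    """Lidstone estimate: (count + gamma) / (total + gamma * bins)."""
    total = sum(counts[tag].values())
    return (counts[tag][word] + gamma) / (total + gamma * bins)


counts = emission_counts(tagged_words)
# One bin per known word, plus one bin reserved for unseen words.
bins = len(set(word for word, _ in tagged_words)) + 1

unseen = "Alex"  # not in the toy corpus
mle = dict((tag, mle_prob(counts, tag, unseen)) for tag in counts)
smoothed = dict(
    (tag, lidstone_prob(counts, tag, unseen, 0.1, bins)) for tag in counts
)

print(mle)       # every tag gives the unseen word probability 0.0
print(smoothed)  # every tag gives it a small non-zero probability
```

In NLTK itself, train_supervised accepts an optional estimator argument, so a smoothed distribution such as nltk.probability.LidstoneProbDist can be passed in, for example estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins). Whether that accounts for the 0.3 score here has not been tested.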


0 answers

No answers yet
