Using the evaluate feature when tagging documents with NLTK

from nltk.tag import UnigramTagger
from nltk.corpus import treebank
from nltk.tokenize import word_tokenize

train_sents = treebank.tagged_sents()
tagger = UnigramTagger(train_sents)
text1 = "This is the first sentence. Now this is another one! How many do you plan to write?"
words = word_tokenize(text1)
value = tagger.tag(words)
accuracy = tagger.evaluate(words)

1 answer

User

#1 · posted 2024-05-16 21:26:42

To train and use a UnigramTagger with NLTK, do the following:

>>> from nltk.tag import UnigramTagger
>>> from nltk.corpus import treebank
>>> from nltk import word_tokenize
>>> sent1 = "This is the first sentence."
>>> train_sents = treebank.tagged_sents()
>>> tagger = UnigramTagger(train_sents)
>>> tagger.tag(word_tokenize(sent1))
[('This', u'DT'), ('is', u'VBZ'), ('the', u'DT'), ('first', u'JJ'), ('sentence', u'NN'), ('.', u'.')]

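The tagger needs tagged sentences to evaluate against. The input to UnigramTagger.evaluate() is a list of sentences, where each sentence is a list of tuples whose first item is the word and whose second item is the POS tag (i.e. the same type of input that is used to train the UnigramTagger).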
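
To make the expected input shape concrete, here is a minimal pure-Python sketch of the kind of token-level accuracy check that evaluate() performs. It is not NLTK's implementation: the names gold_sents, toy_model, toy_tag and token_accuracy are hypothetical, and the small lookup table only stands in for a trained tagger.

```python
# Gold data: a list of sentences, each a list of (word, tag) tuples.
gold_sents = [
    [("This", "DT"), ("is", "VBZ"), ("the", "DT"),
     ("first", "JJ"), ("sentence", "NN"), (".", ".")],
    [("This", "DT"), ("is", "VBZ"), ("another", "DT"),
     ("one", "NN"), ("!", ".")],
]

# Hypothetical stand-in for a trained unigram tagger: one tag per known word.
toy_model = {"This": "DT", "is": "VBZ", "the": "DT",
             "first": "JJ", "sentence": "NN", ".": "."}


def toy_tag(words):
    """Tag each word by table lookup; unknown words get None."""
    return [(word, toy_model.get(word)) for word in words]


def token_accuracy(tag_fn, gold):
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    correct = 0
    total = 0
    for sent in gold:
        words = [word for word, _ in sent]      # strip the gold tags
        predicted = tag_fn(words)               # re-tag the bare words
        for (_, gold_tag), (_, pred_tag) in zip(sent, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


accuracy = token_accuracy(toy_tag, gold_sents)
```

A plain list of words, such as the output of word_tokenize, carries no gold tags, so there is nothing to compare the predictions against. That is why the evaluation data has to be tagged sentences.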

See https://github.com/nltk/nltk/blob/develop/nltk/tag/api.py#L53 for details. To try it, first split the treebank sentences into two parts, 90% for training and 10% for testing.

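
The 90/10 split described above might look like the following sketch. The helper names split_train_test and evaluate_on_treebank are hypothetical, and the treebank sample is assumed to be installed already (for example via nltk.download('treebank')). No accuracy figure is shown here because it depends on the corpus sample and the NLTK version.

```python
def split_train_test(sents, train_fraction=0.9):
    """Split a sequence of sentences into (train, test) parts by position."""
    cutoff = int(train_fraction * len(sents))
    return sents[:cutoff], sents[cutoff:]


def evaluate_on_treebank():
    """Train on 90% of the treebank sample and score on the remaining 10%."""
    # Imported here so the split helper above stays usable without NLTK.
    from nltk.corpus import treebank
    from nltk.tag import UnigramTagger

    tagged_sents = treebank.tagged_sents()
    train_sents, test_sents = split_train_test(tagged_sents)
    tagger = UnigramTagger(train_sents)
    # test_sents is a list of sentences of (word, tag) tuples,
    # which is the gold-standard shape that evaluate() expects.
    # Newer NLTK releases also provide tagger.accuracy(test_sents).
    return tagger.evaluate(test_sents)
```

Calling evaluate_on_treebank() returns a float between 0 and 1. Because the test sentences were held out from training, the score reflects how the tagger handles words it may never have seen.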
