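I'm using the nltk module to tag a sentence. However, I need help adding more information to the tokens, i.e. whether each token is capitalized and whether it is a noun.
Here is an example: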
import nltk

sentences = "John wrote His name as Ishmael"

def findPOS(input):
    # Split into sentences, then into words, then POS-tag each sentence.
    tagged = nltk.sent_tokenize(input.strip())
    tagged = [nltk.word_tokenize(sent) for sent in tagged]
    tagged = [nltk.pos_tag(sent) for sent in tagged]
    print(tagged)

findPOS(sentences)
>> [[('John', 'NNP'), ('wrote', 'VBD'), ('His', 'NNP'), ('name', 'NN'), ('as', 'IN'), ('Ishmael', 'NNP')]]
# desired output, with the extra information added and printed:
(John CAPITALIZED noun)
(wrote non-noun)
(His CAPITALIZED noun)
(name LOWERCASE non-noun)
(as non-noun)
(Ishmael CAPITALIZED noun)
Compact (not recommended):
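A minimal sketch of what such a compact version might look like. It works on the tagged list printed in the question rather than calling nltk, and it assumes that any tag starting with "NN" counts as a noun; the variable name `compact` is only illustrative.

```python
# Tagged output copied from the question; normally this comes from nltk.pos_tag.
tagged = [[('John', 'NNP'), ('wrote', 'VBD'), ('His', 'NNP'),
           ('name', 'NN'), ('as', 'IN'), ('Ishmael', 'NNP')]]

# Assumption: Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with "NN".
# Tokens that are not capitalized get None in the second position.
compact = [[(w, 'CAPITALIZED' if w[:1].isupper() else None,
             'noun' if t.startswith('NN') else 'non-noun')
            for w, t in sent] for sent in tagged]

print(compact)
```

Note that 'name' is tagged NN, so this sketch labels it as a noun.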
More readable code:
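One way the readable version could be written, as a sketch under the same assumption that "NN"-prefixed tags are nouns. The helper names `describe` and `add_info` are hypothetical and not part of nltk.

```python
def describe(word, tag):
    """Return (word, case_label, noun_label) for one tagged token."""
    # case_label is None when the token does not start with an uppercase letter.
    case_label = 'CAPITALIZED' if word[:1].isupper() else None
    # Assumption: Penn Treebank noun tags all start with "NN".
    noun_label = 'noun' if tag.startswith('NN') else 'non-noun'
    return (word, case_label, noun_label)


def add_info(tagged_sentences):
    """Apply describe() to every (word, tag) pair in every sentence."""
    result = []
    for sent in tagged_sentences:
        result.append([describe(word, tag) for word, tag in sent])
    return result


# Tagged output copied from the question; normally this comes from nltk.pos_tag.
tagged = [[('John', 'NNP'), ('wrote', 'VBD'), ('His', 'NNP'),
           ('name', 'NN'), ('as', 'IN'), ('Ishmael', 'NNP')]]

for token in add_info(tagged)[0]:
    print(token)
```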
If you insist on dropping the None element (present when the token is not capitalized):
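A sketch of one possible approach: build each tuple so that the case label is only appended when it applies, which means no None is ever produced. The helper name `describe_without_none` is hypothetical, and the noun test again assumes "NN"-prefixed tags.

```python
def describe_without_none(word, tag):
    """Return a tuple of labels, omitting the case label for uncapitalized tokens."""
    labels = [word]
    if word[:1].isupper():
        labels.append('CAPITALIZED')
    # Assumption: Penn Treebank noun tags all start with "NN".
    labels.append('noun' if tag.startswith('NN') else 'non-noun')
    return tuple(labels)


# Tagged output copied from the question; normally this comes from nltk.pos_tag.
tagged = [[('John', 'NNP'), ('wrote', 'VBD'), ('His', 'NNP'),
           ('name', 'NN'), ('as', 'IN'), ('Ishmael', 'NNP')]]

cleaned = [[describe_without_none(w, t) for w, t in sent] for sent in tagged]

# Print each token in the parenthesized style used in the question.
for token in cleaned[0]:
    print('(' + ' '.join(token) + ')')
```

The resulting tuples have two or three elements depending on capitalization, so code that consumes them should not rely on a fixed length.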