Removing only the 'NN' words from NLTK pos_tag output

User

Reply #1 · edited 2024-05-12 22:15:36

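# (word, tag) pairs as returned by pos_tag
a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Keep every pair whose tag (the last element) is not exactly 'NN'
c = [b for b in a if b[-1] != 'NN']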
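
Since the comparison above is an exact match on 'NN', related noun tags such as 'NNP' or 'NNS' are kept. The sketch below contrasts that with a prefix test, in case the intent is to drop every noun tag. The tagged list is hand-written for illustration (the word 'dogs' is not from the original sentence), so it runs without NLTK.

```python
# Hand-written (word, tag) pairs standing in for pos_tag output.
# 'dogs' is an illustrative addition, not part of the original example.
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('dogs', 'NNS')]

# Exact match: only pairs tagged exactly 'NN' are removed.
exact = [pair for pair in tagged if pair[1] != 'NN']

# Prefix match: every tag starting with 'NN' (NN, NNS, NNP, NNPS) is removed.
prefix = [pair for pair in tagged if not pair[1].startswith('NN')]

print(exact)
print(prefix)
```

For the question as asked (remove only 'NN'), the exact match is the one that applies.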

User

Reply #2 · edited 2024-05-12 22:15:36

Another approach (taking advantage of tuple unpacking):

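import nltk
from nltk import pos_tag

# Assumes the NLTK tokenizer and tagger data have already been downloaded
sentence = "Hello my name is Abhishek Mitra"
tokens = nltk.word_tokenize(sentence)
sent = pos_tag(tokens)  # list of (word, tag) tuples

# ('NN',) is a one-element tuple; ('NN') would be just the string 'NN'
sent_clean = [x for (x, y) in sent if y not in ('NN',)]

print(sent_clean)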

Output:

['Hello', 'my', 'is', 'Abhishek', 'Mitra']

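Explanation: in the line

sent_clean = [x for (x, y) in sent if y not in ('NN',)]

each word in the sentence has already been POS-tagged, so every element of sent is a (word, tag) tuple. The comprehension unpacks each tuple into x and y and keeps only the word x. The condition in the second part (the if clause) decides which words are kept, based on the tag y. Note the trailing comma: ('NN',) is a one-element tuple, whereas ('NN') is just the string 'NN', which would turn the test into a substring check.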
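
To see why the tuple form matters in a membership test like this one, here is a small sketch comparing `('NN')` with `('NN',)`. The tag 'N' and the word 'x' are made up purely to expose the difference; they are not real tagger output.

```python
as_string = ('NN')    # parentheses alone: this is the string 'NN'
as_tuple = ('NN',)    # trailing comma: a one-element tuple

# 'N' is a made-up tag used only to show the substring behaviour.
tagged = [('Hello', 'NNP'), ('name', 'NN'), ('x', 'N')]

# With a string, `in` is a substring test, so 'N' also matches 'NN'.
via_string = [w for (w, t) in tagged if t not in as_string]

# With a tuple, `in` compares whole elements, so only 'NN' matches.
via_tuple = [w for (w, t) in tagged if t not in as_tuple]

print(via_string)  # ['Hello']
print(via_tuple)   # ['Hello', 'x']
```

With a single tag, `y != 'NN'` is an equally valid and arguably simpler condition.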

Similarly, to remove several POS tags at once:

sent_clean2 = [x for (x,y) in sent if y not in ('PRP$', 'VBZ', 'NN')]

print(sent_clean2)

Output:

['Hello', 'Abhishek', 'Mitra']
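
If the same filter is needed in several places, it could be wrapped in a small helper. The function below is a sketch only: `drop_tags` is a hypothetical name, not part of NLTK, and the tagged list is hard-coded from the tags shown earlier in this thread so the example runs without NLTK.

```python
def drop_tags(tagged, unwanted):
    """Return the words whose tag is not in `unwanted` (hypothetical helper)."""
    # Accept a single tag given as a plain string without splitting it into characters.
    if isinstance(unwanted, str):
        unwanted = [unwanted]
    unwanted = frozenset(unwanted)
    return [word for word, tag in tagged if tag not in unwanted]


# (word, tag) pairs copied from the example in this thread.
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

only_nn_removed = drop_tags(tagged, 'NN')
several_removed = drop_tags(tagged, {'PRP$', 'VBZ', 'NN'})

print(only_nn_removed)   # ['Hello', 'my', 'is', 'Abhishek', 'Mitra']
print(several_removed)   # ['Hello', 'Abhishek', 'Mitra']
```

Using a set for the unwanted tags keeps each membership test an exact comparison, regardless of how many tags are listed.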

User

Reply #3 · edited 2024-05-12 22:15:36

You can use a list comprehension to remove the 'NN' elements:

import nltk
from nltk import pos_tag

sentence = "Hello my name is Abhishek Mitra"
tokens = nltk.word_tokenize(sentence)
sent = pos_tag(tokens)

# Keep the (word, tag) pairs whose tag is not exactly 'NN'
print([s for s in sent if s[1] != 'NN'])
