Remove phrases containing custom stop words from a list

Posted 2024-05-15 21:45:14


I have two lists:

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']

I want to get something like this:

finalList = ['Moscow', 'Berlin', 'France']

What I have tried so far works only when a list item matches a stop word exactly, as a complete string:

listB = []
for item in listA:
    if item not in stopwordlist:
        listB.append(item)
    else:
        continue
....            
....
    return listB

I could split each item and check the pieces against stopwordlist, but that feels like a lot of workarounds. Alternatively, I could use a regex with re.match.


3 answers
sl = tuple(i.lower() for i in stopwordlist)
[i for i in listA if not i.lower().startswith(sl)]

Output:

['Moscow', 'Berlin', 'France']
listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']
# Lowercase only the stop words; listA is left untouched so the
# result keeps the original capitalization.
stopwords = {w.lower() for w in stopwordlist}

listB = []

for item in listA:
    # Keep the item only if none of its words is a stop word
    # (compared case-insensitively).
    if not any(word.lower() in stopwords for word in item.split()):
        listB.append(item)
print(listB)

Here is one way:

>>> listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
>>> stopwordlist = ['new', 'To']
>>> finalList = [i for i in listA if not any(j.lower() in i.lower() for j in stopwordlist)]
>>> finalList
['Moscow', 'Berlin', 'France']

Or you can use the built-in filter function:

>>> listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
>>> stopwordlist = ['new', 'To']
>>> list(filter(lambda x: not any(j.lower() in x.lower() for j in stopwordlist), listA))
['Moscow', 'Berlin', 'France']
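
The question also mentions regex. One caveat with plain substring checks such as `j.lower() in i.lower()` is that a short stop word like 'to' would also match inside a longer word such as 'Stockholm'. A minimal sketch of a whole-word, case-insensitive match with `re` could look like the following; the helper name `remove_stopword_phrases` is made up for illustration, and it uses `re.search` with `\b` word boundaries rather than `re.match`, since `re.match` only looks at the start of the string.

```python
import re

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']


def remove_stopword_phrases(phrases, stopwords):
    """Hypothetical helper: drop phrases that contain any stop word as a whole word."""
    if not stopwords:
        # An empty alternation would match at every word boundary, so bail out early.
        return list(phrases)
    # re.escape guards against stop words that contain regex metacharacters.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in stopwords) + r')\b',
        re.IGNORECASE,
    )
    return [p for p in phrases if not pattern.search(p)]


finalList = remove_stopword_phrases(listA, stopwordlist)
print(finalList)
```

With the sample data this keeps 'Moscow', 'Berlin' and 'France'. Whether whole-word matching is what you want depends on your data; if stop words should also match as prefixes or substrings, the answers above are closer to that behavior.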
