Removing strings from the elements of a Python list

Published 2024-05-18 23:41:54

I have a list c with 353,000 elements. Each element is a parsed string. A sample of the list looks like this:

print(c[25:50])
['aluminum co of america', 'aluminum co of america', 'aluminum co of america', 'aluminum company of america', 'aluminum company of america', 'aluminum co of america', 'aluminum company of america', 'aluminum company of america', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'asset acceptance capital corp.', 'ace cash express, inc.', 'ace cash express, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.', 'airtran holdings, inc.']

I counted how often each word occurs in the list:

from collections import Counter
r=[]
for e in c:
    r.extend(e.split())

count=Counter(r)

So the six most common words in the list are:

{'inc.': 18670, 'corporation': 9255, 'company': 2632, 'group,': 1190, '&': 1158, 'financial': 1025}

I want to remove these words from every element of the list. For example, if I have "aluminum corporation of america", the output should be "aluminum of america". Any help?


2 Answers
# Using Generator Expression with `Counter` to speed it up a little bit
from collections import Counter
count = Counter(item for e in c for item in e.split())

# Get most frequently used words
words = {item for item, cnt in count.most_common(6)}

# filter the `words` in `c` and reconstruct the sentences in `c`
[" ".join([item for item in e.split() if item not in words]) for e in c]
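To see what the filter above does on a few concrete strings, here is a minimal sketch. The helper name strip_words and the sample list are made up for illustration, and the word set is copied by hand from the question's frequency table rather than recomputed with most_common.

```python
# Words to drop, copied from the frequency table in the question.
words = {'inc.', 'corporation', 'company', 'group,', '&', 'financial'}


def strip_words(names, stop_words):
    """Return the names with every whitespace-separated token found in stop_words removed."""
    return [" ".join(tok for tok in name.split() if tok not in stop_words)
            for name in names]


# Hypothetical sample, in the style of the list c from the question.
sample = ['aluminum corporation of america',
          'ace cash express, inc.',
          'airtran holdings, inc.']

result = strip_words(sample, words)
print(result)
# ['aluminum of america', 'ace cash express,', 'airtran holdings,']
```

Because the comparison is done on whole tokens, punctuation attached to a neighbouring word (the trailing comma in 'express,') is left in place; stripping it would need an extra step.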

You can use a regular expression to replace the words you want to remove with an empty string:

import re
p = re.compile(' |'.join(word for word in count))
cleaned = [p.sub('', item) for item in c]

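Edit: note, however, that you would have to escape characters such as . and & in the regex (strictly, only . is a regex metacharacter), so it ends up a bit more complicated than the above.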
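One possible way to handle the escaping is sketched below. It is not the answerer's code: it assumes the pattern should be built only from the six words listed in the question (the snippet as posted joins every key of count), uses re.escape for the special characters, and adds lookarounds so that a word is removed only when it stands as a whole token. The helper name remove_words and the sample strings are hypothetical.

```python
import re

# The six words from the question's frequency table.
words = ['inc.', 'corporation', 'company', 'group,', '&', 'financial']

# re.escape makes '.' match a literal dot.
# (?<!\S) and (?!\S) require whitespace or a string boundary on each side,
# so 'corporation' inside 'incorporation' is not touched.
pattern = re.compile(
    r'(?<!\S)(?:' + '|'.join(re.escape(w) for w in words) + r')(?!\S)'
)


def remove_words(text):
    """Delete the listed words, then collapse the leftover whitespace."""
    return " ".join(pattern.sub('', text).split())


# Hypothetical sample strings.
sample = ['aluminum corporation of america',
          'ace cash express, inc.',
          'incorporation & sons']

cleaned = [remove_words(s) for s in sample]
print(cleaned)
# ['aluminum of america', 'ace cash express,', 'incorporation sons']
```

For whole-word removal like this, the split-and-filter approach from the first answer gives the same result with less machinery; the regex version is mainly useful if the matching rules need to become more flexible later.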
