Removing specific words from a list

Published 2024-06-16 13:22:05


I am trying to remove specific words from a list, as well as the <title> and </title> tags found in a text file.

I also need to remove the words contained in the list words = ['a', 'is', 'and', 'there', 'here'].

My list lines consists of the following text:

lines = ['<title>The query complexity of estimating weighted averages.</title>', '<title>New bounds for the query complexity of an algorithm that learns DFAs with correction and equivalence queries.</title>', '<title>A general procedure to check conjunctive query containment.</title>']

Please help me remove the words in the list, as well as the <title> and </title> tags.


3 answers

First of all, you should always post what you have already tried.

Using only built-ins:

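# Assumes lines and words are lists of strings, as in the question.
for i in range(len(lines)):  # range(len(x)) covers every index, including the last
    for it in range(len(words)):
        # Note: str.replace matches substrings, so 'a' is also removed inside other words.
        lines[i] = lines[i].replace(words[it], '')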

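Explanation of the code:

  1. For each item in the "lines" list, i is the index of the current line.
  2. For each item in the "words" list, it is the index of the current word; every occurrence of that word in the current line is replaced with ''.
  3. The current item in "lines" is reassigned to itself with the current word removed.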
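One caveat with the loop above: str.replace works on substrings, so removing 'a' also strips that letter from words such as 'averages'. Below is a minimal sketch of a whole-word variant that still uses only built-ins. The helper name remove_words and its tag parameters are hypothetical, introduced here for illustration and not part of the original answer.

```python
def remove_words(line, words, open_tag="<title>", close_tag="</title>"):
    """Strip the wrapping tags, then drop whole words found in `words`."""
    # Hypothetical helper: assumes each line is wrapped in exactly these tags.
    if line.startswith(open_tag):
        line = line[len(open_tag):]
    if line.endswith(close_tag):
        line = line[:-len(close_tag)]
    # Splitting on whitespace compares whole tokens, so 'a' does not touch 'averages'.
    return " ".join(w for w in line.split() if w not in words)


lines = ['<title>The query complexity of estimating weighted averages.</title>',
         '<title>A general procedure to check conjunctive query containment.</title>']
words = ['a', 'is', 'and', 'there', 'here']

cleaned = [remove_words(line, words) for line in lines]
print(cleaned)
```

The comparison is case-sensitive, so the leading 'A' in the second title is kept. A token with attached punctuation, such as 'and,', would not match either; that would need extra handling.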

You can do this more efficiently without using regular expressions:

lines = ['<title>The query complexity of estimating weighted averages.</title>',
         '<title>New bounds for the query complexity of an algorithm that learns DFAs with correction and equivalence queries.</title>',
         '<title>A general procedure to check conjunctive query containment.</title>']
words = {"a", "is", "and", "there", "here"}

print([" ".join([w for line in lines
             for w in line[7:-8:].split(" ")
             if w.lower() not in words])])


['The query complexity of estimating weighted averages. New bounds for the query complexity of an algorithm that learns DFAs with correction equivalence queries. general procedure to check conjunctive query containment.']

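If case matters, remove the w.lower() call. Also, if you are extracting the lines by parsing a web page, I suggest extracting the text from the tags before writing it to a file.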
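As a rough illustration of extracting the text from the tags at parse time, here is a sketch based on the standard library's html.parser module. The class name TitleExtractor and the sample page string are assumptions; the question does not show the original page or how it was read.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")  # start a new title entry

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self.in_title:
            self.titles[-1] += data


# Hypothetical input: the question does not show the original page.
page = ("<title>The query complexity of estimating weighted averages.</title>\n"
        "<title>A general procedure to check conjunctive query containment.</title>")

parser = TitleExtractor()
parser.feed(page)
parser.close()
print(parser.titles)
```

With the titles collected this way, no slicing such as line[7:-8] is needed, and only the word filtering remains.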

Using the re.sub function:

>>> import re
>>> lines= ['<title>The query complexity of estimating weighted averages.</title>', '<title>New bounds for the query complexity of an algorithm that learns DFAs with correction and equivalence queries.</title>', '<title>A general procedure to check conjunctive query containment.</title>']
>>> words=['a','is','and','there','here']
>>> [re.sub(r'</?title>|\b(?:'+'|'.join(words)+r')\b', r'', line) for line in lines]
['The query complexity of estimating weighted averages.', 'New bounds for the query complexity of an algorithm that learns DFAs with correction  equivalence queries.', 'A general procedure to check conjunctive query containment.']

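The \b before and after the words ensures that only whole words are matched. \b is called a word boundary; it matches the position between a word character and a non-word character.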
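To see what the word boundary buys you, the sketch below runs the same alternation with and without \b. The sample text, the tidy helper, and the use of re.escape and re.IGNORECASE are additions for illustration and are not part of the original answer.

```python
import re

words = ['a', 'is', 'and', 'there', 'here']
text = 'A banana is here'

# re.escape guards against words that contain regex metacharacters.
alternatives = '|'.join(re.escape(w) for w in words)

with_boundary = re.compile(r'\b(?:' + alternatives + r')\b', re.IGNORECASE)
without_boundary = re.compile(r'(?:' + alternatives + r')', re.IGNORECASE)


def tidy(s):
    """Collapse the runs of spaces left behind by removed words."""
    return re.sub(r'\s+', ' ', s).strip()


print(tidy(with_boundary.sub('', text)))     # banana
print(tidy(without_boundary.sub('', text)))  # bnn
```

Without \b, every 'a' inside 'banana' is removed as well, which is the same substring problem that plain str.replace has.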
