How do I remove elements from a list when I only know part of the string?

2 votes

1 answer

1479 views

Asked 2025-04-18 07:37

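I've just started learning Python and want to remove certain elements from a list, but I don't know the full strings in advance. I'm currently using a regular expression to extract domain names from a text file. This works well, except that it also picks up strings with file extensions, such as "myfile.exe", which I don't want. My function is as follows: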

import re

def find_domains(txt):

    # Regex out domains
    lines = txt.split('\n')
    domains = []

    for line in lines:
        line = line.rstrip()
        # Raw string so the backslashes reach the regex engine unchanged
        results = re.findall(r'([\w\-\.]+(?:\.|\[\.\])+[a-z]{2,6})', line)
        for item in results:
            if item not in domains:
                domains.append(item)

    return domains

As I said, this works well, but my list ends up as:

domains = ['thisisadomain.com', 'anotherdomain.net', 'a_file_I_dont_want.exe', 'another_file_I_dont_want.csv']

I tried using:

domains.remove(".exe")

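But this doesn't work, because list.remove only removes an element that is exactly equal to its argument, and I don't know the whole string. Is there a way to use a wildcard, or to iterate over the list and remove the unwanted elements based on their extension alone? Thanks for any help; I'll try to provide more information if needed.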


1 Answer

I would use the built-in str.endswith method to solve this. It returns True if the string ends with the suffix you specify, and False otherwise.
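As a quick illustration of that behaviour, here is a minimal sketch using two sample strings from the question. The variable names are only for illustration.

```python
# Sample strings taken from the list in the question.
wanted = 'thisisadomain.com'
unwanted = 'a_file_I_dont_want.exe'

# str.endswith returns True only when the string ends with the given suffix.
print(wanted.endswith('.exe'))    # False
print(unwanted.endswith('.exe'))  # True
```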

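It is simple to use, as the example below shows. Since Python 2.5 you can also pass a tuple of suffixes, in which case it returns True if the string ends with any one of them.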

import re

def find_domains(txt):

    # Regex out domains
    lines = txt.split('\n')
    domains = []
    # Tuple of unwanted extensions; add more if you want.
    # Do not list real TLDs such as '.net' here, or wanted domains get dropped.
    unwanted_extensions = ('.exe', '.csv')

    for line in lines:
        line = line.rstrip()
        results = re.findall(r'([\w\-\.]+(?:\.|\[\.\])+[a-z]{2,6})', line)
        for item in results:
            # Keep item only if it is not already in domains and does not
            # end with any of the unwanted extensions.
            if item not in domains and not item.endswith(unwanted_extensions):
                domains.append(item)

    return domains

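As you can see, all you need to do is list the extensions you don't want (done here in the unwanted_extensions tuple) and add a condition to the if statement that checks that item does not end with any of them.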
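If the list has already been built, the same check can be applied after the fact. The following is a minimal sketch rather than part of the original answer: the helper name `drop_unwanted` and its default tuple are assumptions made for illustration. It builds a new list instead of calling `remove` inside a loop, since modifying a list while iterating over it can skip elements. Lower-casing before the comparison is an extra assumption, so that something like `FILE.EXE` would also be caught.

```python
def drop_unwanted(items, unwanted_extensions=('.exe', '.csv')):
    """Return a new list without the items ending in an unwanted extension."""
    # Hypothetical helper: str.endswith accepts a tuple of suffixes.
    return [item for item in items
            if not item.lower().endswith(unwanted_extensions)]


domains = ['thisisadomain.com', 'anotherdomain.net',
           'a_file_I_dont_want.exe', 'another_file_I_dont_want.csv']
domains = drop_unwanted(domains)
print(domains)  # ['thisisadomain.com', 'anotherdomain.net']
```

The order of the remaining items is preserved, and the original list object is left untouched until it is reassigned.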

Answered 2025-04-18 by Python大师
