List comprehension to remove Python list elements that contain only digits (possibly with "_" or "-")

2 votes
3 answers
91 views
Asked 2025-04-13 00:26

I have many lists like this:

synonyms = ["3,2'-DIHYDROXYCHALCONE", '36574-83-1', '36574831', "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']

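I need to remove every element that contains only digits. I can't work out how to get rid of element [1], because although it is numeric, it has dashes scattered through it.

Naturally, this doesn't work, because the dashes mean the element is no longer purely digits: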

synonym_subset = [x for x in synonym_subset if not (x.isdigit())]

And I can't simply strip the dashes, because I want to keep the dashes in the other elements:

synonym_subset = [x.replace('-','') for x in synonym_subset]

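I could run the code above to find the indices of the elements to delete and then remove them by index, but I'm hoping there is a one-liner that does the job.

Thanks.

3 Answers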

0

You can use filter [Edit: although in this case you probably shouldn't, as pointed out in the comments]:

import re

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

filtered_synonyms = list(
    filter(lambda x: not re.sub(r"[-_]", "", x).isdigit(), synonyms)
)

The result is:

["3,2'-DIHYDROXYCHALCONE", "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']
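
Following the caveat in the edit above, the same check can be written as a plain list comprehension, which is what the question asked for. This is only a sketch: it swaps `re.sub` for two `str.replace` calls (my substitution, not part of the original answer), so no import is needed.

```python
synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

# Strip "-" and "_" only inside the test; the kept elements are left unchanged.
filtered_synonyms = [
    s for s in synonyms
    if not s.replace("-", "").replace("_", "").isdigit()
]
print(filtered_synonyms)
```

The replacement happens only in the condition, so the dashes in the remaining names are preserved.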
2

As a small addition to the answers already posted: depending on the length of the strings, using set() may be a better fit, for example:

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

myset = set("0123456789-_")
[s for s in synonyms if not set(s).issubset(myset)]

Edit: as @no comment pointed out, this can be improved further by using issuperset, like so:

isdigits = set("0123456789-_").issuperset
[s for s in synonyms if not isdigits(s)]

Each of these returns:

["3,2'-DIHYDROXYCHALCONE", "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']

P.S. Another option is to use ord(), but it is usually slower and less readable:

[s for s in synonyms if not all(ord(k) in (*range(48,58),45,95) for k in s)]
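
One thing worth checking before choosing between the approaches in this thread is how they treat edge cases, since the "strip, then isdigit()" test and the "only allowed characters" test are not strictly equivalent. The sketch below compares them; the helper names `strip_then_isdigit` and `only_allowed_chars` are made up for illustration.

```python
allowed = set("0123456789-_")

def strip_then_isdigit(s):
    # Remove "-" and "_", then ask whether what is left is all digits.
    return s.replace("-", "").replace("_", "").isdigit()

def only_allowed_chars(s):
    # True when every character of s is an ASCII digit, "-" or "_".
    return allowed.issuperset(s)

# Empty string, a lone dash, a normal ID, and SUPERSCRIPT TWO (U+00B2).
for sample in ["", "-", "12-3", "\u00b2"]:
    print(repr(sample), strip_then_isdigit(sample), only_allowed_chars(sample))
```

The two tests disagree on the empty string and on strings made only of separators (the set-based test treats them as removable, the isdigit() test does not), and on non-ASCII digit characters such as "²", which str.isdigit() accepts. If such values can appear in your data, pick the behaviour you want explicitly.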
3

Try this:

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

out = [s for s in synonyms if not all(ch in "0123456789-_" for ch in s)]
print(out)

Output:

[
    "3,2'-DIHYDROXYCHALCONE",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]
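
Since the question mentions having many such lists, the comprehension can be wrapped in a small helper and applied to each list. This is a minimal sketch; `ALLOWED`, `drop_numeric_ids` and the sample `all_lists` are hypothetical names and data, not from the question.

```python
ALLOWED = frozenset("0123456789-_")

def drop_numeric_ids(items):
    """Keep only items with at least one character outside digits, '-' and '_'."""
    return [s for s in items if not ALLOWED.issuperset(s)]

# Hypothetical sample data standing in for the asker's many lists.
all_lists = [
    ["36574-83-1", "MLS002693861"],
    ["2',3-Dihydroxychalcone", "36574831", "12_34"],
]

cleaned = [drop_numeric_ids(lst) for lst in all_lists]
print(cleaned)
```

Note that with this definition an empty string is also dropped, because it contains no character outside the allowed set.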
