Problem with searching for special characters in Python

Published 2024-04-26 11:23:24


I have a file (only part of it is shown below), and I want to remove a particular string from it.

OTU1359 UniRef90_A0A095VQ09 UniRef90_A0A0C1UI80 UniRef90_A0A1M4ZSK2 UniRef90_A0A1W1CJV7 UniRef90_A0A1Z9J2X0 UniRef90_A0A1Z9THL2 UniRef90_A0A2E3B6A5 UniRef90_A0A2E5MT47 UniRef90_A0A2E5VCW9 UniRef90_A0A2E6CDK4 UniRef90_A0A2E6KTE6 UniRef90_A0A2E8AIM6 UniRef90_A0A2E8RIG1 UniRef90_A0A2E8YNS3 UniRef90_A0A2E9VEK0 UniRef90_W6RCT6

OTU0980 UniRef90_A0A084TMQ7 UniRef90_A0A090PK65 UniRef90_A0A0P1G8P0 UniRef90_A0A0P1IHL1 UniRef90_A0A286ILS7 UniRef90_A0A2A5E7H9 UniRef90_A0A2D9J217 UniRef90_H3NS47 UniRef90_H3NSN9 UniRef90_H3NSP0 UniRef90_H3NSP7 UniRef90_H3NUB2 UniRef90_H3NY28 UniRef90_H3NY47 UniRef90_UPI000C2CBC51

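I want to remove the "OTUxxxx" strings (each one always starts with OTU and is always followed by exactly 4 digits). A single line can contain several OTUxxxx strings.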

I tried:

re.search("OTU[0-9]{4}", line)

It doesn't work. Any help?


2 answers

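You can use re.sub, which actually performs a replacement: it replaces every matched piece of text with the text you supply and returns the new string. re.search, by contrast, only locates a match and leaves the string unchanged. You can find the documentation here: https://docs.python.org/3/library/re.html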
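The difference between re.search and re.sub can be seen on a short line. This is a minimal sketch; the sample line is made up for illustration (two OTU tokens on one line, as the question describes):

```python
import re

# Made-up shortened line with two OTU tokens (illustrative only)
line = "OTU1359 UniRef90_A0A095VQ09 OTU0980 UniRef90_W6RCT6"

# search() only finds the FIRST match and returns a match object
m = re.search(r"OTU[0-9]{4}", line)
print(m.group())  # OTU1359
print(line)       # unchanged: search() never modifies the string

# sub() returns a NEW string with every match replaced by "" (deleted)
cleaned = re.sub(r"OTU[0-9]{4}", "", line)
print(cleaned.split())  # ['UniRef90_A0A095VQ09', 'UniRef90_W6RCT6']
```

Because Python strings are immutable, the result of re.sub has to be assigned to a variable (or written out); calling it without using the return value has no effect.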

Here is one possible implementation:

import re

text = '''
OTU1359 UniRef90_A0A095VQ09 UniRef90_A0A0C1UI80 UniRef90_A0A1M4ZSK2 UniRef90_A0A1W1CJV7 UniRef90_A0A1Z9J2X0 UniRef90_A0A1Z9THL2 UniRef90_A0A2E3B6A5 UniRef90_A0A2E5MT47 UniRef90_A0A2E5VCW9 UniRef90_A0A2E6CDK4 UniRef90_A0A2E6KTE6 UniRef90_A0A2E8AIM6 UniRef90_A0A2E8RIG1 UniRef90_A0A2E8YNS3 UniRef90_A0A2E9VEK0 UniRef90_W6RCT6

OTU0980 UniRef90_A0A084TMQ7 UniRef90_A0A090PK65 UniRef90_A0A0P1G8P0 UniRef90_A0A0P1IHL1 UniRef90_A0A286ILS7 UniRef90_A0A2A5E7H9 UniRef90_A0A2D9J217 UniRef90_H3NS47 UniRef90_H3NSN9 UniRef90_H3NSP0 UniRef90_H3NSP7 UniRef90_H3NUB2 UniRef90_H3NY28 UniRef90_H3NY47 UniRef90_UPI000C2CBC51
'''

# "OTU" followed by exactly four digits. MULTILINE is not needed because
# the pattern uses no ^ or $ anchors.
pattern = re.compile(r'OTU\d{4}')

# sub() returns a new string in which every match is replaced by '' (deleted)
cleaned = pattern.sub('', text)
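The code above cleans one in-memory string. To clean the actual file line by line, a minimal sketch could look like this. It assumes hypothetical file names `otus.txt` and `otus_clean.txt` and a hypothetical helper `clean_stream`; the demo uses `io.StringIO` in place of real files so it runs as-is:

```python
import io
import re

# Same pattern as above: "OTU" followed by exactly four digits
OTU_RE = re.compile(r"OTU\d{4}")


def clean_stream(src, dst):
    """Hypothetical helper: write each line of src to dst without OTUxxxx tokens."""
    for line in src:
        removed = OTU_RE.sub("", line)
        # split()/join() drops leading/trailing whitespace and collapses the
        # double spaces left where an OTU token sat between two other tokens
        dst.write(" ".join(removed.split()) + "\n")


# With real files (hypothetical names):
# with open("otus.txt") as src, open("otus_clean.txt", "w") as dst:
#     clean_stream(src, dst)

# Demo on a shortened excerpt of the sample data
src = io.StringIO("OTU1359 UniRef90_A0A095VQ09 UniRef90_W6RCT6\n\nOTU0980 UniRef90_H3NS47\n")
dst = io.StringIO()
clean_stream(src, dst)
print(dst.getvalue())
```

Normalizing whitespace with split()/join() is one design choice; it keeps the output tidy without making the regex itself responsible for the surrounding spaces.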

I suggest using re.sub and matching the pattern as a whole word, to avoid partial matches inside other words.

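import re

# \s* also removes the whitespace before each OTU token; \b...\b restricts the match to whole words
s = re.sub(r"\s*\bOTU[0-9]{4}\b", "", line).strip()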

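See the regex demo. The trailing .strip() removes any extra leading/trailing whitespace left over after a match at the start or end of the string has been removed.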
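Why the \b word boundaries matter: without them, `OTU[0-9]{4}` also matches a prefix of a longer token. The comparison below is a sketch; the tokens `OTU12345` and `XOTU1234` are made up and do not appear in the sample file:

```python
import re

# Made-up line: one real-looking OTU id plus two tokens that merely contain OTU + 4 digits
line = "OTU1359 UniRef90_H3NS47 OTU12345 XOTU1234"

# Without word boundaries, "OTU1234" is cut out of both longer tokens
plain = re.sub(r"OTU[0-9]{4}", "", line)

# With \b...\b, only the standalone OTU1359 token is removed
whole = re.sub(r"\s*\bOTU[0-9]{4}\b", "", line).strip()

print(repr(plain))  # ' UniRef90_H3NS47 5 X'
print(repr(whole))  # 'UniRef90_H3NS47 OTU12345 XOTU1234'
```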

See the regex graph.

