How to extract a pair of marked strings from a line (Python) - Q&A - Python中文网

Posted 2024-04-25 05:11:53

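Hi folks,

I have spent quite a bit of time on this and still cannot find a better approach. I am writing the code in Python, by the way.

Here is an example line from the file I am processing:

">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."

How can I extract the two strings "ZP_01631227.1" and "Nodularia spumigena CCY9414" from this line?

The paired "|" characters and the square brackets act as markers, so we know the strings we want sit between those symbols.

I suppose I could loop over every character in the line and do it the hard way, but that would take too much time. Is there a Python library or some other clever way to do this cleanly?

Thanks, everyone!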

2 answers

User

#1 · edited 2024-04-25 05:11:53

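>>> with open("file") as f:
...     for line in f:
...         # Only handle lines that carry both markers: "|...|" and "[...]"
...         if line.count("|") >= 2 and "[" in line:
...             whatiwant_1 = line.split("|")[1]                # text between the first two "|"
...             whatiwant_2 = line.split("[")[1].split("]")[0]  # text inside the first "[...]"
...             print(whatiwant_1, whatiwant_2)
...
ZP_01631227.1 Nodularia spumigena CCY9414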
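If the same split-based idea has to run over many lines, it may help to wrap it in a small function that returns nothing useful when a marker is missing instead of raising. The following is only a sketch under that assumption; the name `extract_marked` is hypothetical and does not come from the answer above.

```python
def extract_marked(line):
    """Return (between_pipes, inside_brackets), or None if a marker pair is missing."""
    parts = line.split("|")
    if len(parts) < 3:  # fewer than two "|" characters on the line
        return None
    # str.partition never raises; the separator slot is "" when the marker is absent
    _, open_br, rest = line.partition("[")
    inside, close_br, _ = rest.partition("]")
    if not open_br or not close_br:
        return None
    return parts[1].strip(), inside.strip()


line = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_marked(line))
```

Called on a line without the markers, for example `extract_marked("no markers here")`, this version returns `None`, so the caller can skip such lines with a simple `if` check.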

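User

#2 · edited 2024-04-25 05:11:53

A concise alternative is a regular expression (for some reason they have a poor reputation in the Python community, but they do offer conciseness and power for simple text processing):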

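import re

s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
# Group 1: text between the first pair of "|"; group 2: text inside "[...]".
# ".*" skips whatever lies between the closing "|" and the "[".
mo = re.search(r'\|(.*?)\|.*\[(.*?)\]', s)
if mo:
    thefirst, thesecond = mo.groups()
    print(thefirst, thesecond)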
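For a whole file, one possible variation is to compile the pattern once and use named groups. This is a sketch rather than the answerer's code: `HEADER_RE`, `parse_headers`, and the group names `accession` and `organism` are hypothetical, and the pattern swaps the lazy `.*?` for negated character classes, so it picks the first bracketed span after the closing "|".

```python
import re

# Hypothetical pattern: "|accession|", any non-"[" text, then "[organism]"
HEADER_RE = re.compile(r"\|(?P<accession>[^|]+)\|[^\[]*\[(?P<organism>[^\]]+)\]")


def parse_headers(lines):
    """Yield (accession, organism) for every line that carries both markers."""
    for line in lines:
        mo = HEADER_RE.search(line)
        if mo:
            yield mo.group("accession"), mo.group("organism")


sample = [
    ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]...",
    "a line without any markers",
]
results = list(parse_headers(sample))
print(results)
```

Because `parse_headers` accepts any iterable of strings, an open file object can be passed in directly, and lines that lack either marker are skipped silently.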
