Using re.MULTILINE and re.DOTALL together in Python

2 answers

Anonymous user

Answer 1 · edited 2024-05-15 23:46:27

Your problem may be that your pattern matches \r\n. Try matching only \n instead:

>>> x = """
... >U51677 Human non-histone chromatin protein HMG1 (HMG1) gene, complete
... 
...        cds. #some records don't have this line (see below)
... 
...        Length = 2575
... (some text)
... 
... >U51677 Human non-histone chromatin protein HMG1 (HMG1) gene, complete
... 
...        Length = 2575
... (some text)
... 
... (etc...)
... """
>>> import re
>>> m = re.search(r"^(>.*)\n.*(?:\n*.?)Length\s=\s(\d+)", x, re.MULTILINE | re.DOTALL)
>>> m.group(2)
'2575'
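To see why the line-ending convention matters, here is a minimal sketch, assuming the records arrive with either Unix (\n) or Windows (\r\n) line endings. The strings `lf_text` and `crlf_text` are hypothetical stand-ins for the question's data: a pattern that hard-codes \r\n only matches the CRLF version, while \r?\n accepts both.

```python
import re

# Hypothetical two-line record, once with LF and once with CRLF line endings.
lf_text = ">U51677 Human HMG1 gene, complete\n       Length = 2575\n"
crlf_text = lf_text.replace("\n", "\r\n")

flags = re.MULTILINE | re.DOTALL

# A pattern that insists on \r\n between the header and the rest.
crlf_only = re.compile(r"^(>.*?)\r\n.*?Length\s=\s(\d+)", flags)
# \r? makes the carriage return optional, so either convention matches.
either = re.compile(r"^(>.*?)\r?\n.*?Length\s=\s(\d+)", flags)

print(crlf_only.search(lf_text))              # None: LF text contains no \r
print(crlf_only.search(crlf_text).group(2))   # 2575
print(either.search(lf_text).groups())        # ('>U51677 Human HMG1 gene, complete', '2575')
print(either.search(crlf_text).groups())      # same tuple; the \r is not captured
```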

Also, your first .* is too greedy. Use the non-greedy ^(>.*?)$.*?Length\s=\s(\d+) instead:

>>> re.findall(r"^(>.*?)$.*?Length\s=\s(\d+)", x, re.MULTILINE | re.DOTALL)
[('>U51677 Human non-histone chromatin protein HMG1 (HMG1) gene, complete', '2575'), ('>U51677 Human non-histone chromatin protein HMG1 (HMG1) gene, complete', '2575')]
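A minimal sketch of the greedy-versus-lazy difference, using a shortened, hypothetical two-record string `records` in place of the question's data. With re.DOTALL, a greedy .* runs to the end of the text and backtracks only as far as the last Length, so both records collapse into a single match; the lazy version stops at the first Length after each header.

```python
import re

# Hypothetical two-record sample (shortened from the question's data).
records = (
    ">rec1 first header\n"
    "       Length = 100\n"
    "(some text)\n"
    ">rec2 second header\n"
    "       Length = 200\n"
)
flags = re.MULTILINE | re.DOTALL

# Greedy: group 1 swallows everything up to the last header line.
greedy = re.findall(r"^(>.*)$.*Length\s=\s(\d+)", records, flags)
# Lazy: each header is paired with the nearest Length that follows it.
lazy = re.findall(r"^(>.*?)$.*?Length\s=\s(\d+)", records, flags)

print(len(greedy), greedy[0][1])  # 1 200
print(lazy)  # [('>rec1 first header', '100'), ('>rec2 second header', '200')]
```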

Anonymous user

Answer 2 · edited 2024-05-15 23:46:27

Try this regular expression:

r"^(>[^\r\n]*).*?Length\s=\s(\d+)"

Set both options at once by combining them with the pipe operator: re.MULTILINE | re.DOTALL.
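On combining the two options: the re flags are bit flags (members of re.RegexFlag in Python 3.6+), so | merges them. A minimal sketch of three equivalent spellings, namely the long names, the one-letter aliases re.M and re.S, and inline flags (?ms) at the start of the pattern. The sample string `text` is hypothetical.

```python
import re

text = ">header line\n  Length = 42\n"  # hypothetical sample
pattern = r"^(>[^\r\n]*).*?Length\s=\s(\d+)"

long_names = re.search(pattern, text, re.MULTILINE | re.DOTALL)
short_names = re.search(pattern, text, re.M | re.S)
# Inline flags must appear at the very start of the pattern.
inline = re.search(r"(?ms)" + pattern, text)

print(long_names.groups())  # ('>header line', '42')
print(long_names.groups() == short_names.groups() == inline.groups())  # True
```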

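The first capturing group matches from > up to the first line break, regardless of the operating system's line-ending convention (\n or \r\n). Then .*? matches any characters, newlines included, up to the first Length. The rest is the same as in your first attempt.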
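A sketch of this pattern applied to the same hypothetical records with LF and with CRLF line endings. Because [^\r\n]* stops before either character, the captured header contains no stray \r in both cases. `sample_lf`, `sample_crlf` and the second record are made-up inputs.

```python
import re

pattern = re.compile(r"^(>[^\r\n]*).*?Length\s=\s(\d+)", re.MULTILINE | re.DOTALL)

# Hypothetical input: two records, one with a blank line before Length.
sample_lf = (
    ">U51677 Human HMG1 gene, complete\n"
    "\n"
    "       Length = 2575\n"
    ">rec2 hypothetical record\n"
    "       Length = 1200\n"
)
sample_crlf = sample_lf.replace("\n", "\r\n")

for name, sample in (("LF", sample_lf), ("CRLF", sample_crlf)):
    # findall returns one (header, length) tuple per record.
    print(name, pattern.findall(sample))
```

Both lines print the same list of tuples, so the rest of the code does not need to care which line endings the file used.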

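The problem with your previous attempt seems to be that your .* could match anything and was also greedy, so it consumed as much as possible, including the Length = 2575 that follows.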
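Why .* could "match anything" comes down to what each flag changes on its own. A minimal sketch with a hypothetical three-line string `s`: re.DOTALL lets . match \n, and re.MULTILINE lets ^ and $ match at every line boundary instead of only at the ends of the string.

```python
import re

s = "first\nsecond\nthird"  # hypothetical sample

# Without DOTALL, . stops at the newline; with it, .* runs to the end of the string.
print(repr(re.match(r".*", s).group()))             # 'first'
print(repr(re.match(r".*", s, re.DOTALL).group()))  # 'first\nsecond\nthird'

# Without MULTILINE, ^ only anchors at the start of the string.
print(re.findall(r"^\w+", s))               # ['first']
print(re.findall(r"^\w+", s, re.MULTILINE)) # ['first', 'second', 'third']
```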
