Excluding the last matching part of a text

Homo sapiens (human) -> Homo sapiens
mitochondrion Capra hircus (goat) -> mitochondrion Capra hircus
Escherichia coli -> Escherichia coli
Xenopus (Silurana) tropicalis (western tree frog) -> Xenopus (Silurana) tropicalis

3 answers

User

Answer 1 · edited 2024-04-25 01:07:26

A non-regex solution is very simple:

start, _, end = text.rpartition('(')
result = start or end

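rpartition searches from the end of the string and splits at the first ( it encounters, that is, the last ( in the string, returning the 3-tuple (text-before, separator, text-after); in this case separator = '('. If the string contains no (, everything ends up in text-after, and both text-before and separator are empty strings. When there is a (...), text-before holds all the text before the last (, the separator is (, and text-after is ...).

So start or end always holds the value you need: if start is non-empty you want it, otherwise the result is end.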

Alternatively:

result = next(filter(None, text.rpartition('(')))

Example run:

In [1]: texts = [
   ...:     'Homo sapiens (human)',
   ...:     'mitochondrion Capra hircus (goat)',
   ...:     'Escherichia coli',
   ...:     'Xenopus (Silurana) tropicalis (western tree frog)',
   ...: ]

In [2]: for text in texts:
   ...:     start, _, end = text.rpartition('(')
   ...:     print('in {!r}\t->\t{!r}'.format(text, start or end))
   ...:     
in 'Homo sapiens (human)'       ->      'Homo sapiens '
in 'mitochondrion Capra hircus (goat)'  ->      'mitochondrion Capra hircus '
in 'Escherichia coli'   ->      'Escherichia coli'
in 'Xenopus (Silurana) tropicalis (western tree frog)'  ->      'Xenopus (Silurana) tropicalis '

In [3]: for text in texts:
   ...:     print('in {!r}\t->\t{!r}'.format(text, next(filter(None, text.rpartition('(')))))
in 'Homo sapiens (human)'       ->      'Homo sapiens '
in 'mitochondrion Capra hircus (goat)'  ->      'mitochondrion Capra hircus '
in 'Escherichia coli'   ->      'Escherichia coli'
in 'Xenopus (Silurana) tropicalis (western tree frog)'  ->      'Xenopus (Silurana) tropicalis '
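Note that the results above keep the space that preceded the final (. A minimal sketch, assuming you want that space removed and want strings that begin with ( handled explicitly; the helper name strip_last_parenthetical is hypothetical and not part of the answer above:

```python
def strip_last_parenthetical(text):
    """Return text without its last '(...' part, trailing whitespace removed."""
    start, sep, end = text.rpartition('(')
    if not sep:
        # No '(' at all: rpartition put the whole string in `end`.
        return end.rstrip()
    # A '(' was found: keep what precedes it, even if that is empty.
    return start.rstrip()


print(repr(strip_last_parenthetical('Homo sapiens (human)')))
print(repr(strip_last_parenthetical('Escherichia coli')))
```

Checking the separator instead of using start or end changes one edge case: for a string such as '(human)', start is empty, so start or end would return 'human)', while this sketch returns an empty string.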

Timings:

In [13]: texts *= 1000

In [14]: %%timeit
    ...: results = []
    ...: for text in texts:
    ...:     start, _, end = text.rpartition('(')
    ...:     results.append(start or end)
    ...: 
1000 loops, best of 3: 1.04 ms per loop

That is more than 4 times faster than the regex-based solution:

In [15]: import re

In [16]: %%timeit regex = re.compile(r'^(?:(?!.*\(.*\)).*|.*(?= \(.*\)))')
    ...: results = []
    ...: for text in texts:
    ...:     match = regex.match(text)
    ...:     results.append(match.group(0))
    ...: 
100 loops, best of 3: 4.27 ms per loop

The filter version is somewhat slower than the or solution:

In [19]: %%timeit
    ...: results = []
    ...: for text in texts:
    ...:     results.append(next(filter(None, text.rpartition('('))))
    ...: 
1000 loops, best of 3: 1.89 ms per loop
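If you are not working in IPython, the same comparison can be approximated with the standard timeit module. This is only a sketch: the function names with_rpartition and with_regex are made up for illustration, and the absolute numbers will differ from the ones above depending on machine and Python version.

```python
import re
import timeit

texts = [
    'Homo sapiens (human)',
    'mitochondrion Capra hircus (goat)',
    'Escherichia coli',
    'Xenopus (Silurana) tropicalis (western tree frog)',
] * 1000

regex = re.compile(r'^(?:(?!.*\(.*\)).*|.*(?= \(.*\)))')


def with_rpartition(items):
    # Same loop as in the %%timeit cell for the rpartition approach.
    results = []
    for text in items:
        start, _, end = text.rpartition('(')
        results.append(start or end)
    return results


def with_regex(items):
    # Same loop as in the %%timeit cell for the regex approach.
    results = []
    for text in items:
        results.append(regex.match(text).group(0))
    return results


# Total seconds for 100 passes over the 4000 strings.
rpartition_time = timeit.timeit(lambda: with_rpartition(texts), number=100)
regex_time = timeit.timeit(lambda: with_regex(texts), number=100)
print('rpartition: {:.3f} s, regex: {:.3f} s'.format(rpartition_time, regex_time))
```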

User

Answer 2 · edited 2024-04-25 01:07:26

^(?:(?!.*\(.*\)).*|.*(?= \(.*\)))


The idea is to match either a whole line (one with no parenthesized content):

(?!.*\(.*\)).*

or everything up to the last space that is followed by something in parentheses:

.*(?= \(.*\))
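To see how the two branches behave, here is a short sketch that applies the full pattern to the strings from the question with Python's re module (the variable names are illustrative only):

```python
import re

pattern = re.compile(r'^(?:(?!.*\(.*\)).*|.*(?= \(.*\)))')

texts = [
    'Homo sapiens (human)',
    'mitochondrion Capra hircus (goat)',
    'Escherichia coli',
    'Xenopus (Silurana) tropicalis (western tree frog)',
]

# group(0) is the whole match; the lookahead consumes nothing, so the
# space and the final parenthesized part are left out of the result.
results = [pattern.match(text).group(0) for text in texts]

for text, result in zip(texts, results):
    print('{!r} -> {!r}'.format(text, result))

# Neither branch applies when parentheses are present but not preceded
# by a space, so match() returns None for such input.
no_match = pattern.match('a(b)')
```

Because the lookahead requires a literal space before the (, the matched text has no trailing space, unlike the rpartition result. Input such as 'a(b)' matches neither branch, so check for None before calling group.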

User

Answer 3 · edited 2024-04-25 01:07:26

You can try this:

(.+)(?:\(.+\))$|(.+)

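(.+)(?:\(.+\))$: finds a parenthesized group at the end of the line and captures what precedes it.

(.+): matches any run of characters other than newlines.

Then take group 1 if it matched, otherwise group 2.

Output: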

Homo sapiens 
mitochondrion Capra hircus 
Escherichia coli
Xenopus (Silurana) tropicalis
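In Python, picking whichever group matched could look like the sketch below. The helper name last_part_removed is hypothetical, and the pattern is used exactly as given above:

```python
import re

pattern = re.compile(r'(.+)(?:\(.+\))$|(.+)')


def last_part_removed(text):
    match = pattern.match(text)
    if match is None:
        # Only possible for input the pattern cannot match, e.g. an empty string.
        return text
    # Group 1 is set when the line ends with '(...)'; otherwise group 2 is.
    return match.group(1) or match.group(2)


for text in [
    'Homo sapiens (human)',
    'mitochondrion Capra hircus (goat)',
    'Escherichia coli',
    'Xenopus (Silurana) tropicalis (western tree frog)',
]:
    print(repr(last_part_removed(text)))
```

Group 1 keeps the space in front of the final parenthesis, so call rstrip() on the result if that space is unwanted.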

