Python prints empty lists when printing regex matches

1 vote
4 answers
2466 views
Asked 2025-04-18 13:58

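I'm using a regular expression to match names that start with "Dr.". However, when I print the matches, they come out as lists, and some of the lists are empty. I only want the names.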

Code:

import re

f = open('qwert.txt', 'r')

lines = f.readlines()
for x in lines:
       p=re.findall(r'(?:Dr[.](\w+))',x)
       q=re.findall(r'(?:As (\w+))',x)
       print p
       print q

qwert.txt:

Dr.John and Dr.Keel
Dr.Tensa
Dr.Jees
As John winning Nobel prize
As Mary wins all prize
car
 tick me 3
 python.hi=is good
 dynamic 
 and precise

tickme 2 and its in it
 its rapid  
 its best
 well and easy

Desired output:

John
Keel
Tensa
Jees
John
Mary

Actual output:

['John', 'Keel']
[]
['Tensa']
[]
['Jees']
[]
[]
['John']
[]
['Mary']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

4 Answers

1

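You need to iterate over your results.

Consider calling findall() just once per line with a combined pattern, so you don't have to repeat the call on every iteration.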

>>> import re
>>> f = open('qwert.txt', 'r')
>>> for line in f:
...     matches = re.findall(r'(?:Dr\.|As )(\w+)', line)
...     for x in matches:
...         print x

John
Keel
Tensa
Jees
John
Mary
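
The loop above can also be wrapped in a small function that returns the names instead of printing them. This is only a minimal sketch, written so that it runs under Python 3 (the answer itself uses Python 2 print statements); the helper name extract_names and the inline sample lines are assumptions made for illustration, not part of the original answer.

```python
import re

# Combined pattern from the answer: "Dr." or "As " followed by a captured word.
NAME_PATTERN = re.compile(r'(?:Dr\.|As )(\w+)')


def extract_names(lines):
    """Return all captured names from an iterable of lines (hypothetical helper)."""
    names = []
    for line in lines:
        # findall returns a possibly empty list; extend() adds nothing when it is empty.
        names.extend(NAME_PATTERN.findall(line))
    return names


# A few lines taken from qwert.txt, inlined so the sketch is self-contained.
sample = ["Dr.John and Dr.Keel\n", "Dr.Tensa\n", "As Mary wins all prize\n", "car\n"]
for name in extract_names(sample):
    print(name)
```

Because the function accepts any iterable of strings, an open file object could be passed in place of the sample list.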
2

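You see [] because findall returns a list of strings, and that list is empty when a line contains no match. If you want the strings inside it, loop over the result of findall.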

p=re.findall(r'(?:Dr[.](\w+))',x)
q=re.findall(r'(?:As (\w+))',x)
for name in p + q:
    print name
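
To make the point about return values concrete, here is a small sketch that calls findall on one matching line and one non-matching line from the question's file. It is written to run under Python 3; the variable names matched and unmatched are illustrative assumptions.

```python
import re

pattern = r'(?:Dr[.](\w+))'  # pattern taken from the question

matched = re.findall(pattern, 'Dr.John and Dr.Keel')
unmatched = re.findall(pattern, 'car')

print(matched)    # a list holding the two captured names
print(unmatched)  # an empty list, which is what prints as []

# Looping over an empty list runs the body zero times, so nothing is printed.
for name in unmatched:
    print(name)
```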
2

re.findall() always returns a list of matches, and that list may be empty. To see the results, iterate over the list and print each element separately:

p = re.findall(r'(?:Dr[.](\w+))', x)
for match in p:
    print match
q = re.findall(r'(?:As (\w+))', x)
for match in q:
    print match

If the list is empty, nothing is printed.

You can even do this:

for match in re.findall(r'(?:Dr[.](\w+))', x):
    print match
for match in re.findall(r'(?:As (\w+))', x):
    print match

That way you don't need the variables p and q at all.

Finally, you can combine the regular expressions into one:

for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
    print match

Demo:

>>> import re
>>> lines = '''\
... Dr.John and Dr.Keel
... Dr.Tensa
... Dr.Jees
... As John winning Nobel prize
... As Mary wins all prize
... car
...  tick me 3
...  python.hi=is good
...  dynamic 
...  and precise
... 
... tickme 2 and its in it
...  its rapid  
...  its best
...  well and easy
... '''.splitlines(True)
>>> for x in lines:
...     for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
...         print match
... 
John
Keel
Tensa
Jees
John
Mary
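
One detail these patterns rely on: when a pattern contains exactly one capturing group, re.findall returns the text of that group rather than the whole match. The outer (?:...) in the question's patterns is non-capturing, so it does not change the result. A short sketch for comparison, written to run under Python 3, with variable names chosen only for illustration:

```python
import re

line = 'Dr.John and Dr.Keel'

with_wrapper = re.findall(r'(?:Dr[.](\w+))', line)  # pattern from the question
without_wrapper = re.findall(r'Dr[.](\w+)', line)   # same pattern, outer (?:...) removed
no_group = re.findall(r'Dr[.]\w+', line)            # no capturing group at all

print(with_wrapper)     # only the captured names
print(without_wrapper)  # identical to the line above
print(no_group)         # whole matches, including the "Dr." prefix
```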
2

Simply test the result of findall before printing:

import re

with open('qwert.txt', 'r') as fh:
    for line in fh:
        res = re.findall(r'(?:Dr[.](\w+))', line)
        if res: 
            print '\n'.join(res)
        res = re.findall(r'(?:As (\w+))', line)
        if res:
            print '\n'.join(res)

This approach doesn't scale well beyond a handful of regular expressions. Something like the following may be more useful:

import re 
from functools import partial


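def parseNames(regexs, line):
    """
    Returns a newline separated string of matches given a
    list of regular expressions and a string to search
    """
    # Collect matches in a list so that matches from different
    # regexes on the same line are also separated by newlines
    res = []
    for regex in regexs:
        res.extend(re.findall(regex, line))
    return '\n'.join(res)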


regexs = [r'(?:Dr[.](\w+))', r'(?:As (\w+))'] 
match = partial(parseNames, regexs)

with open('qwert.txt', 'r') as fh:
    names = map(match, fh.readlines())
    print '\n'.join(filter(None, names))

Output:

John
Keel
Tensa
Jees
John
Mary
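
As an alternative to building newline-joined strings, the same idea can be sketched as a generator that yields one name at a time, which leaves the formatting to the caller. This is a hedged sketch for Python 3, not the answer's own method; the names PATTERNS and iter_names and the inline sample lines are assumptions introduced for illustration.

```python
import re

# Hypothetical pattern list; both patterns come from the question,
# precompiled so they are not recompiled for every line.
PATTERNS = [re.compile(r'Dr[.](\w+)'), re.compile(r'As (\w+)')]


def iter_names(lines, patterns=PATTERNS):
    """Yield every captured name, line by line and pattern by pattern."""
    for line in lines:
        for pattern in patterns:
            for name in pattern.findall(line):
                yield name


# A few lines taken from qwert.txt, inlined so the sketch is self-contained.
sample = [
    "Dr.John and Dr.Keel\n",
    "As John winning Nobel prize\n",
    " python.hi=is good\n",
]
print("\n".join(iter_names(sample)))
```

Supporting another prefix would then only require appending one more compiled pattern to the list.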
