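Python prints empty lists when printing regex matches
I am using regular expressions to match names that start with "Dr." However, when I print the matches, they are displayed as lists, and some of those lists are empty. I want only the names.
Code: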
import re
f = open('qwert.txt', 'r')
lines = f.readlines()
for x in lines:
    p=re.findall(r'(?:Dr[.](\w+))',x)
    q=re.findall(r'(?:As (\w+))',x)
    print p
    print q
qwert.txt:
Dr.John and Dr.Keel
Dr.Tensa
Dr.Jees
As John winning Nobel prize
As Mary wins all prize
car
tick me 3
python.hi=is good
dynamic
and precise
tickme 2 and its in it
its rapid
its best
well and easy
Desired output:
John
Keel
Tensa
Jees
John
Mary
Actual output:
['John', 'Keel']
[]
['Tensa']
[]
['Jees']
[]
[]
['John']
[]
['Mary']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
4 Answers
1
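You need to iterate over your results.
Consider calling findall() once per line with a combined pattern, so you don't have to repeat the call for each prefix on every iteration.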
>>> import re
>>> f = open('qwert.txt', 'r')
>>> for line in f:
...     matches = re.findall(r'(?:Dr\.|As )(\w+)', line)
...     for x in matches:
...         print x
...
John
Keel
Tensa
Jees
John
Mary
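The combined pattern works because the prefix group is non-capturing, so findall() returns only the captured name. A minimal sketch of the difference follows; the variable names are illustrative only, and the comparison pattern with a capturing prefix is not from the answer above.

```python
import re

line = 'Dr.John and Dr.Keel'

# Non-capturing prefix (?:...): findall returns just the captured names.
names = re.findall(r'(?:Dr\.|As )(\w+)', line)

# Capturing prefix (...), shown for comparison only: with two groups,
# findall returns one (prefix, name) tuple per match.
pairs = re.findall(r'(Dr\.|As )(\w+)', line)

print(names)
print(pairs)
```

With a single capturing group the result is a flat list of strings, which is why the inner loop can print each element directly.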
2
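You are seeing [] because findall returns a list of strings, and that list is empty when a line has no match. If you want the strings themselves, loop over the result of findall.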
p=re.findall(r'(?:Dr[.](\w+))',x)
q=re.findall(r'(?:As (\w+))',x)
# p+q concatenates both result lists; an empty list contributes nothing
for name in p+q:
    print name
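To make the return value concrete, here is a small sketch of what findall gives back for a matching line and for a non-matching one. The sample strings come from qwert.txt; the variable names are assumptions for illustration. It uses print() calls so that it also runs on Python 3.

```python
import re

pattern = r'(?:Dr[.](\w+))'

# A line with two matches yields a list with both captured names.
matched = re.findall(pattern, 'Dr.John and Dr.Keel')

# A line without a match yields an empty list, not None.
unmatched = re.findall(pattern, 'car')

print(matched)
print(unmatched)

# Looping over an empty list simply runs zero times.
for name in matched + unmatched:
    print(name)
```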
2
re.findall() always returns a list of matches, and that list may be empty. To see the results, loop over the list and print each element individually:
p = re.findall(r'(?:Dr[.](\w+))', x)
for match in p:
    print match
q = re.findall(r'(?:As (\w+))', x)
for match in q:
    print match
If the list is empty, nothing is printed.
You can even do this:
for match in re.findall(r'(?:Dr[.](\w+))', x):
    print match
for match in re.findall(r'(?:As (\w+))', x):
    print match
This way you don't need the variables p and q at all.
Finally, you can combine the regular expressions into one:
for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
    print match
Demo:
>>> import re
>>> lines = '''\
... Dr.John and Dr.Keel
... Dr.Tensa
... Dr.Jees
... As John winning Nobel prize
... As Mary wins all prize
... car
... tick me 3
... python.hi=is good
... dynamic
... and precise
...
... tickme 2 and its in it
... its rapid
... its best
... well and easy
... '''.splitlines(True)
>>> for x in lines:
...     for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
...         print match
...
John
Keel
Tensa
Jees
John
Mary
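If the names are needed for something other than printing, the same combined pattern can be wrapped in a small helper that returns one flat list. This is only a sketch: find_names and NAME_PATTERN are hypothetical names, and the sample lines are a subset of qwert.txt.

```python
import re

# Hypothetical helper names; precompiling avoids re-parsing the pattern per line.
NAME_PATTERN = re.compile(r'(?:Dr\.|As )(\w+)')

def find_names(lines):
    """Return every captured name from an iterable of lines, in order."""
    names = []
    for line in lines:
        # extend() adds nothing when findall returns an empty list
        names.extend(NAME_PATTERN.findall(line))
    return names

sample = [
    'Dr.John and Dr.Keel\n',
    'Dr.Tensa\n',
    'As Mary wins all prize\n',
    'car\n',
]
names = find_names(sample)
print('\n'.join(names))
```

Passing an open file object instead of the sample list should work the same way, since a file iterates line by line.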
2
Simply test the result of findall before printing:
import re
with open('qwert.txt', 'r') as fh:
    for line in fh:
        res = re.findall(r'(?:Dr[.](\w+))', line)
        if res:
            print '\n'.join(res)
        res = re.findall(r'(?:As (\w+))', line)
        if res:
            print '\n'.join(res)
This approach doesn't scale well once you have more than a few regular expressions. Something like the following may be more useful:
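import re
from functools import partial

def parseNames(regexs, line):
    """
    Returns a newline separated string of matches given a
    list of regular expressions and a string to search
    """
    # Collect matches in a list so that results from different
    # regexes are also separated by a newline when joined
    matches = []
    for regex in regexs:
        matches.extend(re.findall(regex, line))
    return '\n'.join(matches)

regexs = [r'(?:Dr[.](\w+))', r'(?:As (\w+))']
# Bind the regex list so that match() only needs a line
match = partial(parseNames, regexs)
with open('qwert.txt', 'r') as fh:
    names = map(match, fh.readlines())
# filter(None, ...) drops the empty strings from lines with no match
print '\n'.join(filter(None, names))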
Output:
John
Keel
Tensa
Jees
John
Mary
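As an alternative sketch of the same idea, a generator can yield names one at a time, which avoids building intermediate strings and filtering out empty ones afterwards. The function name iter_names and the sample lines are assumptions for illustration, not part of the answer above.

```python
import re

def iter_names(patterns, lines):
    """Yield each captured name, line by line, trying every pattern in turn."""
    compiled = [re.compile(p) for p in patterns]
    for line in lines:
        for regex in compiled:
            # findall returns [] for no match, so nothing is yielded
            for name in regex.findall(line):
                yield name

patterns = [r'Dr[.](\w+)', r'As (\w+)']
sample = ['Dr.John and Dr.Keel\n', 'car\n', 'As Mary wins all prize\n']
names = list(iter_names(patterns, sample))
print('\n'.join(names))
```

Adding another prefix then only means appending one more pattern to the list, provided each pattern has exactly one capturing group.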