Parsing a string with Python
I can't find any documentation on a parse() method for strings. Is there a good reference? I want to parse the following:
frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}
into two lists of integers.
5 Answers
1
For data this well structured, pyparsing may be overkill, but it makes a good learning example:
from pyparsing import *
s = "frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}"
# Braces delimit the lists but should not appear in the results
LBRACE, RBRACE = map(Suppress, "{}")
# Convert each run of digits to an int at parse time
integer = Word(nums).setParseAction(lambda t: int(t[0]))
line = ("frame" + integer("frame") +
        "rows" + LBRACE + ZeroOrMore(integer)("rows") + RBRACE +
        "columns" + LBRACE + ZeroOrMore(integer)("columns") + RBRACE)
data = line.parseString(s)
print(data.frame)
print(data.rows[:10])
print(data.columns[:10])
Output:
0
[3, 2, 3, 3, 3, 3, 2, 3, 2, 3]
[2, 3, 2, 3, 3, 3, 4, 3, 3, 2]
1
>>> a="frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}"
>>> import ast
>>> import re
>>> for match in re.finditer(r"\{([\d ]+)\}", a):
...     integers = match.group(1)
...     l = ast.literal_eval(integers.replace(" ", ","))
...     print(l)
...
(3, 2, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2, 3, 4, 3, 3, 4, 3, 2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 4, 3, 2, 3, 3, 3, 3, 2, 2, 2, 4, 4, 3, 3, 3, 3, 3, 4, 4, 4, 3, 2, 4, 3, 4, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 4)
(2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 3, 2, 2, 2, 3, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 4, 3, 2, 3, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 3, 3, 4, 4, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 3, 5, 5, 5, 5, 4, 5, 4, 4, 4)
I've never heard of a parse method that parses strings the way you describe. Still, parsing this string is not hard, as the session above shows.
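The question asks for lists, while ast.literal_eval returns tuples here, so a small variation may be useful. This is only a sketch, not the answerer's code: the helper name brace_groups_to_lists and the shortened sample string are made up for illustration, and the trailing comma is added on the assumption that a group may hold a single number.

```python
import ast
import re


def brace_groups_to_lists(text):
    """Return one list of ints per {...} group in text (hypothetical helper)."""
    result = []
    for match in re.finditer(r"\{([\d ]+)\}", text):
        tokens = match.group(1).split()  # tolerates runs of several spaces
        if not tokens:
            # A group containing only spaces yields an empty list
            result.append([])
            continue
        # Build "3,2,3," so literal_eval always sees a tuple,
        # even when the group holds just one number.
        result.append(list(ast.literal_eval(",".join(tokens) + ",")))
    return result


sample = "frame 0 rows {3 2 3} columns {2}"
print(brace_groups_to_lists(sample))  # [[3, 2, 3], [2]]
```

One caveat: in Python 3, literal_eval rejects numbers with leading zeros such as 03, so calling int() on each token is the safer choice if the data might contain them.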
5
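A parse() function for strings won't be much help here (it is very complicated to use). In this case I would go with the simplest approach: regular expressions! If 's' is the string you gave above: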
import re
# The non-greedy match grabs the contents of each {...} group
lists = [
    [int(i) for i in match.split()]
    for match in re.findall(r'\{(.*?)\}', s)
]
print(lists)
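If it helps to keep track of which list is which, the same regular-expression idea can also capture the word in front of each brace group. A minimal sketch, assuming every group is preceded by a single label such as rows or columns; the function name parse_frame_line and the shortened sample string are hypothetical.

```python
import re


def parse_frame_line(text):
    """Map each label (e.g. 'rows') to the list of ints in its {...} group."""
    # Capture a word, optional whitespace, then everything up to the closing brace
    groups = re.findall(r"(\w+)\s*\{([^}]*)\}", text)
    return {label: [int(tok) for tok in body.split()] for label, body in groups}


sample = "frame 0 rows {3 2 3 3} columns {2 3 2}"
parsed = parse_frame_line(sample)
print(parsed["rows"])     # [3, 2, 3, 3]
print(parsed["columns"])  # [2, 3, 2]
```

Returning a dict rather than a bare list of lists means the caller does not depend on rows appearing before columns in the input.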