Speeding up data retrieval from a very large string in Python

tempString = ','.join(str(n) for n in coords)
tempString = re.sub(',{2,6}', '_', tempString)
tempString = re.sub("[^0-9\-\.\_]", ",", tempString)
tempString = re.sub(',+', ',', tempString)
clean1 = re.findall(('[-+]?[0-9]*\.?[0-9]+,[-+]?[0-9]*\.?[0-9]+,'
                     '[-+]?[0-9]*\.?[0-9]+'), tempString)
tempString = '_'.join(str(n) for n in clean1)
tempString = re.sub(',', ' ', tempString)

2 Answers

User

Answer #1 · edited 2024-04-26 08:04:44

Based on your sample data:

>>> s = "-5.65500020981,6.88999986649,-0.454999923706,1,,,-5.65500020981,6.95499992371,-0.454999923706,1,,,"
>>> def getValues(s):
...     output = []
...     while s:
...         # take the three values you want, discard the next three fields
...         # (the "1" and two empty strings), and keep the rest of the string
...         v1, v2, v3, _, _, _, s = s.split(',', 6)
...         output.append("%s %s %s" % (v1, v2, v3))
...         
...     return output
>>> getValues(s)
['-5.65500020981 6.88999986649 -0.454999923706', '-5.65500020981 6.95499992371 -0.454999923706']

...once you have these parsed values as strings in a list, you can do whatever else you need with them.

Alternatively, if you prefer, you can use a generator, so that the full list of result strings never has to be built at once.
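The generator variant could look roughly like the sketch below. It is a reconstruction under the same assumptions as the list-based version (each record is three values followed by a "1" and two empty fields), not the answer's original code; the name `iter_values` is made up for illustration.

```python
def iter_values(s):
    """Yield one 'x y z' string per record instead of building a list."""
    while s:
        # Fields 4-6 (the "1" and two empty strings) are discarded;
        # the seventh item is the unparsed remainder of the string.
        v1, v2, v3, _, _, _, s = s.split(',', 6)
        yield "%s %s %s" % (v1, v2, v3)


s = ("-5.65500020981,6.88999986649,-0.454999923706,1,,,"
     "-5.65500020981,6.95499992371,-0.454999923706,1,,,")

for line in iter_values(s):
    print(line)
```

Note that each `split` still copies the remainder of the string, so this saves memory on the output side but not time on very long inputs.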

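You might also want to try pre-splitting the long string on the ",,," comma groups, rather than repeatedly building and processing a series of progressively shorter strings, for example: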

>>> def getValues(s):
...     # split the long string into a list of per-record chunks
...     strList = s.split(",,,")
...     for chunk in strList:
...         if chunk:
...             # ...then just parse apart each individual set of data values
...             vals = chunk.split(',')
...             yield "%s %s %s" % (vals[0], vals[1], vals[2])
>>> # s10 is not defined in this excerpt; the output suggests ten copies of one record
>>> for v in getValues(s10):
...     print(v)
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454
-5.1  6.8  -0.454

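At some point, when you are dealing with data sets this large and speed is a problem, it makes sense to push the work down into a module that does the heavy lifting in C, such as NumPy.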

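User

Answer #2 · edited 2024-04-26 08:04:44

One way to reduce memory consumption without changing anything in the regex is to use re.finditer() instead of re.findall(). It iterates over the matches one at a time instead of collecting all of them into a single list object. http://docs.python.org/library/re.html#re.finditer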
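As a minimal sketch of that idea, the triple-matching pattern from the question can be compiled once and fed through `finditer()`. The names `TRIPLE` and `iter_triples` and the short sample string are assumptions made for illustration; the sample imitates the question's intermediate string after its `re.sub` passes.

```python
import re

# Same pattern as in the question: three comma-separated signed decimals.
TRIPLE = re.compile(
    r'[-+]?[0-9]*\.?[0-9]+,'
    r'[-+]?[0-9]*\.?[0-9]+,'
    r'[-+]?[0-9]*\.?[0-9]+'
)


def iter_triples(text):
    """Yield each matched triple as 'x y z', one match at a time."""
    for match in TRIPLE.finditer(text):
        yield match.group(0).replace(',', ' ')


# Hypothetical sample resembling the cleaned-up intermediate string.
sample = "-5.655,6.889,-0.454,1_-5.655,6.954,-0.454,1_"

for triple in iter_triples(sample):
    print(triple)
```

Because the matches are consumed lazily, only one match object is alive at a time; the input string itself still has to fit in memory.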
