How to extract rows of data from specific blocks of a large data file

# Comment
# Comment
# Comment
# Comment
# Comment Comment
# Comment
**#Raw SIFs at Crack Propagation Step: 0**
# Vertex, X, Y, Z, K_I, K_II,
0 , 2.100000e+00 , 2.000000e+00 , -1.000000e-04 , 0.000000e+00 , 0.000000e+00 ,
1 , 2.100000e+00 , 2.000000e+00 , 1.699733e-01 , 8.727065e+00 , -8.696262e-04 ,
2 , 2.100000e+00 , 2.000000e+00 , 3.367067e-01 , 8.907810e+00 , -2.548819e-04 ,
**# MLS SIFs at Crack Propagation Step: 0**
# MLS approximation:
# Sample, t, NA, NA, K_I, K_II,
# Crack front stretch: 0
0 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00 , 8.446880e+00 , -1.360875e-03 ,
1 , 5.670333e-02 , 0.000000e+00 , 0.000000e+00 , 8.554168e+00 , -1.156931e-03 ,
2 , 1.134067e-01 , 0.000000e+00 , 0.000000e+00 , 8.648241e+00 , -9.755573e-04 ,
# more comments more comments
# more comments
**# Raw SIFs at Crack Propagation Step: 1**
# Vertex, X, Y, Z, K_I, K_II,
0 , 2.186139e+00 , 2.000000e+00 , -1.688418e-03 , 0.000000e+00 , 0.000000e+00 ,
1 , 2.192003e+00 , 2.000000e+00 , 1.646902e-01 , 9.571022e+00 , 4.770358e-03 ,
2 , 2.196234e+00 , 2.000000e+00 , 3.319183e-01 , 9.693934e+00 , -9.634989e-03 ,
**# MLS SIFs at Crack Propagation Step: 1**
# MLS approximation:
# Sample, t, NA, NA, K_I, K_II,
# Crack front stretch: 0
0 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00 , 9.402031e+00 , 2.097959e-02 ,
1 , 5.546786e-02 , 0.000000e+00 , 0.000000e+00 , 9.467541e+00 , 1.443546e-02 ,
2 , 1.109357e-01 , 0.000000e+00 , 0.000000e+00 , 9.525021e+00 , 8.554051e-03 ,

2 answers

User

Answer 1 · edited 2024-05-13 20:16:38

Try:

from collections import deque
from pprint import pprint

# helper function to parse one data block: consumes lines until the
# next '**#' block header, which is pushed back for the caller
def parse_SIF(lines):
    SIF = []
    while lines:
        line = lines.popleft().lstrip()
        if line == '' or line.startswith('#'):
            continue
        if line.startswith('**#'):
            lines.appendleft(line)
            break
        data = line.split(',')
        # pick only columns 0, 4, 5 (index, K_I, K_II),
        # convert to the appropriate numeric type
        # and append to the list for the current SIF and step
        SIF.append([int(data[0]), float(data[4]), float(data[5])])
    return SIF

# your global data structure - nested lists
raw = []
mls = []

# read the whole file into a deque - fine as long as it fits in memory;
# deque.popleft() is O(1), whereas list.pop(0) is O(n) per call
with open('data') as fptr:
    lines = deque(fptr)

# global parse routine - call helper function to parse data blocks
while lines:
    line = lines.popleft()
    if line.startswith('**#'):
        if 'Raw SIFs at Crack Propagation Step:' in line:
            raw.append(parse_SIF(lines))
        elif 'MLS SIFs at Crack Propagation Step:' in line:
            mls.append(parse_SIF(lines))

# show results for your example data
for raw_step, mls_step in zip(raw, mls):
    print('raw:')
    pprint(raw_step)
    print('mls:')
    pprint(mls_step)

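For the example data, this prints a raw: list and an mls: list for each crack propagation step, where each entry is [index, K_I, K_II].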
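Because the answer above reads the whole file into memory (its own comment notes this is only OK if the data is not large), a very large file may call for a streaming variant. A minimal sketch, assuming every block header looks like the ones in the sample data; the generator name `iter_sif_blocks` and the regex are illustrative, not from the original post. It walks the file line by line and yields one `(kind, step, rows)` tuple per block, with each row being `[index, K_I, K_II]`:

```python
import io
import re

# Block header, e.g. "**#Raw SIFs at Crack Propagation Step: 0**" or
# "**# MLS SIFs at Crack Propagation Step: 1**" (space after '#' optional)
HEADER_RE = re.compile(r'^\*\*#\s*(Raw|MLS) SIFs at Crack Propagation Step:\s*(\d+)')


def iter_sif_blocks(fileobj):
    """Yield (kind, step, rows) per block without loading the whole file."""
    kind = step = None
    rows = []
    for raw_line in fileobj:
        line = raw_line.strip()
        m = HEADER_RE.match(line)
        if m:
            # a new header closes the previous block, if any
            if kind is not None:
                yield kind, step, rows
            kind, step, rows = m.group(1), int(m.group(2)), []
        elif kind is None or not line or line.startswith('#'):
            continue  # text before the first block, blank lines, comments
        else:
            cols = line.split(',')
            # keep columns 0, 4, 5: index, K_I, K_II
            rows.append([int(cols[0]), float(cols[4]), float(cols[5])])
    if kind is not None:
        yield kind, step, rows


sample = io.StringIO("""# Comment
**#Raw SIFs at Crack Propagation Step: 0**
# Vertex, X, Y, Z, K_I, K_II,
0 , 2.100000e+00 , 2.000000e+00 , -1.000000e-04 , 0.000000e+00 , 0.000000e+00 ,
1 , 2.100000e+00 , 2.000000e+00 , 1.699733e-01 , 8.727065e+00 , -8.696262e-04 ,
**# MLS SIFs at Crack Propagation Step: 0**
# Sample, t, NA, NA, K_I, K_II,
0 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00 , 8.446880e+00 , -1.360875e-03 ,
""")

for kind, step, rows in iter_sif_blocks(sample):
    print(kind, step, rows)
```

With a real file, pass the open file object instead, e.g. `with open('data') as f: for kind, step, rows in iter_sif_blocks(f): ...`; only one block is held in memory at a time.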

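User

Answer 2 · edited 2024-05-13 20:16:38

A more general hint: have you considered using a more suitable file format? For your use case I would suggest HDF5. It has very good Python bindings: http://code.google.com/p/h5py/

HDF5 supports chunked storage, and the Python bindings support slicing and work directly with NumPy arrays. I think this would make things easier for you.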
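To illustrate the HDF5 suggestion, a minimal sketch, assuming `h5py` and NumPy are installed. The file name `sifs.h5` and the `raw/step_N` layout are hypothetical choices, and the rows are the raw `[index, K_I, K_II]` values from the sample data. Each propagation step is stored as one chunked dataset, and slicing reads back only the requested part:

```python
import h5py
import numpy as np

# Hypothetical parsed result: {step: rows}, each row = [index, K_I, K_II]
raw_steps = {
    0: [[0, 0.0, 0.0], [1, 8.727065, -8.696262e-04], [2, 8.907810, -2.548819e-04]],
    1: [[0, 0.0, 0.0], [1, 9.571022, 4.770358e-03], [2, 9.693934, -9.634989e-03]],
}

# Write: one chunked dataset per propagation step under the "raw" group
with h5py.File('sifs.h5', 'w') as f:
    grp = f.create_group('raw')
    for step, rows in raw_steps.items():
        grp.create_dataset(f'step_{step}', data=np.asarray(rows), chunks=True)

# Read: slicing a dataset loads only the selected elements from disk
with h5py.File('sifs.h5', 'r') as f:
    k_i = f['raw/step_1'][:, 1]  # K_I column of step 1 as a NumPy array
    print(k_i)
```

Storing one dataset per step keeps each block addressable by name, so later analysis can open a single step without parsing the text file again.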
