用Python分析文本文件

# function for parsing the data def data_parser(text, dic): for i, j in dic.iteritems(): text = text.replace(i,j) return text # open input/output files inputfile = open('test.dat') outputfile = open('test.csv', 'w') my_text = inputfile.readlines()[4:] #reads to whole text file, skipping first 4 lines # sample text string, just for demonstration to let you know how the data looks like # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636' # dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' } txt = data_parser(my_text, reps) outputfile.writelines(txt) inputfile.close() outputfile.close()

3条回答

网友

1楼 · 编辑于 2024-04-26 01:40:27

有几种方法可以解决这个问题。一种选择是使用inputfile.read()而不是inputfile.readlines()-您需要编写单独的代码来删除前四行，但是如果您仍然希望最终输出为单个字符串，这可能最有意义。

第二个更简单的选项是在用my_text = ''.join(my_text)条带化前四行之后重新连接字符串。这有点低效，但是如果速度不是主要问题，代码将是最简单的。

最后，如果您真的希望输出为字符串列表而不是单个字符串，那么您可以修改数据解析器以在列表上迭代。可能是这样的：

def data_parser(lines, dic):
    for i, j in dic.iteritems():
        for (k, line) in enumerate(lines):
            lines[k] = line.replace(i, j)
    return lines

网友

2楼 · 编辑于 2024-04-26 01:40:27

从公认的答案来看，你想要的行为是

skip 0
skip 1
skip 2
skip 3
"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

进入

2012,06,23,03,09,13.23,4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,NAN,-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

如果这是对的，那么我想

import csv

with open("test.dat", "rb") as infile, open("test.csv", "wb") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, quoting=False)
    for i, line in enumerate(reader):
        if i < 4: continue
        date = line[0].split()
        day = date[0].split('-')
        time = date[1].split(':')
        newline = day + time + line[1:]
        writer.writerow(newline)

会比reps的东西简单一点。

网友

3楼 · 编辑于 2024-04-26 01:40:27

我将使用for循环遍历文本文件中的行：

for line in my_text:
    outputfile.writelines(data_parser(line, reps))

如果希望逐行读取文件，而不是在脚本开始时加载整个文件，可以执行以下操作：

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just for demonstration to let you know how the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for i in range(4): inputfile.next() # skip first four lines
for line in inputfile:
    outputfile.writelines(data_parser(line, reps))

inputfile.close()
outputfile.close()

相关问题更多 >

编程相关推荐

热门问题

热门文章