Simple Python script runs very slowly (CSV file)

hapinfile = open('file_with_header_columns', 'r')
hapoutfile = open('file_missing_header_columns.csv', 'r')
o = open('filescombined.txt', 'w')

dictoutfile = {}

# Index the file without header columns by its first field.
for line in hapoutfile:
    a = line.rstrip('\n').rstrip('\r').split('\t')
    dictoutfile[a[0]] = a[1:]

hapoutfile.close()

for line in hapinfile:
    q = line.rstrip('\n').rstrip('\r').split('\t')
    g = q[0:11]
    # This scans the whole dict for every input line (the slow part).
    for key, value in dictoutfile.items():
        if g[0] == key:
            g.extend(value)
            o.write(str('\t'.join(g)+'\n'))

hapinfile.close()
o.close()

3 Answers

User

#1 · Edited 2024-04-19 19:23:20

from __future__ import with_statement   # if you need it

import csv

with open('file_with_header_columns', 'r') as hapinfile, \
     open('file_missing_header_columns', 'r') as hapoutfile, \
     open('filescombined.txt', 'w') as outfile:
    good_data = csv.reader(hapoutfile, delimiter='\t')
    # Read the header rows into a list: a csv.reader can only be
    # iterated once, and the inner loop below scans it repeatedly.
    bad_data = list(csv.reader(hapinfile, delimiter='\t'))
    out_data = csv.writer(outfile, delimiter='\t')
    for data_row in good_data:
        for header_row in bad_data:
            if header_row[0] == data_row[0]:
                out_data.writerow(data_row)
                break   # stop looking through headers

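You seem to have an unfortunate problem: you have to run a nested loop to find your data. If you can sort both files by the header field first, you can do this much more efficiently. As it stands, use the csv module to condense the code. The break looks a little odd in a for loop, but it at least short-circuits the search through the second file once the matching header is found.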
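The sorting idea can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `merge_sorted` and the sample rows are hypothetical, and it assumes both inputs fit in memory, are sorted by their first column, and that keys are unique in the data rows.

```python
def merge_sorted(header_rows, data_rows):
    """Join two row lists, both sorted by first column, in a single pass."""
    merged = []
    data_iter = iter(data_rows)
    current = next(data_iter, None)
    for header_row in header_rows:
        # Skip data rows whose key sorts before the current header key.
        while current is not None and current[0] < header_row[0]:
            current = next(data_iter, None)
        if current is not None and current[0] == header_row[0]:
            merged.append(header_row + current[1:])
    return merged


def first_field(row):
    return row[0]


# Hypothetical sample rows standing in for the two tab-separated files.
headers = sorted([["b", "2"], ["a", "1"], ["c", "3"]], key=first_field)
data = sorted([["c", "z"], ["a", "x"]], key=first_field)
merged = merge_sorted(headers, data)
for row in merged:
    print("\t".join(row))
```

After sorting, each file is walked only once, so the nested search disappears.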

User

#2 · Edited 2024-04-19 19:23:20

It takes so long because the nested for loop needlessly trudges through the dict over and over again. Try this:

for line in hapinfile:
    q=line.rstrip('\n').rstrip('\r').split('\t')
    g=q[0:11]
    if g[0] in dictoutfile:
        g.extend( dictoutfile[g[0]] )
        o.write(str('\t'.join(g)+'\n'))

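User

#3 · Edited 2024-04-19 19:23:20

First, you don't need the inner loop in the second part. You are looping over a dictionary, when you should simply use g[0] as the key to look up the value. That saves a huge loop over the dictionary for every line you process. If needed, you can check whether g[0] is in the dictionary to avoid a KeyError.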
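A minimal self-contained sketch of that lookup, under some assumptions: the function name `combine_rows` and the in-memory sample rows are hypothetical and stand in for the two tab-separated files, while the 11-column slice follows the question's code.

```python
def combine_rows(header_rows, data_rows, keep=11):
    """Join rows on their first column with one dict lookup per row."""
    # Build the lookup table once: key -> remaining columns.
    lookup = {row[0]: row[1:] for row in data_rows}
    combined = []
    for row in header_rows:
        g = row[:keep]
        extra = lookup.get(g[0])  # None when the key is absent, so no KeyError
        if extra is not None:
            combined.append(g + extra)
    return combined


# Hypothetical sample rows, already split on tabs.
header_rows = [["rs1", "A", "C"], ["rs2", "G", "T"], ["rs3", "A", "G"]]
data_rows = [["rs3", "0", "1"], ["rs1", "1", "1"]]
result = combine_rows(header_rows, data_rows)
for row in result:
    print("\t".join(row))
```

Using `dict.get` folds the membership check and the lookup into a single step. As in the question's code, rows with no matching key are skipped.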
