Python: How to read a large text file into memory


I'm using Python 2.6 on a Mac Mini with 1GB of RAM. I want to read in a huge text file:

<pre><code>$ ls -l links.csv; file links.csv; tail links.csv
-rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv
links.csv: ASCII text, with CRLF line terminators
4757187,59883
4757187,99822
4757187,66546
4757187,638452
4757187,4627959
4757187,312826
4757187,6143
4757187,6141
4757187,3081726
4757187,58197
</code></pre>

So each line in the file is a tuple of two comma-separated integer values. I want to read in the whole file and sort it by the second column. I know I could do the sorting without reading the whole file into memory. But I thought that for a 500MB file I should still be able to do it in memory, since I have 1GB available.

However, when I try to read in the file, Python seems to allocate a lot more memory than the file needs on disk. So even with 1GB of RAM I cannot read the 500MB file into memory. My Python code for reading the file and printing some information about memory consumption is:

<pre><code>#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

infile=open("links.csv", "r")

edges=[]
count=0
#count the total number of lines in the file
for line in infile:
    count=count+1

total=count
print "Total number of lines: ",total

infile.seek(0)
count=0
for line in infile:
    edge=tuple(map(int,line.strip().split(",")))
    edges.append(edge)
    count=count+1
    # for every million lines print memory consumption
    if count%1000000==0:
        print "Position: ", edge
        print "Read ",float(count)/float(total)*100,"%."
        mem=sys.getsizeof(edges)
        for edge in edges:
            mem=mem+sys.getsizeof(edge)
            for node in edge:
                mem=mem+sys.getsizeof(node)

        print "Memory (Bytes): ", mem
</code></pre>

The output I got was:

<pre><code>Total number of lines: 30609720
Position: (9745, 2994)
Read 3.26693612356 %.
Memory (Bytes): 64348736
Position: (38857, 103574)
Read 6.53387224712 %.
Memory (Bytes): 128816320
Position: (83609, 63498)
Read 9.80080837067 %.
Memory (Bytes): 192553000
Position: (139692, 1078610)
Read 13.0677444942 %.
Memory (Bytes): 257873392
Position: (205067, 153705)
Read 16.3346806178 %.
Memory (Bytes): 320107588
Position: (283371, 253064)
Read 19.6016167413 %.
Memory (Bytes): 385448716
Position: (354601, 377328)
Read 22.8685528649 %.
Memory (Bytes): 448629828
Position: (441109, 3024112)
Read 26.1354889885 %.
Memory (Bytes): 512208580
</code></pre>

After reading only 25% of the 500MB file, Python already consumes 500MB. So storing the file's contents as a list of tuples of ints does not seem to be very memory efficient. Is there a better way to do this, so that I can read my 500MB file into my 1GB of memory?
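The measurements above work out to roughly 64 bytes per row (64348736 bytes after one million rows), because every row is a tuple object holding two separate int objects. As a rough illustration only, and not an answer taken from this thread, the sketch below stores each column in a standard-library `array.array` instead, which keeps raw C integers rather than Python objects. The helper names `tuple_list_bytes` and `read_columns` and the generated sample data are assumptions made for the example. It is written to run on Python 3 as well, and it only covers storage: sorting the two columns by the second one is a separate step that it does not attempt.

```python
import sys
from array import array


def tuple_list_bytes(rows):
    """Estimate bytes used by a list of (int, int) tuples, as in the question."""
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        for value in row:
            total += sys.getsizeof(value)
    return total


def read_columns(lines):
    """Parse 'a,b' lines into two compact integer arrays, one per column."""
    # Typecode 'l' is a C long (at least 4 bytes), enough for the values shown.
    sources = array("l")
    targets = array("l")
    for line in lines:
        left, right = line.strip().split(",")
        sources.append(int(left))
        targets.append(int(right))
    return sources, targets


# Hypothetical sample standing in for links.csv (CRLF endings as in the question).
sample = ["%d,%d\r\n" % (i, (i * 7919) % 100003) for i in range(100000)]

as_tuples = [tuple(map(int, line.strip().split(","))) for line in sample]
sources, targets = read_columns(sample)

tuple_bytes = tuple_list_bytes(as_tuples)
# Payload size of the two arrays: element count times bytes per element.
array_bytes = len(sources) * sources.itemsize + len(targets) * targets.itemsize
```

With real data, `read_columns(open("links.csv"))` would take the place of the generated sample. The exact byte counts depend on the platform and Python version, so the comparison between `tuple_bytes` and `array_bytes` is best read as an order-of-magnitude indication.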
