Optimizing the handling of large files in Pandas
Is there a way to make pandas more efficient when working with large CSV or Excel files, so that it doesn't use too much memory?
At the moment I load the file like this:
import pandas as pd

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False)
# perform some task on `data`
data.to_csv('Results.csv', sep=',')
If I am working on a machine with limited memory, is there a way to read and process a large data file iteratively, using a loop that does something like this (a rough sketch follows the list):
Load the first 1000 rows and store them in memory
Perform some task
Save the results
Load the next 1000 rows, overwriting the previous chunk in memory
Perform the task
Append to the output file
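In rough Python, the manual version I can picture pages through the file with skiprows/nrows (here process() is just a stand-in for whatever task I actually run). I imagine something like this could work, but each pass re-reads the file from the start, so I suspect there is a better way:

import pandas as pd

CHUNK_ROWS = 1000
start = 0
first_chunk = True

while True:
    # keep the header row (row 0), skip the data rows already processed
    data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1",
                       skiprows=range(1, 1 + start), nrows=CHUNK_ROWS)
    if data.empty:              # nothing left to read
        break
    result = process(data)      # placeholder for the actual task
    # append to the output file; write the header only once
    result.to_csv('Results.csv', sep=',', mode='a', header=first_chunk)
    first_chunk = False
    start += CHUNK_ROWS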
1 Answer
Just add the chunksize argument to your existing call:
import pandas as pd

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1",
                   low_memory=False, chunksize=10)

result = []
for chunk in data:              # iterate over chunks of 10 rows each
    result.append(chunk.mean())

# do something with result, e.g. pd.DataFrame(result).to_csv("result.csv")
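If you also want to write results out incrementally instead of collecting them all in memory (the "append to the output file" step from the question), one way, sketched here with a placeholder process() function, is to append each processed chunk to the output CSV and write the header only for the first chunk:

import pandas as pd

chunks = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1",
                     low_memory=False, chunksize=1000)

first_chunk = True
for chunk in chunks:
    processed = process(chunk)  # placeholder for your actual task
    # append to Results.csv; write the header only the first time
    processed.to_csv('Results.csv', sep=',', mode='a', header=first_chunk)
    first_chunk = False

This way only one chunk of 1000 rows is ever held in memory at a time, which matches the loop described in the question.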