MemoryError when merging large Pandas DataFrames in Python

User

Reply #1 · edited 2024-04-18 17:46:36

I think you will get better performance with a concat, which behaves like an outer join:

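import pandas as pd
# filenames is assumed to be an iterable of CSV paths, each file having an 'id' column
dfs = (pd.read_csv(filename).set_index('id') for filename in filenames)
merged_df = pd.concat(dfs, axis=1)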

This means you perform a single merge operation overall, rather than one merge per file.
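
To make the outer-join behaviour concrete, here is a minimal sketch. The two in-memory CSV strings (csv_a, csv_b) are made-up stand-ins for real files, so treat the names and values as assumptions rather than anything from the original question. Rows are aligned on 'id', and ids missing from a file come out as NaN.

```python
import io
import pandas as pd

# Hypothetical in-memory CSVs standing in for the files on disk.
csv_a = "id,a\n1,10\n2,20\n"
csv_b = "id,b\n2,200\n3,300\n"

# Lazily read each "file" and index it by 'id', as in the reply above.
dfs = (pd.read_csv(io.StringIO(text)).set_index('id') for text in (csv_a, csv_b))

# One concat call; the default join='outer' keeps every id seen in any file.
merged_df = pd.concat(dfs, axis=1)

print(merged_df)
```

The peak memory still has to hold all the frames plus the result, so this reduces the number of intermediate copies rather than the size of the final table.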

User

Reply #2 · edited 2024-04-18 17:46:36

I hit the same error in 32-bit Python when using read_csv on a 1 GB file. Try the 64-bit version; hopefully that resolves the MemoryError.
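
If you are not sure which build you are running, a quick check along these lines should tell you. This is only a sketch using the standard library; the variable names are mine.

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"Running {pointer_bits}-bit Python")
```

A 32-bit process can only address a few GB at most, so a large read_csv can fail there even when the machine has plenty of RAM.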

User

Reply #3 · edited 2024-04-18 17:46:36

pd.concat also seems to run out of memory on large DataFrames. One option is to convert the dfs to matrices and concatenate those:

import numpy as np
import pandas as pd

def concat_df_by_np(df1, df2):
    """
    Accepts two dataframes, converts each to a NumPy array, concatenates them
    horizontally and uses the index of the first dataframe. This is not a concat
    by index but simply by position, so both dataframes should share the same index.
    """
    # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement.
    # np.concatenate already returns a fresh array, so no deepcopy is needed.
    dfout = pd.DataFrame(np.concatenate((df1.to_numpy(), df2.to_numpy()), axis=1),
                         index=df1.index,
                         columns=np.concatenate([df1.columns, df2.columns]))
    if not df1.index.equals(df2.index):
        # logging.warning('Indices in concat_df_by_np are not the same')
        print('Indices in concat_df_by_np are not the same')
    return dfout

Be careful, though: this function is not a join but a horizontal append, and the indices are ignored.
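
To show what "the indices are ignored" means in practice, here is a small sketch with two made-up frames (df1, df2 and their values are illustrative assumptions) that carry the same labels in a different order. The positional append pairs rows by storage order, while pd.concat pairs them by label.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'b': [10, 20]}, index=['y', 'x'])  # same labels, different order

# Positional append: rows are glued together in storage order, labels ignored.
by_position = pd.DataFrame(
    np.concatenate((df1.to_numpy(), df2.to_numpy()), axis=1),
    index=df1.index,
    columns=list(df1.columns) + list(df2.columns),
)

# Index-aligned concat: rows are matched by label.
by_label = pd.concat([df1, df2], axis=1)

print(by_position.loc['x', 'b'])  # taken from df2's first row, which is labelled 'y'
print(by_label.loc['x', 'b'])     # taken from df2's row labelled 'x'
```

So the NumPy route is only safe when every frame is already sorted into the same row order.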
