Memory-efficient Python (pandas): aggregating categories from one CSV file per time period

1 answer

User

Reply #1 · Posted 2024-06-09 21:56:14

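Your code is quite repetitive and can be simplified with dictionary and list comprehensions. This solution should eliminate your memory problems, because you only process one month of data at a time. Your lists of monthly summaries do keep growing, but I don't believe they take up much memory, since each summary holds only one aggregated row per patient (LopNr) per month.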

I can't test this, though, because I only have the code from the question to go on.
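As a quick sanity check of the core pattern (one dictionary of ATC prefixes driving every category, instead of one hand-written block per category), here is a sketch on a made-up one-month DataFrame. The rows are invented for illustration, the file loading is left out, and only three of the categories are kept.

```python
import pandas as pd

# Category name -> ATC-code prefix, as in the answer ('' matches every drug).
items = {'neuro': 'N', 'cardio': 'C', 'all_drugs': ''}

# Hypothetical one-month extract with the three columns the answer reads.
monthly = pd.DataFrame({
    'LopNr': [1, 1, 2, 2, 3],
    'ATC':   ['N02BE01', 'C07AB02', 'N07BA01', None, 'A10BA02'],
    'TKOST': [100.0, 50.0, 30.0, 20.0, None],
})
monthly['year'] = 2005
monthly['month'] = 7

# Drop rows without a cost, then build one per-patient summary per category.
# na=False makes rows with a missing ATC code simply fail the prefix test.
monthly = monthly[monthly['TKOST'].notnull()]
summaries = {
    name: (monthly[monthly['ATC'].str.startswith(code, na=False)]
           .groupby(['LopNr', 'year', 'month'])[['TKOST']]
           .sum()
           .astype(int))
    for name, code in items.items()
}

for name, df in summaries.items():
    print(name)
    print(df)
```

Adding a category is then a one-line change to `items` rather than another copy of the filter-and-groupby code.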

import pandas as pd
import iopro

# Category name -> ATC-code prefix ('' matches every drug).
items = {'neuro': 'N',
         'cardio': 'C',
         'cancer': 'L',
         'addiction': 'N07',
         'Adrugs': 'A',
         'Mdrugs': 'M',
         'Vdrugs': 'V',
         'all_drugs': ''}

# 1. Create data container using dictionary comprehension.
monthly_summaries = {name: [] for name in items}

# 2. Perform monthly groupby operations, loading one file at a time.
for year in range(2005, 2013):
    for month in range(1, 13):
        if year == 2005 and month < 7:
            continue  # skip January-June 2005
        filename = 'PATH/lmed_{0}_mon{1}.txt'.format(year, month)
        adapter = iopro.text_adapter(filename,
                                     parser='csv',
                                     field_names=True,
                                     output='data frame',
                                     delimiter='\t')
        monthly = adapter[['LopNr', 'ATC', 'TKOST']][:]
        monthly['year'] = year
        monthly['month'] = month
        # Drop rows without a cost once, instead of once per category.
        monthly = monthly[monthly.TKOST.notnull()]
        # Filter and aggregate one category at a time, instead of holding
        # all eight filtered copies of the month in memory at once.
        for name, code in items.items():
            subset = monthly[monthly.ATC.str.startswith(code, na=False)]
            # Sum only the cost column; ATC strings must not be aggregated.
            summary = (subset.groupby(['LopNr', 'year', 'month'])[['TKOST']]
                       .sum()
                       .astype(int))
            monthly_summaries[name].append(summary)

# 3. Now concatenate all of the monthly summaries into separate DataFrames.
#    Keep the (LopNr, year, month) index: ignore_index=True would discard it.
dfs = {name: pd.concat(monthly_summaries[name]) for name in items}

# 4. Now regroup the aggregate monthly summaries.
monthly_summaries = {name: dfs[name].reset_index().groupby(['LopNr', 'year', 'month']).sum()
                     for name in items}

# 5. Finally, save the aggregated results to files.
for name in items:
    monthly_summaries[name].to_csv('PATH/monthly_{0}_costs.csv'.format(name))
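In step 3 the monthly pieces should be concatenated without `ignore_index=True`: that option replaces the `(LopNr, year, month)` index with 0..n-1, and steps 4 and 5 rely on those keys. The sketch below shows the difference on two made-up monthly summaries; `month_summary` is a hypothetical helper standing in for one pass of the monthly loop.

```python
import pandas as pd


def month_summary(rows, year, month):
    """Hypothetical helper: per-patient TKOST total for one month,
    indexed by (LopNr, year, month) like the summaries in step 2."""
    df = pd.DataFrame(rows, columns=['LopNr', 'TKOST'])
    df['year'] = year
    df['month'] = month
    return df.groupby(['LopNr', 'year', 'month'])[['TKOST']].sum()


# Made-up summaries for two months, as appended inside the loop.
pieces = [month_summary([(1, 10), (1, 5), (2, 7)], 2005, 7),
          month_summary([(1, 3), (3, 8)], 2005, 8)]

# Without ignore_index, the (LopNr, year, month) MultiIndex survives concat,
# so reset_index() turns it back into columns that can be regrouped.
combined = pd.concat(pieces)
regrouped = combined.reset_index().groupby(['LopNr', 'year', 'month']).sum()

# With ignore_index=True the keys are replaced by a plain 0..n-1 index.
lost = pd.concat(pieces, ignore_index=True)

print(regrouped)
print(list(lost.columns), list(lost.index))
```

Only the `TKOST` column is left in `lost`, so there would be nothing to group on in step 4.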
