For each day, get the sum of all rows that match on two specific columns in a very large dataframe

def analysis(d, t):
    combinations_df = d.loc[d['day'] == t]
    index = []
    for idx, row in combinations_df.iterrows():
        idd = combinations_df[combinations_df['reversed'] == row['pair']].index
        if len(idd) != 0:
            index.append(idd[0])
        else:
            index.append(-1)
    combinations_df['reversed_idx'] = index
    skippy = []
    to_drop = []

    def add_occurences(row):
        if row['reversed_idx'] == -1 or row['reversed_idx'] in skippy:
            return row
        else:
            row['amount'] += combinations_df.loc[row['reversed_idx']]['amount']
            skippy.append(row.name)
            to_drop.append(row['reversed_idx'])
            return row

    res = combinations_df.apply(lambda x: add_occurences(x), axis=1)
    skippy = set(skippy)
    to_drop = list(set(to_drop))
    return res.drop(to_drop)[['day', 'amount', 'pair']]

2 Answers

User

#1 · Edited 2024-06-16 08:58:18

Please consider adding some sample data as code, since that would make your code easier to work with.

What you can do is group by pair and then aggregate the sum of amount.

If the table above is df, you can do the following:

>>> import pandas as pd
>>> df = pd.DataFrame({'day': [226, 226, 226, 226, 226],
 'amount': [5, 17, 1604, 127, 1558],
 'pair': ['(B2141043,B2161043)',
  '(B2141043,B2161043)',
  '(B2141043,B2161043)',
  '(B2141043,C22D1043)',
  '(B2141043,B2161043)'],
 'reversed': ['(B2161043,B2141043)',
  '(B2161043,B2141043)',
  '(B2161043,B2141043)',
  '(C22D1043,B2141043)',
  '(B2161043,B2141043)']})

>>> df.groupby('pair').agg({'day' : 'first','amount': 'sum'})

                     day  amount
pair
(B2141043,B2161043)  226    3184
(B2141043,C22D1043)  226     127
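
Grouping on pair alone keeps a pair and its reverse in separate groups. A minimal sketch of one way to merge them, assuming both pair and reversed are stored as strings as in the table above: take whichever of the two strings sorts first as the grouping key, and group by day as well. The sample rows below are hypothetical and chosen only to include a reversed pair.

```python
import pandas as pd

# Hypothetical sample data: row 1 is the reverse of row 0.
df = pd.DataFrame({
    'day': [226, 226, 226, 227],
    'amount': [5, 17, 127, 3],
    'pair': ['(B2141043,B2161043)', '(B2161043,B2141043)',
             '(B2141043,C22D1043)', '(B2141043,B2161043)'],
    'reversed': ['(B2161043,B2141043)', '(B2141043,B2161043)',
                 '(C22D1043,B2141043)', '(B2161043,B2141043)'],
})

# Canonical key: keep pair where it sorts before (or equals) reversed,
# otherwise use reversed. A pair and its reverse then share one key.
key = df['pair'].where(df['pair'] <= df['reversed'], df['reversed'])

# Sum amount per day and canonical pair.
result = df.groupby(['day', key])['amount'].sum().reset_index()
print(result)
```

This stays vectorised, so it avoids the row-by-row lookup in the original function; whether it matches the intended output depends on how pair and reversed are actually stored.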

User

#2 · Edited 2024-06-16 08:58:18

Similar to the previous answer using groupby and agg, but summing over the unique key combination:

result = my_df.groupby(['day', my_df.pair.apply(set).apply(tuple)])[['amount']].agg('sum').reset_index()

For a random dataframe of length 5000, looping over the days with your function takes 4.38 s ± 204 ms for me, whereas this version takes 9.86 ms ± 185 µs.
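
Note that the one-liner above only works as intended if each pair value is a tuple-like object: applying set to a string such as '(B2141043,B2161043)' would produce a set of characters, and the order of tuple(set(...)) is not guaranteed. A hedged sketch of a more defensive variant, using a hypothetical helper canonical_pair that accepts either form and sorts the two members explicitly (the sample data is made up for illustration):

```python
import pandas as pd

def canonical_pair(p):
    """Hypothetical helper: return an order-independent key for a pair.

    Accepts a 2-tuple such as ('A', 'B') or a string such as '(A,B)'.
    """
    if isinstance(p, str):
        p = p.strip('()').split(',')
    members = sorted(s.strip() for s in p)
    return '(' + ','.join(members) + ')'

# Hypothetical sample data with pairs stored as strings.
my_df = pd.DataFrame({
    'day': [1, 1, 1, 2],
    'amount': [10, 5, 7, 1],
    'pair': ['(a,b)', '(b,a)', '(a,c)', '(a,b)'],
})

# Build the key once, then sum amount per day and canonical pair.
key = my_df['pair'].apply(canonical_pair)
result = my_df.groupby(['day', key])[['amount']].agg('sum').reset_index()
print(result)
```

Sorting the members makes the key deterministic, at the cost of a Python-level apply over the pair column.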
