Iterating over my DataFrame to run a groupby is very slow. Is there an alternative?

Posted 2024-06-01 00:40:31

Male | A programmer who enjoys writing Python code.

I want to perform a custom groupby on my DataFrame. I have the following DataFrame:

OwnerUserId  AnswerCount  CommentCount    Id             CreationDate
   3834.0          7.0               4    85  2009-06-28 11:31:29.417
      0.0          2.0               0   469  2009-06-29 07:46:13.990
  83871.0          3.0               2   918  2009-06-30 01:04:50.903
  77090.0          1.0               1  1094  2009-06-30 13:11:48.333
 130090.0          1.0               2  1208  2009-06-30 16:15:07.673
       ..          ..                ..    ..                      ..

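For each 'Id_q' (question), I want to group all entries made by the same 'OwnerUserId' before the question's 'CreationDate'. To do this, I sort by 'CreationDate' and run a groupby for every entry inside a for loop. The code is shown below. However, I have 40,000 rows, which makes this operation very slow. Is there a faster way?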

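import pandas as pd

df = df.sort_values("CreationDate")  # sort chronologically
pieces = []
# enumerate() gives the row position; after sorting, the index label
# returned by iterrows() is no longer the position that head() expects
for pos, (_, row) in enumerate(df.iterrows()):
    head_df = df.head(pos)  # all rows created before the current one
    head_df = head_df[head_df.OwnerUserId == row.OwnerUserId]
    grouped_df = head_df.groupby('OwnerUserId', as_index=False).agg(
        {'Id': 'count', 'CommentCount': 'sum', 'AnswerCount': 'sum'})
    pieces.append(grouped_df)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end
result_df = pd.concat(pieces, ignore_index=True)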

I need result_df as my output.
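One possible direction, offered as a sketch rather than a verified answer: the loop recomputes, for every row, the count and sums over that owner's earlier rows, which is what a per-group cumulative count and cumulative sum give in a single pass. The function name prior_totals is hypothetical, and the sketch assumes 'OwnerUserId' has no missing values and that rows with no earlier entry by the same owner should be dropped, as the loop's empty groupby results imply.

```python
import pandas as pd

def prior_totals(df):
    """For each row, aggregate the same owner's earlier rows (hypothetical helper)."""
    # A stable sort keeps the original order for rows sharing a CreationDate.
    df = df.sort_values("CreationDate", kind="stable")
    g = df.groupby("OwnerUserId")
    out = pd.DataFrame({
        "OwnerUserId": df["OwnerUserId"],
        # cumcount() numbers rows 0, 1, 2, ... within each owner,
        # which equals the number of earlier rows by that owner.
        "Id": g.cumcount(),
        # The running total minus the current row is the sum over earlier rows.
        "CommentCount": g["CommentCount"].cumsum() - df["CommentCount"],
        "AnswerCount": g["AnswerCount"].cumsum() - df["AnswerCount"],
    })
    # Drop rows with no earlier entry by the same owner, as the loop does.
    return out[out["Id"] > 0].reset_index(drop=True)
```

Calling result_df = prior_totals(df) would replace the loop. It avoids building one intermediate DataFrame per row, so it should scale far better at 40,000 rows, though that has not been measured here.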


0 answers

No answers yet
