Pivoting time series data with multiple value columns

Posted 2024-10-14 12:15:44


I have several SNS post URLs, and every day I track each post's readCount, likeCount, commentCount, and so on.

This is the DataFrame I get with pandas:

     post_url       nickname    date_key  readCount  likeCount  commentCount
5       a_url          user1  2020-06-12     2874.0        194           NaN
4       a_url          user1  2020-06-13     4030.0        208          48.0
6       a_url          user1  2020-06-14        NaN        220          48.0
7       a_url          user1  2020-06-15        NaN        223          48.0
0       b_url          user2  2020-06-13    16882.0        295          88.0
2       b_url          user2  2020-06-14        NaN        296          88.0
3       b_url          user2  2020-06-15        NaN        299          88.0
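
To reproduce this, the frame above can be rebuilt with something like the sketch below. Only the values come from the table; the constructor call and the explicit index are assumptions about how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the table above (values copied from the post).
df = pd.DataFrame(
    {
        "post_url": ["a_url"] * 4 + ["b_url"] * 3,
        "nickname": ["user1"] * 4 + ["user2"] * 3,
        "date_key": ["2020-06-12", "2020-06-13", "2020-06-14", "2020-06-15",
                     "2020-06-13", "2020-06-14", "2020-06-15"],
        "readCount": [2874.0, 4030.0, np.nan, np.nan, 16882.0, np.nan, np.nan],
        "likeCount": [194, 208, 220, 223, 295, 296, 299],
        "commentCount": [np.nan, 48.0, 48.0, 48.0, 88.0, 88.0, 88.0],
    },
    # Original row labels, kept only so the printout matches the table.
    index=[5, 4, 6, 7, 0, 2, 3],
)
print(df)
```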

The result I want is the following (please ignore the actual values, they are made up):

      post_url      nickname          type  2020-06-12  2020-06-13    2020-06-14  2020-06-15
5        a_url         user1     readCount      2874.0      3074.0           NaN         NaN
4        a_url         user1     likeCount      4030.0      4334.0        4888.0      7463.0
6        a_url         user1  commentCount         NaN         220          48.0        59.0
1        b_url         user2     readCount         NaN         194           NaN         NaN 
3        b_url         user2     likeCount         NaN         208          88.0       493.0
2        b_url         user2  commentCount         NaN         220          53.0       292.0

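Note that each post has a different subset of date_key values; my goal is to turn the union of all existing date_key values into columns.

I have searched on this topic but could not find exactly the same use case.

Could you suggest a way to achieve this? Thanks.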


1 answer
Answer 1 · posted 2024-10-14 12:15:44

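Use DataFrame.melt to unpivot, then DataFrame.pivot_table for a general solution that aggregates by mean in case there are duplicate rows for the same combination of post_url, nickname, type, date_key: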

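# df is the DataFrame shown in the question
df = (df.melt(['post_url','nickname','date_key'], var_name='type')
        # long format: one row per (post, date, metric); drop missing readings
        .dropna(subset=['value'])
        # wide format: one column per date_key, duplicates averaged
        .pivot_table(index=['post_url','nickname','type'],
                     columns='date_key',
                     values='value',
                     aggfunc='mean')
        # remove the 'date_key' label from the columns axis
        .rename_axis(None, axis=1)
        .reset_index())
print(df)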
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1  commentCount         NaN        48.0        48.0   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1     readCount      2874.0      4030.0         NaN   
3    b_url    user2  commentCount         NaN        88.0        88.0   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2     readCount         NaN     16882.0         NaN   

   2020-06-15  
0        48.0  
1       223.0  
2         NaN  
3        88.0  
4       299.0  
5         NaN  
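
To see why the aggregation step matters, here is a small sketch with a made-up duplicate: the second a_url reading for 2020-06-12 is hypothetical and not part of the question's data. With aggfunc='mean', pivot_table collapses the two likeCount readings for the same key into their average instead of failing.

```python
import pandas as pd

# Hypothetical data: the same post was recorded twice on 2020-06-12.
dup = pd.DataFrame({
    "post_url": ["a_url", "a_url", "a_url"],
    "nickname": ["user1", "user1", "user1"],
    "date_key": ["2020-06-12", "2020-06-12", "2020-06-13"],
    "likeCount": [194, 196, 208],
})

wide = (dup.melt(["post_url", "nickname", "date_key"], var_name="type")
           .pivot_table(index=["post_url", "nickname", "type"],
                        columns="date_key",
                        values="value",
                        aggfunc="mean")   # 194 and 196 are averaged
           .rename_axis(None, axis=1)
           .reset_index())
print(wide)
```

If averaging is not the right way to combine repeated readings, another aggfunc such as 'max' or 'last' can be passed instead.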

Another idea, if aggregation is not needed, is to use DataFrame.set_index with DataFrame.stack and Series.unstack:

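# df is the DataFrame shown in the question
df = (df.set_index(['post_url','nickname','date_key'])
        # move the metric columns into a new innermost index level
        .stack()
        # move level 2 (date_key) out into the columns
        .unstack(2)
        .rename_axis(index=['post_url','nickname','type'], columns=None)
        .reset_index()
        )
print(df)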
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1     readCount      2874.0      4030.0         NaN   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1  commentCount         NaN        48.0        48.0   
3    b_url    user2     readCount         NaN     16882.0         NaN   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2  commentCount         NaN        88.0        88.0   

   2020-06-15  
0         NaN  
1       223.0  
2        48.0  
3         NaN  
4       299.0  
5        88.0  
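
One caveat with this second approach: it assumes every (post_url, nickname, date_key, type) combination occurs at most once. The sketch below uses a made-up duplicate row (hypothetical, not from the question) to show that unstack then refuses to reshape, which is the case where the pivot_table version is the safer choice. Recent pandas versions may also emit a FutureWarning for the default stack() behaviour.

```python
import pandas as pd

# Hypothetical data: the same post was recorded twice on 2020-06-12.
dup = pd.DataFrame({
    "post_url": ["a_url", "a_url", "a_url"],
    "nickname": ["user1", "user1", "user1"],
    "date_key": ["2020-06-12", "2020-06-12", "2020-06-13"],
    "likeCount": [194, 196, 208],
})

try:
    (dup.set_index(["post_url", "nickname", "date_key"])
        .stack()
        .unstack(2))
    reshaped = True
except ValueError as exc:
    # Duplicate index entries cannot be spread into unique cells.
    reshaped = False
    print("unstack failed:", exc)
```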
