Creating several new DataFrames or a dictionary from one DataFrame

Posted 2024-03-29 11:14:38


I have a DataFrame like this:

evt    pcle    bin_0    bin_1    bin_2    ...    bin_49
 1      pi      1        0         0               0 
 1      pi      1        0         0               0 
 1      k       0        0         0               1 
 1      pi      0        0         1               0 
 2      pi      0        0         1               0 
 2      k       0        1         0               0 
 3      J       0        1         0               0 
 3      pi      0        0         0               1 
 3      pi      1        0         0               0 
 3      k       0        1         0               0 
 ...
 5000   J       0        0         1               0 
 5000   pi      0        1         0               0 
 5000   k       0        0         0               1

From this, I want to create several other DataFrames df_{evt} (or would a dictionary be better?):

df_1 : 
pcle    cant    bin_0    bin_1    bin_2   ...    bin_49        
 pi      3        2        0        1              0
  k      1        0        0        0              1

df_2 : 
pcle    cant    bin_0    bin_1    bin_2   ...    bin_49        
 pi      1        0        0        1              0
  k      1        0        1        0              0

There would be 5000 DataFrames in total (one per evt), where in each DataFrame:

* the column "cant" holds the number of occurrences of each "pcle" in that particular "evt".

* bin_0 ... bin_49 hold the sum of the values for that "pcle" in that particular "evt".

What is the best way to achieve this?


1 answer
User
Reply #1 · Posted 2024-03-29 11:14:38

Here is one possible solution:

import pandas as pd

# small sample with four bin columns instead of fifty
columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 1, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0],
        [3, "J", 0, 1, 0, 0],
        [3, "pi", 0, 0, 0, 1],
        [3, "pi", 1, 0, 0, 0],
        [3, "k", 0, 1, 0, 0]]

df = pd.DataFrame(data=data, columns=columns)

# group the rows by event and particle type
grouped = df.groupby(["evt", "pcle"])

# sum every bin_X column within each (evt, pcle) group
df_t = grouped.sum()

# count the rows in each group; the Series aligns on the (evt, pcle) index
df_t.insert(0, "cant", grouped.size())

# move pcle from the index to a column, leaving evt as the index
df_t = df_t.reset_index(level="pcle")

# select a single evt with .loc
df_t.loc[1]
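
As a variation on the steps above, the count and the sums can probably be computed in a single call with pandas named aggregation. This is only a sketch on a toy frame with four bin columns; the names `bin_cols`, `agg_spec` and `summary` are made up for illustration, and `sort=False` is an optional choice that keeps the groups in order of first appearance.

```python
import pandas as pd

# Toy data mirroring the first rows of the question (four bin columns only).
df = pd.DataFrame(
    [[1, "pi", 1, 0, 0, 0],
     [1, "pi", 1, 0, 0, 0],
     [1, "k", 0, 0, 0, 1],
     [1, "pi", 0, 0, 1, 0],
     [2, "pi", 0, 0, 1, 0],
     [2, "k", 0, 1, 0, 0]],
    columns=["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"],
)

# Every column whose name starts with "bin_" gets summed.
bin_cols = [c for c in df.columns if c.startswith("bin_")]

# Named aggregation: output column name -> (input column, function).
# "size" counts the rows of each group, so any non-key column can serve as input.
agg_spec = {"cant": (bin_cols[0], "size")}
agg_spec.update({c: (c, "sum") for c in bin_cols})

summary = (
    df.groupby(["evt", "pcle"], sort=False)
      .agg(**agg_spec)
      .reset_index(level="pcle")  # keep evt as the index, pcle as a column
)

print(summary.loc[1])
```

Building the specification as a dictionary avoids typing out fifty keyword arguments when the real frame has bin_0 through bin_49.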

If you want to put df_t into a dictionary keyed by evt, you can run:

d = {i: j.reset_index(drop=True) for i, j in df_t.groupby(df_t.index)}
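
To illustrate how such a dictionary might be used, here is a minimal sketch. It assumes an already aggregated frame shaped like df_t (evt in the index; pcle, cant and the bin columns as regular columns); the values are typed in by hand for the example, and `frames` is a hypothetical name. Grouping with `groupby(level="evt")` is an equivalent way to group on the index by name.

```python
import pandas as pd

# Hand-written stand-in for the aggregated frame: evt is the index.
df_t = pd.DataFrame(
    {"pcle": ["k", "pi", "k", "pi"],
     "cant": [1, 3, 1, 1],
     "bin_0": [0, 2, 0, 0],
     "bin_1": [0, 0, 1, 0]},
    index=pd.Index([1, 1, 2, 2], name="evt"),
)

# One small DataFrame per evt, each with a fresh 0..n-1 index.
frames = {evt: g.reset_index(drop=True) for evt, g in df_t.groupby(level="evt")}

# Look up a single event by its key instead of a separately named variable.
print(frames[2])
```

A dictionary keeps all per-event frames reachable through one object, which tends to be easier to loop over than thousands of separately named variables such as df_1, df_2, and so on.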
