Calculation and division based on multiple criteria

Posted 2024-06-11 02:03:11


I have a problem. I have data (about 300k rows):

Date    Column 1    Column 2    Column 3    Value
1/1/2017    A       Jonas       Station1    8
1/1/2017    A       Greg        Station1    5
1/1/2017    A       Anton       Station1    1
1/1/2017    B       Jonas       Station1    4
1/1/2017    B       Greg        Station1    4
1/1/2017    B       Mick        Station2    8
1/1/2017    B       Anton       Station3    7
1/1/2017    C       Jonas       Station4    2
1/1/2017    C       Greg        Station1    7

I need to compute the following per group, where a group is defined by Date, Column 2 and Column 3. Here is an example for "Greg":

Date    Column 1    Column 2    Column 3    Value
1/1/2017    A       Greg        Station1    5
1/1/2017    B       Greg        Station1    4
1/1/2017    C       Greg        Station1    7

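Sum all C values in the group and divide by the number of A and B rows. For this example that is 7/2 = 3.5. Then add 3.5 to the Value of each A and B row: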

Date    Column 1    Column 2    Column 3    Value
1/1/2017    A       Greg        Station1    5+3.5 = 8.5
1/1/2017    B       Greg        Station1    4+3.5 = 7.5
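
The arithmetic above can be checked by hand on the three "Greg" rows. This is only a minimal sketch of the rule as described in the question; the variable names (`greg`, `is_c`, `is_ab`, `share`) are made up for illustration.

```python
import pandas as pd

# The "Greg" / Station1 group, copied from the question.
greg = pd.DataFrame({
    "Date": ["1/1/2017"] * 3,
    "Column 1": ["A", "B", "C"],
    "Column 2": ["Greg"] * 3,
    "Column 3": ["Station1"] * 3,
    "Value": [5, 4, 7],
})

is_c = greg["Column 1"] == "C"
is_ab = greg["Column 1"].isin(["A", "B"])

c_total = greg.loc[is_c, "Value"].sum()  # sum of all C values in the group
n_ab = is_ab.sum()                       # number of A and B rows in the group
share = c_total / n_ab                   # 7 / 2

# Add the share to the A and B rows only.
expected = greg.loc[is_ab, "Value"] + share
print(share)
print(expected.tolist())
```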

How can this be done in pandas/Python? Update: I have ~100 items like "Greg", ~500 days, and ~1k values in Column 1 and Column 2.


1 answer
User
Answer #1 · posted 2024-06-11 02:03:11

Use:

import pandas as pd

# keep only the A and B rows
df1 = df[df['Column 1'].isin(['A','B'])]
# count A/B rows per group, then reindex to the original index to restore the original size
b = (df1.groupby(['Date','Column 2', 'Column 3'])['Column 1']
       .transform('size')
       .reindex(df.index))

# keep only the C rows and rename Value to a for the merge
df2 = (df.loc[df['Column 1'] == 'C', ['Date','Column 2','Column 3','Value']]
         .rename(columns={'Value':'a'}))

# left-merge onto the original and keep only the new column a
# (merge returns a fresh 0..n-1 index, so this assumes df has a default RangeIndex)
a = pd.merge(df, df2, how='left')['a']
# divide and add to column Value; where the division gives NaN, keep the original value
df['Value'] = a.div(b).add(df['Value'], fill_value=0)
print (df)
       Date Column 1 Column 2  Column 3  Value
0  1/1/2017        A    Jonas  Station1    8.0
1  1/1/2017        A     Greg  Station1    8.5
2  1/1/2017        A    Anton  Station1    1.0
3  1/1/2017        B    Jonas  Station1    4.0
4  1/1/2017        B     Greg  Station1    7.5
5  1/1/2017        B     Mick  Station2    8.0
6  1/1/2017        B    Anton  Station3    7.0
7  1/1/2017        C    Jonas  Station4    2.0
8  1/1/2017        C     Greg  Station1    7.0
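
For comparison, the same rule can also be written without a merge, using only `groupby(...).transform`. This is an alternative sketch and not the method used in the answer above; it assumes the sample data from the question, and the names `keys`, `c_sum`, `n_ab` and `share` are illustrative. Unlike the merge version, it sums all C values if a group has more than one C row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1/1/2017"] * 9,
    "Column 1": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "Column 2": ["Jonas", "Greg", "Anton", "Jonas", "Greg",
                 "Mick", "Anton", "Jonas", "Greg"],
    "Column 3": ["Station1", "Station1", "Station1", "Station1", "Station1",
                 "Station2", "Station3", "Station4", "Station1"],
    "Value": [8, 5, 1, 4, 4, 8, 7, 2, 7],
})

keys = [df["Date"], df["Column 2"], df["Column 3"]]
is_c = df["Column 1"] == "C"
is_ab = df["Column 1"].isin(["A", "B"])

# Per-group sum of the C values, broadcast back to every row of the group.
c_sum = df["Value"].where(is_c, 0).groupby(keys).transform("sum")
# Per-group number of A/B rows, broadcast the same way.
n_ab = is_ab.astype(int).groupby(keys).transform("sum")

# Only A/B rows receive the share; every other row gets 0 added.
share = (c_sum / n_ab).where(is_ab, 0)
result = df.assign(Value=df["Value"] + share)
print(result)
```

Masking with `where(is_ab, 0)` after the division also discards the `inf`/`NaN` that appears in groups with no A or B rows, so those rows keep their original Value.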

Using a custom function should be slower:

def f(x):
    # C value of the group (an empty Series if the group has no C row)
    a = x.loc[x['Column 1'] == 'C', 'Value']
    # mask of all A, B rows in the group
    m = x['Column 1'].isin(['A','B'])
    c = a / m.sum()
    # if-else because a group without a C row gives an empty Series
    c = 0 if c.empty else c.item()
    # add the share to the A, B rows only
    x.loc[m, 'Value'] += c
    return x


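# group_keys=False keeps the original index (on pandas 2.0 and later the default adds the group keys to it)
df = df.groupby(['Date','Column 2', 'Column 3'], group_keys=False).apply(f)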
print (df)
       Date Column 1 Column 2  Column 3  Value
0  1/1/2017        A    Jonas  Station1    8.0
1  1/1/2017        A     Greg  Station1    8.5
2  1/1/2017        A    Anton  Station1    1.0
3  1/1/2017        B    Jonas  Station1    4.0
4  1/1/2017        B     Greg  Station1    7.5
5  1/1/2017        B     Mick  Station2    8.0
6  1/1/2017        B    Anton  Station3    7.0
7  1/1/2017        C    Jonas  Station4    2.0
8  1/1/2017        C     Greg  Station1    7.0
