Create a function to unstack a DataFrame

3 answers

User

Answer 1 · edited 2024-04-26 21:36:04

Use groupby, count and unstack:

res = df.groupby(['Year', 'Size', 'Month']).InvoiceNo.count().unstack(0, fill_value=0)
res

Year        2014  2015
Size Month            
7.0  1         1     0
8.0  1         1     0
8.5  7         0     1
9.0  3         0     1
11.0 2         1     0
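The snippet above assumes `df` already exists. A minimal self-contained sketch, building `df` from the sample data posted in answer 3 (assumed to match the question's data), so the groupby/count/unstack chain can be run as-is:

```python
import pandas as pd

# Sample data from answer 3 (assumed to match the question's DataFrame)
df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

# Count invoices per (Year, Size, Month), then move Year (index level 0)
# into the columns; combinations that never occur are filled with 0
res = (df.groupby(["Year", "Size", "Month"])
         .InvoiceNo.count()
         .unstack(0, fill_value=0))
print(res)
```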

Or, equivalently, with pivot_table:

res = df.pivot_table(index=['Size', 'Month'], 
                     columns='Year', 
                     values='InvoiceNo', 
                     aggfunc='count', 
                     fill_value=0)

Year        2014  2015
Size Month            
7.0  1         1     0
8.0  1         1     0
8.5  7         0     1
9.0  3         0     1
11.0 2         1     0
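To check that the two forms really agree on this data, a small sketch can compare labels and cell values. It deliberately skips a dtype comparison, on the assumption that the two code paths might not produce identical dtypes across pandas versions:

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

# groupby + count + unstack
by_groupby = (df.groupby(["Year", "Size", "Month"])
                .InvoiceNo.count()
                .unstack(0, fill_value=0))

# pivot_table with the same grouping keys
by_pivot = df.pivot_table(index=["Size", "Month"], columns="Year",
                          values="InvoiceNo", aggfunc="count", fill_value=0)

# Same row labels, same column labels, same counts in every cell
same = (by_groupby.index.equals(by_pivot.index)
        and list(by_groupby.columns) == list(by_pivot.columns)
        and (by_groupby.to_numpy() == by_pivot.to_numpy()).all())
print(same)
```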

You can then compare the two years:

res[2014] > res[2015]
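`res[2014] > res[2015]` returns a boolean Series aligned on the (Size, Month) index. A short sketch of using that mask to keep only the rows where 2014 had more invoices than 2015 (it rebuilds `res` so it runs on its own; the names `mask` and `more_in_2014` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})
res = (df.groupby(["Year", "Size", "Month"])
         .InvoiceNo.count()
         .unstack(0, fill_value=0))

mask = res[2014] > res[2015]   # boolean Series indexed by (Size, Month)
more_in_2014 = res[mask]       # keep rows where the 2014 count is larger
print(more_in_2014)
```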

Or compute only the year you need:

(df[df.Year.eq(2014)]
     .groupby(['Size', 'Month'])
     .InvoiceNo
     .count()
     .unstack(1, fill_value=0))

Month  1  2
Size       
7.0    1  0
8.0    1  0
11.0   0  1
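`unstack` also accepts a level name instead of a position, which can make it easier to see which index level ends up in the columns. A sketch showing that `unstack(1)` and `unstack("Month")` give the same table here:

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

# Invoice counts for 2014 only, indexed by (Size, Month)
counts_2014 = (df[df.Year.eq(2014)]
               .groupby(["Size", "Month"])
               .InvoiceNo
               .count())

by_position = counts_2014.unstack(1, fill_value=0)    # level 1 is Month
by_name = counts_2014.unstack("Month", fill_value=0)  # same level, by name
print(by_position.equals(by_name))
```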

User

Answer 2 · edited 2024-04-26 21:36:04

df.apply passes rows or columns to your function as Series objects, depending on the axis you specify. It does not pass the whole DataFrame.
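A quick way to see this is to apply a function that only reports the type of its argument. In this sketch `describe_input` is a hypothetical helper, not part of the question's code:

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

def describe_input(obj):
    # hypothetical helper: report what kind of object it was handed
    return type(obj).__name__

# axis=0 (the default): called once per column, each column is a Series
per_column = df.apply(describe_input)
# axis=1: called once per row, each row is a Series
per_row = df.apply(describe_input, axis=1)
# calling the function directly hands it the whole DataFrame
whole = describe_input(df)

print(per_column.tolist())
print(per_row.tolist())
print(whole)
```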

If you want to apply the function to the whole DataFrame, why not just call it directly: df2014 = Year_calc(df)?

You should also consider passing the year to the function as a parameter; that way it is clear what the Year_calc function is doing.

User

Answer 3 · edited 2024-04-26 21:36:04

Here is the input data:

import pandas as pd

d = {'InvoiceNo':[1,2,3,4,5],'Month':[1,1,2,3,7],'Year':[2014,2014,2014,2015,2015],'Size':[7,8,11,9,8.5]}
df = pd.DataFrame(data = d)

Solution 1:

Using the previous answers and the pieces you provided, here is the function I came up with:

def Year_calc(data, year):

    # keep only the rows for the given year, then group by Size and Month
    t1 = data.loc[data.Year == year].groupby(['Size','Month'])

    # count the invoices per group and move Size (level 0) into the columns
    t2 = t1.InvoiceNo.count().unstack(0, fill_value=0)
    return t2

Here is the table returned for 2014:

Size   7.0   8.0   11.0
Month                  
1         1     1     0
2         0     0     1
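Because the year is now a parameter, the same function can be reused for every year in the data. A usage sketch that repeats Solution 1's function (so the snippet runs on its own) and builds one table per year; the `tables` dict is only an illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

def Year_calc(data, year):
    # keep only the requested year, then group by Size and Month
    grouped = data.loc[data.Year == year].groupby(["Size", "Month"])
    # count invoices and move Size (level 0) into the columns
    return grouped.InvoiceNo.count().unstack(0, fill_value=0)

# one table per year present in the data
tables = {year: Year_calc(df, year) for year in sorted(df.Year.unique().tolist())}
print(tables[2015])
```

Note that each table only has columns for the sizes that actually occur in that year.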

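Solution 2: Since you removed the year as a parameter, a few adjustments seem worthwhile. You can either select the rows for the year before doing the groupby, or group by Year, Month and Size and then select the rows for the year you need.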
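The two options give the same counts but not always the same columns: filtering first only produces columns for sizes that occur in that year, while grouping first keeps every size in the dataset (filled with 0). A sketch comparing both for 2015; the names `before` and `after` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})

# Option A: select the year first, then group and unstack
before = (df[df.Year == 2015]
          .groupby(["Month", "Size"])
          .InvoiceNo.count()
          .unstack("Size", fill_value=0))

# Option B: group by Year, Month and Size, unstack, then select the year
after = (df.groupby(["Year", "Month", "Size"])
         .InvoiceNo.count()
         .unstack("Size", fill_value=0)
         .xs(2015))

print(list(before.columns))  # only the sizes that occur in 2015
print(list(after.columns))   # every size in the whole dataset
```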

def Year_calc(data):

    # group by Year, Month and Size (all years at once)
    t1 = data.groupby(['Year','Month','Size'])

    # count the invoices per group and move Size (level 2) into the columns
    t2 = t1.InvoiceNo.count().unstack(2, fill_value=0)
    return t2

The unfiltered output is:

Size        7.0   8.0   8.5   9.0   11.0
Year Month                              
2014 1         1     1     0     0     0
     2         0     0     0     0     1
2015 3         0     0     0     1     0
     7         0     0     1     0     0

Suppose you want the data for 2015; then type:

tdf = Year_calc(data=df)
tdf.xs(2015)
# or
tdf.loc[(2015,), :]

This returns:

Size   7.0   8.0   8.5   9.0   11.0
Month                              
3         0     0     0     1     0
7         0     0     1     0     0
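For selecting on the first index level, `xs` and plain `.loc` partial indexing return the same sub-table; `xs` can additionally select on a level other than the first via its `level` argument. A sketch on the Solution 2 table (rebuilt here so it runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": [1, 2, 3, 4, 5],
    "Month": [1, 1, 2, 3, 7],
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Size": [7, 8, 11, 9, 8.5],
})
tdf = (df.groupby(["Year", "Month", "Size"])
         .InvoiceNo.count()
         .unstack(2, fill_value=0))

via_xs = tdf.xs(2015)                 # cross-section on the first level (Year)
via_loc = tdf.loc[2015]               # partial indexing on the first level
via_level = tdf.xs(7, level="Month")  # cross-section on the Month level

print(via_xs.equals(via_loc))
print(via_level)
```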

For more on MultiIndex slicing, see this article: here

Hope this helps!
