Counting first-time binary indicators in a time series

Customer    A  B  C  D  E  F  G  H  I  J
11/30/2015  1  0  1  0  0  1  1  0  0  0
12/31/2015  0  1  0  1  0  1  1  0  0  1
1/31/2016   0  0  0  0  0  1  1  0  0  1
2/29/2016   1  1  1  1  1  1  0  1  1  1
3/31/2016   1  1  0  1  1  0  1  1  0  1
4/30/2016   0  1  1  1  0  1  1  1  0  1
5/31/2016   1  1  1  1  1  1  0  1  0  1

Customer    A  B  C  D  E  F  G  H  I  J  New_Customers
11/30/2015  1  0  1  0  0  1  1  0  0  0  4
12/31/2015  0  1  0  1  0  1  1  0  0  1  3
1/31/2016   0  0  0  0  0  1  1  0  0  1  0
2/29/2016   1  1  1  1  1  1  0  1  1  1  3
3/31/2016   1  1  0  1  1  0  1  1  0  1  0
4/30/2016   0  1  1  1  0  1  1  1  0  1  0
5/31/2016   1  1  1  1  1  1  0  1  0  1  0

2 Answers

User

Reply #1 · edited 2024-04-25 01:35:49

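You can apply a cumulative maximum, which carries each column's first 1 forward to every later row, sum across all columns (axis 1), and then take the row-to-row difference. The first value of the difference is null; fill it with the sum of the first row.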

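# Indicator columns only: everything except the date column 'Customer'.
df1 = df[df.columns.difference(['Customer'])]
# cummax() holds a column at 1 once it has appeared; the increase in the row sum is the count of first-time 1s.
df['New_customers'] = df1.cummax().sum(1).diff().fillna(df1.loc[0].sum())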

Output:

     Customer  A  B  C  D  E  F  G  H  I  J  New_customers
0  11/30/2015  1  0  1  0  0  1  1  0  0  0            4.0
1  12/31/2015  0  1  0  1  0  1  1  0  0  1            3.0
2   1/31/2016  0  0  0  0  0  1  1  0  0  1            0.0
3   2/29/2016  1  1  1  1  1  1  0  1  1  1            3.0
4   3/31/2016  1  1  0  1  1  0  1  1  0  1            0.0
5   4/30/2016  0  1  1  1  0  1  1  1  0  1            0.0
6   5/31/2016  1  1  1  1  1  1  0  1  0  1            0.0
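
For anyone who wants to run the cumulative-maximum approach end to end, here is a minimal sketch that rebuilds the question's sample data and splits the one-liner into steps. The names flags, seen, seen_total and new_customers are only illustrative, and the final astype(int) is an addition to get integer counts instead of the floats shown above.

```python
import pandas as pd

# Sample data copied from the question; 'Customer' holds the month-end dates.
rows = [
    ("11/30/2015", 1, 0, 1, 0, 0, 1, 1, 0, 0, 0),
    ("12/31/2015", 0, 1, 0, 1, 0, 1, 1, 0, 0, 1),
    ("1/31/2016",  0, 0, 0, 0, 0, 1, 1, 0, 0, 1),
    ("2/29/2016",  1, 1, 1, 1, 1, 1, 0, 1, 1, 1),
    ("3/31/2016",  1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
    ("4/30/2016",  0, 1, 1, 1, 0, 1, 1, 1, 0, 1),
    ("5/31/2016",  1, 1, 1, 1, 1, 1, 0, 1, 0, 1),
]
df = pd.DataFrame(rows, columns=["Customer"] + list("ABCDEFGHIJ"))

flags = df.drop(columns="Customer")   # indicator columns only
seen = flags.cummax()                 # stays 1 from a column's first 1 onward
seen_total = seen.sum(axis=1)         # number of customers seen so far, per row

# The row-to-row increase is the number of customers appearing for the first time.
# diff() leaves the first row as NaN, so fill it with that row's own total.
new_customers = seen_total.diff().fillna(seen_total.iloc[0]).astype(int)

print(new_customers.tolist())  # [4, 3, 0, 3, 0, 0, 0]
```

The counts match the New_Customers column in the question: 4, 3, 0, 3, 0, 0, 0.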

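User

Reply #2 · edited 2024-04-25 01:35:49

Another option is to define a custom function new and use DataFrame.expanding. I don't know why the result of expanding().apply(new) needs to be cast from float to int, but hey, it works: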

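def new(column):
    # 1.0 if the latest value is this column's first 1, otherwise 0.0
    return float(column[-1] and not any(column[:-1]))

# Assumes the dates are the index of df, so every column is a 0/1 indicator.
# raw=True passes NumPy arrays, so column[-1] is a positional lookup.
result = df.expanding().apply(new, raw=True).sum(axis=1).astype(int)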

print(result)

Out:
11/30/2015    4
12/31/2015    3
1/31/2016     0
2/29/2016     3
3/31/2016     0
4/30/2016     0
5/31/2016     0
dtype: int32
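
On the float-to-int cast: as far as I know, pandas window operations such as expanding().apply() always produce float64 results, which would explain why the cast is needed. The sketch below reruns the expanding approach on the question's sample data and prints the intermediate dtype. The helper name first_one and the use of raw=True (so the function receives a NumPy array rather than a Series) are choices made for this sketch and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question, with the dates moved into the index.
rows = [
    ("11/30/2015", 1, 0, 1, 0, 0, 1, 1, 0, 0, 0),
    ("12/31/2015", 0, 1, 0, 1, 0, 1, 1, 0, 0, 1),
    ("1/31/2016",  0, 0, 0, 0, 0, 1, 1, 0, 0, 1),
    ("2/29/2016",  1, 1, 1, 1, 1, 1, 0, 1, 1, 1),
    ("3/31/2016",  1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
    ("4/30/2016",  0, 1, 1, 1, 0, 1, 1, 1, 0, 1),
    ("5/31/2016",  1, 1, 1, 1, 1, 1, 0, 1, 0, 1),
]
df = pd.DataFrame(rows, columns=["Customer"] + list("ABCDEFGHIJ")).set_index("Customer")


def first_one(window):
    # window holds every value of one column up to and including the current row.
    # Return 1.0 only when the current value is 1 and no earlier value was nonzero.
    return float(window[-1] == 1 and not window[:-1].any())


# raw=True hands first_one a NumPy array, so window[-1] is a positional lookup.
flags = df.expanding().apply(first_one, raw=True)
print(flags.dtypes.unique())  # the per-column results are float64

result = flags.sum(axis=1).astype(int)
print(result.tolist())  # [4, 3, 0, 3, 0, 0, 0]
```

Because the function is called once per row and column, this is likely to be slower than the cumulative-maximum approach on large frames, but it gives the same counts here.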
