从日期列创建月份列(但是日期列不包含月份信息)

2024-04-20 15:01:34 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这样一个数据,想创建一个名为'Month'的列

+---------+------------------+------+------+
| Name    | Task             | Team | Date |
+---------+------------------+------+------+
| John    | Market study     | A    | 1    |
+---------+------------------+------+------+
| Michael | Customer service | B    | 1    |
+---------+------------------+------+------+
| Joanna  | Accounting       | C    | 1    |
+---------+------------------+------+------+
| John    | Accounting       | B    | 2    |
+---------+------------------+------+------+
| Michael | Customer service | A    | 2    |
+---------+------------------+------+------+
| Joanna  | Market study     | C    | 2    |
+---------+------------------+------+------+
| John    | Customer service | C    | 1    |
+---------+------------------+------+------+
| Michael | Market study     | A    | 1    |
+---------+------------------+------+------+
| Joanna  | Customer service | B    | 1    |
+---------+------------------+------+------+
| John    | Market study     | A    | 2    |
+---------+------------------+------+------+
| Michael | Customer service | B    | 2    |
+---------+------------------+------+------+
| Joanna  | Accounting       | C    | 2    |
+---------+------------------+------+------+

所以基本上,我有日期信息,但是日期不包含它所属的月份。但是,我知道如果它第一次出现,那么它将属于第1个月,如果它第二次出现,那么它将属于第2个月。因此,例如,日期1出现3次,然后被日期2打断。因此,前3次属于第1个月,后3次出现,它属于第2个月。所以我希望我的结果如下:

+---------+------------------+------+------+---------+
| Name    | Task             | Team | Date | Month   |
+---------+------------------+------+------+---------+
| John    | Market study     | A    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| Joanna  | Accounting       | C    | 1    | Month 1 |
+---------+------------------+------+------+---------+
| John    | Accounting       | B    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | A    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| Joanna  | Market study     | C    | 2    | Month 1 |
+---------+------------------+------+------+---------+
| John    | Customer service | C    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Market study     | A    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| Joanna  | Customer service | B    | 1    | Month 2 |
+---------+------------------+------+------+---------+
| John    | Market study     | A    | 2    | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B    | 2    | Month 2 |
+---------+------------------+------+------+---------+
| Joanna  | Accounting       | C    | 2    | Month 2 |
+---------+------------------+------+------+---------+

我不知道,除了使用一些循环。 谢谢大家


Tags: 数据name信息taskdateservicecustomerjohn
1条回答
网友
1楼 · 发布于 2024-04-20 15:01:34

如果我正确理解了这个问题,您可以执行以下操作:创建masks,将每个连续值分隔成单独的组。从s,为每个组的每个值创建掩码s1。按s1Date分组并执行cumcountmap以创建所需的输出:

s = df.Date.ne(df.Date.shift()).cumsum()
s1 = df.Date.groupby(s).cumcount()

df['Month'] = df.groupby([s1, 'Date']).Name.cumcount().add(1).map(lambda x: 'Month '+str(x))

Out[897]:
       Name              Task Team  Date    Month
0      John      Market-study    A     1  Month 1
1   Michael  Customer-service    B     1  Month 1
2    Joanna        Accounting    C     1  Month 1
3      John        Accounting    B     2  Month 1
4   Michael  Customer-service    A     2  Month 1
5    Joanna      Market-study    C     2  Month 1
6      John  Customer-service    C     1  Month 2
7   Michael      Market-study    A     1  Month 2
8    Joanna  Customer-service    B     1  Month 2
9      John      Market-study    A     2  Month 2
10  Michael  Customer-service    B     2  Month 2
11   Joanna        Accounting    C     2  Month 2

相关问题 更多 >