Python/NumPy: create a DataFrame column that holds a value only when a condition is met, otherwise null

Published 2024-04-29 13:00:21


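I am trying to create a Termination_Date column. It should contain the Effective Date only when the Cancellations or Lapses flag is set to 'Yes', and be empty otherwise. Each of the following three attempts raises an error: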

df['Termination_Date'] = np.where((df['Cancellations'] == 'Yes') | (df['Lapses'] == 'Yes'), df['Effective Date'])
ValueError: either both or neither of x and y should be given

df['Termination_Date'] = np.where((df['Cancellations'] == 'Yes') | (df['Lapses'] == 'Yes'), df['Effective Date'], "")
TypeError: invalid type promotion

df['Termination_Date'] = np.where((df['Cancellations'] == 'Yes') | (df['Lapses'] == 'Yes'), df['Effective Date'], np.nan)
TypeError: invalid type promotion

Thanks


3 answers

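Use Series.where, which keeps the value where the condition is True and fills in a missing value (NaT for dates) elsewhere: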

df['Termination_Date'] = df['Effective Date'].where( (df['Cancellations'] == 'Yes') |
                                                     (df['Lapses'] == 'Yes') )

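Or Series.mask with the inverted condition (mask blanks out the rows where the condition is True, so both flags must be different from 'Yes'):

df['Termination_Date'] = df['Effective Date'].mask(df['Cancellations'].ne('Yes') &
                                                   df['Lapses'].ne('Yes'))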

We can also check both flag columns at once with DataFrame.any:

df['Termination_Date'] = df['Effective Date'].where( df[['Lapses','Cancellations']].eq('Yes').any(axis = 1) )
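The variants above should all give the same column. As a minimal sketch to check this, the snippet below builds a small sample frame (the data is hypothetical, and the names flagged, via_where and via_mask are introduced only for illustration; the column names follow the question) and compares Series.where against Series.mask with the negated condition.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    'Effective Date': pd.date_range('2019-01-01', periods=6),
    'Cancellations': ['Yes'] * 4 + ['No'] * 2,
    'Lapses': ['Yes'] * 2 + ['No'] * 4,
})

# True for rows where at least one of the two flags equals 'Yes'.
flagged = df[['Cancellations', 'Lapses']].eq('Yes').any(axis=1)

# where keeps the date where the condition is True and leaves NaT elsewhere.
via_where = df['Effective Date'].where(flagged)

# mask replaces the date where the condition is True, so negate the condition.
via_mask = df['Effective Date'].mask(~flagged)

# Report whether the two approaches agree, then store the result.
print(via_where.equals(via_mask))
df['Termination_Date'] = via_where
print(df)
```

Because the result stays a datetime column, the unflagged rows hold NaT rather than an empty string, which avoids the type promotion problem from the question.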

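You can use .loc indexing. The selection contains only the matching rows, and the assignment aligns on the index, so the remaining rows become NaT: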

df = pd.DataFrame({'Effective_Date':pd.date_range('2019-01-01', periods = 6),
               'Cancellations':['Yes'] * 4 + ['No'] * 2,
               'Lapses':['Yes'] * 2 + ['No'] * 4})

df
    Effective_Date  Cancellations   Lapses
0   2019-01-01      Yes             Yes
1   2019-01-02      Yes             Yes
2   2019-01-03      Yes             No
3   2019-01-04      Yes             No
4   2019-01-05      No              No
5   2019-01-06      No              No

df["Termination_Date"] = df.loc[(df["Cancellations"] == "Yes") | (df["Lapses"] == "Yes"), "Effective_Date"]

    Effective_Date  Cancellations   Lapses  Termination_Date
0   2019-01-01      Yes             Yes     2019-01-01
1   2019-01-02      Yes             Yes     2019-01-02
2   2019-01-03      Yes             No      2019-01-03
3   2019-01-04      Yes             No      2019-01-04
4   2019-01-05      No              No      NaT
5   2019-01-06      No              No      NaT

An alternative is to use Series.where:

Sample:

df = pd.DataFrame({
         'Effective Date':pd.date_range('2019-01-01', periods=6),
         'Cancellations':['Yes'] * 4 + ['No'] * 2,
         'Lapses':['yes'] * 2 + ['No'] * 4,

})

df['Termination_Date'] = df['Effective Date'].where((df['Cancellations'] == 'Yes') | 
                                                     (df['Lapses'] == 'Yes')) 

Or:

m = (df['Cancellations'] == 'Yes') | (df['Lapses'] == 'Yes')
df.loc[m, 'Termination_Date'] = df['Effective Date']

print (df)
  Effective Date Cancellations Lapses Termination_Date
0     2019-01-01           Yes    yes       2019-01-01
1     2019-01-02           Yes    yes       2019-01-02
2     2019-01-03           Yes     No       2019-01-03
3     2019-01-04           Yes     No       2019-01-04
4     2019-01-05            No     No              NaT
5     2019-01-06            No     No              NaT
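For completeness, the np.where call from the question can probably be made to work as well. The first attempt fails because np.where needs either both value arguments or neither. The other two most likely fail because NumPy cannot combine a datetime column with a string ("") or a float (np.nan). The sketch below assumes the column really has a datetime64 dtype and uses np.datetime64('NaT') as the fallback; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    'Effective Date': pd.date_range('2019-01-01', periods=6),
    'Cancellations': ['Yes'] * 4 + ['No'] * 2,
    'Lapses': ['Yes'] * 2 + ['No'] * 4,
})

# Boolean mask: True where either flag equals 'Yes'.
m = (df['Cancellations'] == 'Yes') | (df['Lapses'] == 'Yes')

# Pass both branches, and make the fallback a datetime "missing" value (NaT)
# so that both branches share a datetime dtype.
df['Termination_Date'] = np.where(
    m.to_numpy(),
    df['Effective Date'].to_numpy(),
    np.datetime64('NaT'),
)

print(df)
```

Series.where is usually the simpler choice here, since it picks the matching missing value for the column's dtype on its own.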
