Arithmetic on a DataFrame Series with a condition: the prior operation gets overwritten

Posted 2024-06-02 07:34:34


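I'm scraping some salary data and need to convert it to an hourly rate or an annual rate depending on another column. I've worked out how to do this (probably not the most efficient way), but it only works for a single line.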

Data

import pandas as pd, numpy as np

columns = ['Location','Hourly','Annually','Monthly','Daily','Average','Hourly_Rate','Annual_Rate']
df = pd.DataFrame(columns=columns)
df.loc[1] = ['A',True,False,False,False,10.10,np.nan,np.nan]
df.loc[2] = ['B',False,True,False,False,50000,np.nan,np.nan]

df['Annual_Rate'] = (df['Average'] * 2080).where(df['Hourly'] == True) #need this line to run and not get overwritten
df['Annual_Rate'] = df['Average'].where(df['Annually'] == True ) #overwrites prior line
df['Annual_Rate'] = df['Average'].where(df['Annually'] == True & pd.isna(df['Annual_Rate'])) #overwrites prior line and is incorrect

df['Hourly_Rate'] = (df['Average'] / 2080).where([(df['Annually'] == True) & (pd.isnull(df['Hourly_Rate']))])
df['Hourly_Rate'] = df['Average'].where(df['Hourly'] == True & (pd.isna(df['Hourly_Rate'])))
df['Hourly_Rate'] = df['Average'].where(df['Hourly'] == True)
df.head(10)

These are the lines I want/need to work:

df['Hourly_Rate'] = (df['Average'] / 2080).where([(df['Annually'] == True) & (pd.isnull(df['Hourly_Rate']))])
df['Annual_Rate'] = (df['Average'] * 2080).where(df['Hourly'] == True)

Expected result:

+---+----------+--------+----------+---------+-------+---------+-------------+-------------+
|   | Location | Hourly | Annually | Monthly | Daily | Average | Hourly_Rate | Annual_Rate |
+---+----------+--------+----------+---------+-------+---------+-------------+-------------+
| 1 | A        | TRUE   | FALSE    | FALSE   | FALSE |    10.1 |        10.1 |       21008 |
| 2 | B        | FALSE  | TRUE     | FALSE   | FALSE |   50000 | 24.03846154 |       50000 |
+---+----------+--------+----------+---------+-------+---------+-------------+-------------+

Thanks in advance.


1 Answer

Answer 1 · Posted 2024-06-02 07:34:34

pd.Series.where and numpy.where work differently. The latter can be used to specify a vectorised if-else condition, which is probably what you need:

df['Annual_Rate'] = np.where(df['Hourly'], df['Average'] * 2080, df['Average'])

df['Hourly_Rate'] = np.where(df['Annually'] & df['Hourly_Rate'].isnull(),
                             df['Average'] / 2080, df['Average'])
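For reference, here is a minimal self-contained sketch of the same idea run on the two sample rows from the question. It rests on two assumptions that are not in the original: the frame is built from a dict so the flag columns are real booleans and `Average` is numeric, and every row is either hourly or annual (the `Monthly` and `Daily` flags are ignored). `HOURS_PER_YEAR` is just an illustrative name for the 2080 constant.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, built from a dict (an assumption) so that
# Hourly/Annually are bool columns and Average is float rather than object.
df = pd.DataFrame(
    {
        "Location": ["A", "B"],
        "Hourly": [True, False],
        "Annually": [False, True],
        "Average": [10.10, 50000.0],
    },
    index=[1, 2],
)

HOURS_PER_YEAR = 2080  # illustrative name for the constant used in the question

# if Hourly: Average * 2080, else: Average (assumed to be annual already)
df["Annual_Rate"] = np.where(
    df["Hourly"], df["Average"] * HOURS_PER_YEAR, df["Average"]
)

# if Annually: Average / 2080, else: Average (assumed to be hourly already)
df["Hourly_Rate"] = np.where(
    df["Annually"], df["Average"] / HOURS_PER_YEAR, df["Average"]
)

print(df)
```

Each column is computed in a single assignment, so nothing is overwritten afterwards. Rows flagged `Monthly` or `Daily` would need their own branches, which this sketch does not attempt.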

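pd.Series.where updates a series with a given value where the condition is not satisfied and leaves it unchanged otherwise (here the value is NaN, since none is specified), as described in the docs: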

Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
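The difference can be seen in a small sketch (the series names `avg` and `hourly` are made up for illustration). Without `other`, `Series.where` leaves NaN wherever the condition is False, so assigning its result to a column replaces the whole column, including any values written by an earlier line.

```python
import numpy as np
import pandas as pd

avg = pd.Series([10.10, 50000.0], index=[1, 2])
hourly = pd.Series([True, False], index=[1, 2])

# Series.where keeps values where the condition is True; with no `other`,
# the remaining positions become NaN.
kept = (avg * 2080).where(hourly)

# Passing `other` fills the False positions instead of leaving NaN.
filled = (avg * 2080).where(hourly, avg)

# numpy.where takes both branches explicitly and returns a plain ndarray.
chosen = np.where(hourly, avg * 2080, avg)

print(kept)
print(filled)
print(chosen)
```

`filled` and `chosen` hold the same numbers; the first stays a pandas Series with its index, the second is a NumPy array.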

Also note that you can use a Boolean series directly instead of testing df[col] == True.
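Dropping the `== True` also avoids an operator-precedence trap: in Python, `&` binds more tightly than `==`, so an unparenthesised `series == True & other` is parsed as `series == (True & other)`. This is a likely reason the third `Annual_Rate` line in the question is "incorrect". A small sketch with made-up series:

```python
import numpy as np
import pandas as pd

annually = pd.Series([False, True])
rate = pd.Series([21008.0, np.nan])

# Parsed as: annually == (True & rate.isna())
unparenthesised = annually == True & rate.isna()

# What was presumably intended: both conditions must hold.
intended = annually & rate.isna()

print(unparenthesised.tolist())  # [True, True]
print(intended.tolist())         # [False, True]
```

Using the Boolean series directly, or parenthesising each comparison, gives the intended mask.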
