Pandas: add a column whose value depends on a condition on another column

# method 1
df['is_rich_method1'] = np.where(df['salary'] >= 50, 'yes', 'no')

# method 2
df['is_rich_method2'] = ['yes' if x >= 50 else 'no' for x in df['salary']]

# method 3
df['is_rich_method3'] = 'no'
df.loc[df['salary'] >= 50, 'is_rich_method3'] = 'yes'

1 answer


Answer #1 · Posted 2024-05-12 20:58:09

Use timeit, Luke!

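Conclusion: list comprehensions perform best on small amounts of data because they incur very little overhead, even though they are not vectorized. On larger data, loc and numpy.where perform better: vectorization wins the day.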

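Keep in mind that the applicability of a method depends on your data, the number of conditions, and the data type of your columns. My suggestion is to test the various methods on your own data before settling on one.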
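As a rough illustration of that advice, here is a minimal sketch that times the three approaches at a few data sizes using the standard-library timeit module. The helper name time_approaches, the randomly generated salary data, and the chosen sizes are assumptions made for this example and are not part of the original benchmark.

```python
import timeit

import numpy as np
import pandas as pd


def time_approaches(n_rows, repeats=5, seed=0):
    """Return the best-of-`repeats` wall time in seconds for each approach."""
    rng = np.random.default_rng(seed)
    # Hypothetical test data: integer salaries in [0, 100).
    df = pd.DataFrame({"salary": rng.integers(0, 100, size=n_rows)})

    def with_loc():
        # Work on a new frame so the shared test data is never modified.
        out = df.assign(is_rich="no")
        out.loc[out["salary"] >= 50, "is_rich"] = "yes"
        return out

    candidates = {
        "numpy.where": lambda: np.where(df["salary"] >= 50, "yes", "no"),
        "list comprehension": lambda: [
            "yes" if x >= 50 else "no" for x in df["salary"]
        ],
        "loc": with_loc,
    }
    # timeit.repeat accepts a callable; take the minimum to reduce noise.
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeats))
        for name, fn in candidates.items()
    }


# Sizes are arbitrary; substitute row counts that resemble your real data.
for n in (100, 10_000, 100_000):
    print(n, time_approaches(n))
```

The absolute numbers depend on the machine and on the pandas and NumPy versions, so only the relative ordering at each size is worth reading from the output.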

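One takeaway worth noting, however, is that list comprehensions are very competitive: they are implemented in C (in CPython) and are highly optimized for performance.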

Benchmarking code, for reference. Here are the functions being timed:

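import numpy as np

def numpy_where(df):
    # Vectorized: evaluates the condition on the whole column at once.
    return df.assign(is_rich=np.where(df['salary'] >= 50, 'yes', 'no'))

def list_comp(df):
    # Plain Python loop over the column values.
    return df.assign(is_rich=['yes' if x >= 50 else 'no' for x in df['salary']])

def loc(df):
    # Default every row to 'no', then overwrite the rows that meet the condition.
    # Uses >= so that all three functions produce the same labels.
    df = df.assign(is_rich='no')
    df.loc[df['salary'] >= 50, 'is_rich'] = 'yes'
    return df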
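Before comparing timings it helps to confirm that the approaches being compared actually produce the same labels. The sketch below does that on a small, made-up salary column (the sample values are an assumption for illustration) that includes the boundary value 50. It uses >= in all three approaches; a strict > would label a salary of exactly 50 as 'no'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 50 is included to exercise the boundary.
df = pd.DataFrame({"salary": [10, 49, 50, 51, 90]})

# Approach 1: vectorized selection with numpy.where.
via_where = np.where(df["salary"] >= 50, "yes", "no")

# Approach 2: list comprehension over the column values.
via_list = ["yes" if x >= 50 else "no" for x in df["salary"]]

# Approach 3: default value, then overwrite matching rows with loc.
via_loc = df.assign(is_rich="no")
via_loc.loc[via_loc["salary"] >= 50, "is_rich"] = "yes"

expected = ["no", "no", "yes", "yes", "yes"]
assert via_where.tolist() == expected
assert via_list == expected
assert via_loc["is_rich"].tolist() == expected
```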
