How to efficiently put NaN into a DataFrame?

import pandas as pd
df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'], 'Address': ['Oxford', 'Cambridge', 'Xianjiang', 'Wuhan'], 'Age': [20, 21, 19, 18], 'Weight': [50, 61, 69, 78]})

3 answers

User

Answer 1 · edited 2024-06-01 00:21:58

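You can draw random numbers within the range of the tuples (rows), loop over them, and treat each one as the index of a value to replace with NaN.

For example, if you have 10 tuples, draw numbers in the range 0 to 9 from a random number generator and use the results as the indices of the values to replace with NaN.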
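A minimal sketch of this loop idea, assuming the goal is to blank half of each column (frac=0.5, as the other answers do). The function name nan_random_loop and its frac/seed parameters are hypothetical. Rather than assigning NaN cell by cell, it records the drawn row positions in a boolean array and applies them once with DataFrame.mask, which also upcasts the integer columns to float.

```python
import numpy as np
import pandas as pd


def nan_random_loop(df, frac=0.5, seed=None):
    """Hypothetical helper: set a random `frac` of each column to NaN."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = df.shape
    k = int(n_rows * frac)  # number of cells to blank in every column
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    for j in range(n_cols):
        # k distinct row positions drawn from 0 .. n_rows - 1
        rows = rng.choice(n_rows, size=k, replace=False)
        mask[rows, j] = True
    return df.mask(mask)  # cells marked True become NaN


df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                   'Address': ['Oxford', 'Cambridge', 'Xianjiang', 'Wuhan'],
                   'Age': [20, 21, 19, 18], 'Weight': [50, 61, 69, 78]})
print(nan_random_loop(df, seed=0))
```

Drawing without replacement guarantees exactly k distinct rows per column; the Python loop runs once per column, not once per cell.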

User

Answer 2 · edited 2024-06-01 00:21:58

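I suggest using argpartition instead of argsort, because the full sort that argsort performs is unnecessary here; this makes it about three times faster than the previous answers (largely inspired by @jezrael):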

import numpy as np
df1 = df.mask(np.random.rand(*df.shape).argpartition(0, axis=0) >= df.shape[0] // 2)
print(df1)
   Name    Address   Age  Weight
0   NaN     Oxford   NaN    50.0
1  nick  Cambridge  21.0    61.0
2   NaN        NaN   NaN     NaN
3  jack        NaN  18.0     NaN

Performance comparison

# Reusing the same comparison dataset
df = pd.concat([df] * 50000, ignore_index=True)
df = pd.concat([df] * 50, ignore_index=True, axis=1)


# @Andy's answer, using apply and sample
%timeit df.apply(lambda x: x.sample(frac=0.5)).reindex(df.index)
9.72 s ± 532 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# @jezrael's answer, based on mask, np random and argsort
%timeit df.mask(np.random.rand(*df.shape).argsort(axis=0) >= df.shape[0] // 2)
8.23 s ± 732 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# This answer, based on mask, np random and argpartition
%timeit df.mask(np.random.rand(*df.shape).argpartition(0, axis=0) >= df.shape[0] // 2)
2.54 s ± 98.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
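A caveat worth checking before adopting the argpartition version: NumPy only guarantees that the element at position kth lands where a full sort would put it; the order of the remaining indices is undefined, not random. In NumPy's generic selection code, kth=0 simply swaps each column's minimum to the front and leaves the other indices in their original order (SIMD-accelerated builds may differ, but randomness is still not guaranteed). The sample output above, where row 2 is NaN in every column and row 1 in none, is consistent with this. A minimal sketch that keeps the linear-time partition idea but stays random, assuming the goal is to blank n - n // 2 cells per column like the argsort version: partition the random draws themselves to find each column's threshold. The helper name mask_half is hypothetical.

```python
import numpy as np
import pandas as pd


def mask_half(df, seed=None):
    """Hypothetical helper: NaN out n - n // 2 random cells in every column."""
    rng = np.random.default_rng(seed)
    n = df.shape[0]
    r = rng.random(df.shape)  # one uniform draw per cell
    # Per-column (n // 2)-th smallest draw, found in linear time by np.partition
    thresh = np.partition(r, n // 2, axis=0)[n // 2]
    # Draws >= their column's threshold are the n - n // 2 largest ones (ties are
    # practically impossible with float64 draws), so the masked rows are random
    return df.mask(r >= thresh)


df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                   'Address': ['Oxford', 'Cambridge', 'Xianjiang', 'Wuhan'],
                   'Age': [20, 21, 19, 18], 'Weight': [50, 61, 69, 78]})
print(mask_half(df, seed=0))
```

The threshold has shape (n_columns,), so `r >= thresh` broadcasts it down each column; only the values are partitioned, never the indices, so the row positions that get masked inherit the randomness of the draws.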

User

Answer 3 · edited 2024-06-01 00:21:58

Use apply together with sample:

df_final = df.apply(lambda x: x.sample(frac=0.5)).reindex(df.index)
print(df_final)
    Name    Address   Age  Weight
0    Tom        NaN   NaN    50.0
1    NaN        NaN   NaN    61.0
2  krish  Xianjiang  19.0     NaN
3    NaN      Wuhan  18.0     NaN
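This works through index alignment: sample keeps the original row labels of the rows it draws, and when apply combines the sampled columns pandas aligns them by label, leaving NaN where a column did not draw a row; reindex(df.index) then restores the original row order and any row that no column happened to draw. A sketch of the same mechanism on a single column (random_state=0 is only there for reproducibility):

```python
import pandas as pd

s = pd.Series([20, 21, 19, 18], name='Age')
kept = s.sample(frac=0.5, random_state=0)  # 2 of the 4 rows, original labels kept
blanked = kept.reindex(s.index)            # rows that were not drawn become NaN
print(blanked)
```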
