Subset of a pandas DataFrame whose index contains duplicate labels - Q&A - Python中文网

Posted 2024-04-23 06:45:21

Given this DataFrame:

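import numpy as np
import pandas as pd

# 'key' becomes the index; the two NaN keys make the index non-unique
df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
}).set_index('key')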

which looks like this:

        value
key     
1.0     one
2.0     two
3.0     three
4.0     four
5.0     five
NaN     six
NaN     seven

I want to take this subset of it:

    value
key     
1   one
1   one
6   NaN

This produces a warning:

df.loc[[1,1,6],]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

And this raises an error:

df.reindex([1, 1, 6])

ValueError: cannot reindex from a duplicate axis

How can I do this, including the lookup of a missing label, without using apply?

1 answer

User

#1 · Posted 2024-04-23 06:45:21

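The problem is that NaN appears more than once in the index. You should drop those rows before reindexing: the labels are duplicated, so it is ambiguous which row's value belongs to a label in the new index.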

df.loc[df.index.dropna()].reindex([1, 1, 6])

    value
key 
1   one
1   one
6   NaN
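
To see why dropping the NaN labels is enough in this case, one can check the uniqueness of the index before and after. The following is a minimal sketch that rebuilds the frame from the question; the variable names `cleaned` and `subset` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# The two NaN labels make the index non-unique, which is what reindex rejects.
unique_before = df.index.is_unique

# Keep only the rows whose label is not NaN; the remaining labels are unique.
cleaned = df.loc[df.index.dropna()]
unique_after = cleaned.index.is_unique

# The new index may repeat a label (1, 1) and may name a missing one (6).
subset = cleaned.reindex([1, 1, 6])
print(subset)
```

The restriction applies to the index being reindexed from, not to the list of requested labels, which is why `[1, 1, 6]` is accepted once the source index is unique.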

For a more general solution, use duplicated:

df.loc[~df.index.duplicated(keep=False)].reindex([1, 1, 6])
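
The difference from `dropna` shows up when the repeated label is not NaN. The sketch below uses a hypothetical frame `df2` (not from the question) in which label 2 occurs twice.

```python
import pandas as pd

# Hypothetical frame: label 2 appears twice, and there are no NaN labels.
df2 = pd.DataFrame(
    {'value': ['a', 'b', 'c', 'd']},
    index=pd.Index([1, 2, 2, 3], name='key'),
)

# keep=False marks every occurrence of a repeated label, not only the later ones.
dup_mask = df2.index.duplicated(keep=False)

# Dropping all marked rows leaves a unique index that reindex accepts.
unique_rows = df2.loc[~dup_mask]
subset2 = unique_rows.reindex([1, 1, 6])
print(subset2)
```

Note that both rows labelled 2 are discarded, so requesting label 2 from `unique_rows` would give NaN rather than either original value.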

If you want to keep the duplicated labels in the index and still use reindex, it will fail. This has actually been asked before a few times.
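
If the duplicated rows have to stay in the frame, one possible workaround (not part of the original answer) is to put the requested labels in an empty frame and left-join the data onto it, since a join on the index tolerates duplicates on both sides. This is a sketch under that assumption; `wanted` and `joined` are illustrative names, and the labels are written as floats to match the dtype of the index in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# A frame with no columns whose index holds the requested labels.
wanted = pd.DataFrame(index=pd.Index([1.0, 1.0, 6.0], name='key'))

# Left join on the index: every requested label is kept, and labels that
# are absent from df get NaN in the 'value' column.
joined = wanted.join(df, how='left')
print(joined)
```

Unlike reindex, a join returns one row per match, so if a requested label occurs several times in the original frame, the result will contain more rows than were requested.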
