Updating a MultiIndex DataFrame with a DataFrame whose MultiIndex is a subset of the first's

Posted 2024-04-24 17:05:27


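I have two pandas DataFrames, x and y, each with a MultiIndex. The MultiIndex of y is a subset of the MultiIndex of x. I would like to update fields in x using the values from y: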

x.index.names
Out[]: FrozenList(['cohort', 'id', 'design', 'date'])

y.index.names
Out[]: FrozenList(['cohort', 'id'])

Can I do this?


Example:

import pandas as pd

DataFrame x

# sets of different measurements on different subjects on different
# dates.

df = pd.read_pickle('protocol.pkl')

df.set_index(
        keys=['cohort', 'id', 'design', 'date'],
        inplace=True,
        verify_integrity=True,
        drop=True)

df.head()
Out[]:
                             valid  epi
cohort id design date
FOOBAR 1  FOO    2014-04-22   True    3
       2  BAR    2014-04-24   True    3
       2  BAR    2014-04-25   True    3
       4  FOO    2014-04-25   True    3
       4  BAR    2014-05-05   True    3

df.shape
Out[]: (714, 2)

DataFrame y

# subjects to exclude from the study

up = pd.read_pickle('outlying.pkl')

up.set_index(keys=['cohort', 'id', 'design'],
        inplace=True,
        verify_integrity=True,
        drop=True)

up.head()
Out[]:
                     valid
cohort id  design
FOOBAR 1   BAR       False
       2   BAR       False
       12  BAR       False
       22  FOO       False
       28  FOO       False

up.shape
Out[]: (14, 1)

The result of the update should be:

df.head()
Out[]:
                             valid  epi
cohort id design date
FOOBAR 1  FOO    2014-04-22   True    3
       2  BAR    2014-04-24   False   3
       2  BAR    2014-04-25   False   3
       4  FOO    2014-04-25   True    3
       4  BAR    2014-05-05   True    3

I expected

df.update(up)

to do this, since the index of up is a "subset" of the index of df, but it has no effect on df.
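The pickle files are not available here, so the following is a minimal sketch for reproducing the setup. The two frames are hypothetical stand-ins built only from rows shown in the outputs above, not the real data. The sketch drops the extra 'date' level from df's index and flags which rows of df have a (cohort, id, design) key that also appears in up, that is, the rows the update is supposed to change.

```python
import pandas as pd

# Hypothetical stand-in for protocol.pkl: only the five rows shown in df.head().
df = pd.DataFrame({
    "cohort": ["FOOBAR"] * 5,
    "id": [1, 2, 2, 4, 4],
    "design": ["FOO", "BAR", "BAR", "FOO", "BAR"],
    "date": pd.to_datetime(["2014-04-22", "2014-04-24", "2014-04-25",
                            "2014-04-25", "2014-05-05"]),
    "valid": [True] * 5,
    "epi": [3] * 5,
}).set_index(["cohort", "id", "design", "date"], verify_integrity=True)

# Hypothetical stand-in for outlying.pkl: the first two rows shown in up.head().
up = pd.DataFrame({
    "cohort": ["FOOBAR", "FOOBAR"],
    "id": [1, 2],
    "design": ["BAR", "BAR"],
    "valid": [False, False],
}).set_index(["cohort", "id", "design"], verify_integrity=True)

# Drop the 'date' level so both indexes have the same three levels, then
# mark the rows of df whose (cohort, id, design) key is present in up.
keys = df.index.droplevel("date")
matches = keys.isin(up.index)

print(df[matches])
```

Only the two (FOOBAR, 2, BAR) rows are flagged in this toy data, which agrees with the desired result shown above.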


1 answer
User
#1 · Posted 2024-04-24 17:05:27

I tried reindexing with the MultiIndex, but got:

TypeError: Join on level between two MultiIndex objects is ambiguous

So a possible solution is to join the two frames on the shared index levels and use fillna to replace the NaNs:

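# Move the index levels back to columns, left-join up on the three shared keys
# (lsuffix renames df's own 'valid' column to 'valid_'), take up's value where
# one exists and the original value otherwise, then restore the 4-level index.
df = (df.reset_index().join(up, on=['cohort','id','design'], lsuffix='_')
        .assign(valid=lambda x: x.valid.fillna(x.valid_))
        .drop('valid_', axis=1)
        .set_index(['cohort','id','design', 'date'])
       )
print(df)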
                             epi  valid
cohort id design date                  
FOOBAR 1  FOO    2014-04-22    3   True
       2  BAR    2014-04-24    3  False
                 2014-04-25    3  False
       4  FOO    2014-04-25    3   True
          BAR    2014-05-05    3   True
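
To try the join approach without the original pickle files, here is a self-contained sketch. The frames are hypothetical stand-ins built from the rows shown in the question, and the final astype(bool) is an addition of mine (not part of the answer above) so that the 'valid' column ends up with a boolean dtype instead of object.

```python
import pandas as pd

# Hypothetical stand-in for the question's df (rows from df.head()).
df = pd.DataFrame({
    "cohort": ["FOOBAR"] * 5,
    "id": [1, 2, 2, 4, 4],
    "design": ["FOO", "BAR", "BAR", "FOO", "BAR"],
    "date": pd.to_datetime(["2014-04-22", "2014-04-24", "2014-04-25",
                            "2014-04-25", "2014-05-05"]),
    "valid": [True] * 5,
    "epi": [3] * 5,
}).set_index(["cohort", "id", "design", "date"], verify_integrity=True)

# Hypothetical stand-in for the question's up (first rows of up.head()).
up = pd.DataFrame({
    "cohort": ["FOOBAR", "FOOBAR"],
    "id": [1, 2],
    "design": ["BAR", "BAR"],
    "valid": [False, False],
}).set_index(["cohort", "id", "design"], verify_integrity=True)

updated = (
    df.reset_index()
      # Left join: df's columns are matched against up's 3-level index.
      # df's own 'valid' becomes 'valid_'; up's 'valid' keeps its name.
      .join(up, on=["cohort", "id", "design"], lsuffix="_")
      # Use up's value where a match exists, else df's original value.
      .assign(valid=lambda x: x["valid"].fillna(x["valid_"]).astype(bool))
      .drop(columns="valid_")
      .set_index(["cohort", "id", "design", "date"])
)

print(updated)
```

Because the join is a left join, every row of df is kept in its original order, and rows with no match in up simply retain their original 'valid' value.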
