I have a dataframe, and I want to keep all rows of every id group in which a row with country = russia and month = march is followed by a row with country != russia.
Input dataframe:
import pandas as pd
data = {'fruit': ['pear','cucumber','cherry', 'apricot', 'pear','watermelon','pear','banana', 'pear', 'cherry','apple', 'melon', 'cherry','banana', 'kiwi', 'guava', 'banana'],
'country': ['france','russia', 'usa','russia', 'франция','russia','usa', 'russia', 'russia','ghana','russia', 'russia', 'albania','andorra', 'russia', 'russia', 'russia'],
'id': ['01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5'],
'month': ['january','september','january','january', 'september','march','march', 'november', 'march', 'january','january', 'march', 'january','july', 'march', 'march', 'april']
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id', 'month'])
I thought the approach below should work, but it does not take month = march into account, and I get an incorrect result.
Does anyone see the problem?
(df.groupby("id")
   .filter(
       lambda x: x.loc[(x["country"].eq("russia") & x["month"].eq("march")).idxmax() + 1:, ["country"]]
       .fillna("russia")
       .ne("russia")
       .any()))
Expected output dataframe:
data = {'fruit': ['watermelon','pear','banana', 'pear', 'cherry','apple', 'melon', 'cherry','banana'],
'country': ['russia','usa', 'russia', 'russia','ghana','russia', 'russia', 'albania','andorra'],
'id': ['03','03','011', '011', '011','011', '6', '6','6'],
'month': ['march','march', 'november', 'march', 'january','january', 'march', 'january','july']
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id', 'month'])
IIUC try:
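One way to read the requirement, as a minimal sketch: flag the rows where country is russia and month is march, look at the country of the next row inside the same id, and keep every id that has at least one flagged row whose next row is not russia. The names is_target, next_country and hit are illustrative, and "followed by" is assumed here to mean the immediately following row within the same id. The filter in the question likely misbehaves because idxmax() on an all-False mask still returns the first index label, so ids with no russia/march row are sliced from their second row and can still pass the check.

```python
import pandas as pd

data = {
    'fruit': ['pear', 'cucumber', 'cherry', 'apricot', 'pear', 'watermelon', 'pear', 'banana',
              'pear', 'cherry', 'apple', 'melon', 'cherry', 'banana', 'kiwi', 'guava', 'banana'],
    'country': ['france', 'russia', 'usa', 'russia', 'франция', 'russia', 'usa', 'russia',
                'russia', 'ghana', 'russia', 'russia', 'albania', 'andorra', 'russia', 'russia', 'russia'],
    'id': ['01', '01', '01', '02', '02', '03', '03', '011',
           '011', '011', '011', '6', '6', '6', '5', '5', '5'],
    'month': ['january', 'september', 'january', 'january', 'september', 'march', 'march', 'november',
              'march', 'january', 'january', 'march', 'january', 'july', 'march', 'march', 'april'],
}
df = pd.DataFrame(data, columns=['fruit', 'country', 'id', 'month'])

# Rows that are both russia and march.
is_target = df['country'].eq('russia') & df['month'].eq('march')

# Country of the next row within the same id (NaN on the last row of each id).
next_country = df.groupby('id')['country'].shift(-1)

# A target row counts only if a next row exists and it is not russia.
hit = is_target & next_country.notna() & next_country.ne('russia')

# Keep every row of an id that has at least one such hit.
out = df[hit.groupby(df['id']).transform('any')]
print(out)
```

On the sample data this keeps ids 03, 011 and 6, which matches the expected output in the question. If "followed by" should instead mean any later row in the same id, the shift-based check would need to be replaced with a per-group look-ahead.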