Groupby with mask and transform

Posted 2024-04-26 00:42:52


I have a DataFrame like this:

POLY_KEY_I      Class     SP_Percent             
FS01080100SM001 NA               5.0
                MTGP            67.5
                Meadow          25.0
                Woodland         2.5
FS01080100SM002 PHP             85.0
                SP              15.0

For each unique POLY_KEY_I, if Class == Meadow and SP_Percent >= 20, I want to change MTGP to WMTGP.

My desired output is:

^{pr2}$

The code I tried is:

df ['mask'] = ((df['Class'] == 'Meadow') & df['SP_Percent'] >=20)
mask = df.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')
df.loc[mask,'Class']='WMTGP'
print(df)

But this returns an error:

mask = final.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')

File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2439, in transform return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))

File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2484, in _transform_fast values = func().values

File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2439, in return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))

File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 520, in getattr (type(self).name, attr))

AttributeError: 'SeriesGroupBy' object has no attribute 'MTGP'

Edit:

I don't know if this helps, but if I change this line:

mask = df.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')

to this:

mask = df.groupby(['POLY_KEY_I'])['mask'].transform('any')

it changes every Class value in the matching POLY_KEY_I group to WMTGP, but I only want the value changed where it is MTGP.
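The behaviour described in this edit can be narrowed down with a small sketch. It rests on two observations: `&` binds tighter than `>=` in Python, so the comparison needs its own parentheses, and `transform` accepts a function or the name of a groupby method such as `'any'`, which is why `'MTGP'` raises the AttributeError. The frame below reuses the sample rows plus one extra MTGP row in the second group as a control; the variable names `meadow`, `group_has_meadow` and `out` are illustrative assumptions, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "POLY_KEY_I": ["FS01080100SM001"] * 4 + ["FS01080100SM002"] * 3,
    "Class": [None, "MTGP", "Meadow", "Woodland", "PHP", "MTGP", "SP"],
    "SP_Percent": [5.0, 67.5, 25.0, 2.5, 85.0, 85.0, 15.0],
})

# Parenthesise the comparison: without it, `a & b >= 20` is read as `(a & b) >= 20`.
meadow = (df["Class"] == "Meadow") & (df["SP_Percent"] >= 20)

# "any" is a groupby method name: every row of a group becomes True
# as soon as one row of that group satisfies the Meadow condition.
group_has_meadow = meadow.groupby(df["POLY_KEY_I"]).transform("any")

# Combine the group-level flag with a row-level test so only MTGP rows change.
out = df.copy()
out.loc[group_has_meadow & (out["Class"] == "MTGP"), "Class"] = "WMTGP"
print(out)
```

The group-level mask alone selects every row of a flagged group, which matches the symptom described above; the extra `Class == 'MTGP'` test is what limits the assignment to the intended rows.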


2 answers

Here is how I did it:

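# Flag rows where Class is Meadow and SP_Percent is at least 20
df['mask'] = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)
# Keys of the groups that contain a flagged row
df2 = df.loc[df['mask'], ['POLY_KEY_I']].copy()
df2['mask2'] = True
# Broadcast the group-level flag back to every row of the group
df = pd.merge(df, df2, how='left')
# .loc replaces the removed .ix indexer; only MTGP rows in flagged groups change
df.loc[(df['mask2'] == True) & (df['Class'] == 'MTGP'), 'Class'] = 'WMTGP'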

I changed your solution completely, to groupby with apply and a custom function f. For checking string values, it is better to use isin.
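As a small illustration of the string check, the sketch below compares an equality test with `Series.isin` on a made-up Series (`cls` is a hypothetical name). For a single label the two give the same mask; `isin` mainly helps when several labels have to be matched in one call.

```python
import pandas as pd

cls = pd.Series(["Meadow", "MTGP", None, "Woodland"])

# Equality test against a single label; the missing value compares as False.
eq_mask = cls == "Meadow"

# isin takes a list of labels; with one label it matches the equality test.
isin_mask = cls.isin(["Meadow"])

# Several labels can be tested at once.
both = cls.isin(["Meadow", "MTGP"])
print(isin_mask.tolist(), both.tolist())
```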

Input (row 5 added for testing):

        POLY_KEY_I     Class  SP_Percent
0  FS01080100SM001       NaN         5.0
1  FS01080100SM001      MTGP        67.5
2  FS01080100SM001    Meadow        25.0
3  FS01080100SM001  Woodland         2.5
4  FS01080100SM002       PHP        85.0
5  FS01080100SM002      MTGP        85.0
6  FS01080100SM002        SP        15.0    
^{pr2}$

Edit 1:

Timings added:

^{3}$

Code used for the timings:

import pandas as pd
import numpy as np
import io

temp=u"""POLY_KEY_I;Class;SP_Percent
FS01080100SM001;NA;5.0
FS01080100SM001;MTGP;67.5
FS01080100SM001;Meadow;25.0
FS01080100SM001;Woodland;2.5
FS01080100SM002;PHP;85.0
FS01080100SM002;MTGP;85.0
FS01080100SM002;SP;15.0"""

df = pd.read_csv(io.StringIO(temp), sep=";", index_col=None, parse_dates=False)
print(df)
print(df.dtypes)
print(df.index)

def shahram(df):
    # Merge-based approach from the first answer
    df['mask'] = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)
    df2 = df.loc[df['mask'], ['POLY_KEY_I']].copy()
    df2['mask2'] = True
    df = pd.merge(df, df2, how='left')
    # .loc replaces the removed .ix indexer
    df.loc[(df['mask2'] == True) & (df['Class'] == 'MTGP'), 'Class'] = 'WMTGP'
    return df

def f(g):
    # Rename MTGP only when the group has a Meadow row with SP_Percent >= 20
    if (g['Class'].isin(['Meadow']) & (g['SP_Percent'] >= 20)).any():
        g = g.copy()
        # Assign through a single .loc call to avoid chained assignment
        g.loc[g['Class'].isin(['MTGP']), 'Class'] = 'WMTGP'
    return g

print(df.groupby(['POLY_KEY_I']).apply(f))
print(shahram(df))
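One way to reproduce a timing comparison is the standard-library `timeit` module. The sketch below is an assumption about how such a measurement could be set up, not the method used for the timings above: `make_df`, `via_merge` and `via_apply` are hypothetical names, the frame is built directly instead of being read from CSV, and the apply variant selects the non-key columns so that it does not rely on how `apply` treats the grouping column. No timing figures are claimed here, since they depend on the machine and pandas version.

```python
import timeit
import pandas as pd

def make_df():
    # Same seven rows as the test input; NA is stored as a missing value.
    return pd.DataFrame({
        "POLY_KEY_I": ["FS01080100SM001"] * 4 + ["FS01080100SM002"] * 3,
        "Class": [None, "MTGP", "Meadow", "Woodland", "PHP", "MTGP", "SP"],
        "SP_Percent": [5.0, 67.5, 25.0, 2.5, 85.0, 85.0, 15.0],
    })

def via_merge(df):
    # Merge-based variant: flag the keys, merge the flag back, then update.
    mask = (df["Class"] == "Meadow") & (df["SP_Percent"] >= 20)
    keys = df.loc[mask, ["POLY_KEY_I"]].drop_duplicates()
    keys["mask2"] = True
    out = df.merge(keys, how="left")
    out.loc[out["mask2"].notna() & (out["Class"] == "MTGP"), "Class"] = "WMTGP"
    return out

def f(g):
    # Per-group function: rename MTGP if the group has a qualifying Meadow row.
    if (g["Class"].isin(["Meadow"]) & (g["SP_Percent"] >= 20)).any():
        g = g.copy()
        g.loc[g["Class"].isin(["MTGP"]), "Class"] = "WMTGP"
    return g

def via_apply(df):
    # group_keys=False keeps the original row index in the result.
    grouped = df.groupby("POLY_KEY_I", group_keys=False)
    return grouped[["Class", "SP_Percent"]].apply(f)

merged = via_merge(make_df())
applied = via_apply(make_df())

for name, func in [("merge", via_merge), ("apply", via_apply)]:
    seconds = timeit.timeit(lambda: func(make_df()), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.3f} ms per call")
```

On seven rows the numbers mostly reflect fixed overhead, so a larger frame would be needed before drawing conclusions about which approach scales better.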
