Adding columns where a condition is met while keeping previous values in pandas (Python)

Posted 2024-05-29 11:45:23

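How can I avoid creating so many variables when adding columns? I have several conditions that need to be met, and each new statement wipes out the old information wherever the condition does not hold. How do I keep the old values and add in the new ones?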

Take this DataFrame:

import pandas as pd
import numpy as np  # needed below for np.timedelta64
import datetime as DT

d = {'case' : pd.Series([1,1,1,1,2]),
     'open' : pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2)]),
     'change' : pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),DT.datetime(2014, 5, 8),DT.datetime(2014, 6, 8),DT.datetime(2014, 6, 8)]),
     'StartEvent' : pd.Series(['Homeless','Homeless','Homeless','Homeless','Jail']),
     'ChangeEvent' : pd.Series(['Homeless','Jail','Homeless','Jail','Jail']),
     'close' : pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2)])}
df = pd.DataFrame(d)

This gives me part of the information I need.

[code block missing in the source]

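Ideally, I could take the next part and have it land in the same variables, 'homeless' and 'jail', but whatever I try deletes the current values wherever the condition is not met.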

df['homeless2']=(df['homeless']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['first']==1)]

For example, the next line puts out NaN wherever the condition is not met. How do I keep the old values and add in the new ones?

#df['homeless2']=(df['homeless']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['first']==1)]

df['jail2']=(df['jail']+(df['change']-df['open'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Jail') & (df['first']==1)]
df.homeless2=df.homeless2.fillna(0)
df.jail2=df.jail2.fillna(0)

df['homeless3']=(df['homeless']+(df['close']-df['change'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Homeless') & (df['last']==1)]
df['jail3']=(df['jail']+(df['close']-df['change'])/np.timedelta64(1,'D'))[(df['ChangeEvent']=='Jail') & (df['last']==1)]
df.homeless3=df.homeless3.fillna(0)
df.jail3=df.jail3.fillna(0)

df['realjail']=df.jail+df.jail2+df.jail3
df['realhomeless']=df.homeless+df.homeless2+df.homeless3

This works, but it is far from efficient. Thank you.


1 answer
User
Answer #1 · posted 2024-05-29 11:45:23

The first part of what you are doing, cleaned up a bit:

In [51]: df=pd.DataFrame(d)

In [52]: changes = df.groupby('case')['change']

In [53]: df['jail'] = (changes.diff()[df.ChangeEvent.shift(1)=='Jail']/np.timedelta64(1,'D'))

In [54]: df['homeless'] = (changes.diff()[df.ChangeEvent.shift(1)=='Homeless']/np.timedelta64(1,'D'))

In [55]: df['homeless'] = df['homeless'].fillna(0)

In [56]: df['jail'] = df['jail'].fillna(0)

In [57]: df.loc[changes.idxmax(), 'last']=1

In [58]: df.loc[changes.idxmin(), 'first']=1

In [59]: df
Out[59]: 
  ChangeEvent StartEvent  case     change      close       open  jail  homeless  last  first
0    Homeless   Homeless     1 2014-03-08 2015-03-02 2014-03-02     0         0   NaN      1
1        Jail   Homeless     1 2014-04-08 2015-03-02 2014-03-02     0        31   NaN    NaN
2    Homeless   Homeless     1 2014-05-08 2015-03-02 2014-03-02    30         0   NaN    NaN
3        Jail   Homeless     1 2014-06-08 2015-03-02 2014-03-02     0        31     1    NaN
4        Jail       Jail     2 2014-06-08 2015-03-02 2014-03-02     0         0     1      1

[5 rows x 10 columns]

You don't have to create this as a new column, but IMHO it is a bit cleaner:

[code block missing in the source]
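A minimal sketch of what this step presumably looked like, since the code itself did not survive here. The formula is an assumption, inferred from the homeless_change column in the output further down (6, 68, 67, 129, 98); the small frame only carries the columns this step needs, with values copied from the question's data.

```python
import datetime as DT

import numpy as np
import pandas as pd

# Toy frame: only the columns this step reads, values taken from the question.
df = pd.DataFrame({
    'open': [DT.datetime(2014, 3, 2)] * 5,
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    'homeless': [0.0, 31.0, 0.0, 31.0, 0.0],
})

# Assumed reconstruction: days already counted in 'homeless' plus the
# number of days between 'open' and 'change'.
df['homeless_change'] = (
    df['homeless'] + (df['change'] - df['open']) / np.timedelta64(1, 'D')
)

print(df['homeless_change'].tolist())  # [6.0, 68.0, 67.0, 129.0, 98.0]
```

The candidate values are computed for every row here; the mask in the next step decides which of them are actually written back.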

This is the key: it tells loc which rows to set.

In [63]: homeless_mask = (df['ChangeEvent']=='Homeless') & (df['first']==1)

The assignment is aligned only on the specified row mask and column:

In [64]: df.loc[homeless_mask,'homeless'] = df['homeless_change']

In [65]: df
Out[65]: 
  ChangeEvent StartEvent  case     change      close       open  jail  homeless  last  first  homeless_change
0    Homeless   Homeless     1 2014-03-08 2015-03-02 2014-03-02     0         6   NaN      1                6
1        Jail   Homeless     1 2014-04-08 2015-03-02 2014-03-02     0        31   NaN    NaN               68
2    Homeless   Homeless     1 2014-05-08 2015-03-02 2014-03-02    30         0   NaN    NaN               67
3        Jail   Homeless     1 2014-06-08 2015-03-02 2014-03-02     0        31     1    NaN              129
4        Jail       Jail     2 2014-06-08 2015-03-02 2014-03-02     0         0     1      1               98

[5 rows x 11 columns]
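The same mask-and-loc pattern can be extended to the remaining cases in the question (Jail as well as Homeless, 'last' as well as 'first') without any numbered helper columns. What follows is a sketch under assumptions, not the answerer's code: the frame is rebuilt by hand from Out[59] above, the names day, since_open and until_close are made up for illustration, and each interval is added once to the existing value, which is not identical to the question's realjail/realhomeless sums.

```python
import datetime as DT

import numpy as np
import pandas as pd

# State of the frame after the first cleanup step (copied from Out[59]).
df = pd.DataFrame({
    'ChangeEvent': ['Homeless', 'Jail', 'Homeless', 'Jail', 'Jail'],
    'open': [DT.datetime(2014, 3, 2)] * 5,
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    'close': [DT.datetime(2015, 3, 2)] * 5,
    'jail': [0.0, 0.0, 30.0, 0.0, 0.0],
    'homeless': [0.0, 31.0, 0.0, 31.0, 0.0],
    'first': [1, np.nan, np.nan, np.nan, 1],
    'last': [np.nan, np.nan, np.nan, 1, 1],
})

day = np.timedelta64(1, 'D')
since_open = (df['change'] - df['open']) / day    # days from open to change
until_close = (df['close'] - df['change']) / day  # days from change to close

for event, col in [('Homeless', 'homeless'), ('Jail', 'jail')]:
    is_event = df['ChangeEvent'] == event
    first_mask = is_event & (df['first'] == 1)
    last_mask = is_event & (df['last'] == 1)
    # Only the masked rows are written; every other row keeps its old value.
    df.loc[first_mask, col] = df.loc[first_mask, col] + since_open[first_mask]
    df.loc[last_mask, col] = df.loc[last_mask, col] + until_close[last_mask]

print(df[['homeless', 'jail']])
# homeless -> 6, 31, 0, 31, 0
# jail     -> 0, 0, 30, 267, 365
```

Because the assignment goes through df.loc with a boolean mask, rows where the condition is false are never touched, so no fillna(0) or intermediate homeless2/jail3 columns are needed.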
