How do I initialize a new column with None and conditionally update its values with a tuple?

Posted 2024-05-16 10:08:22


I have the following code:

import pandas as pd
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
df['auditor'] = None
df.loc[df['points'] == 50, 'auditor'] = (1, 2)
print(df)
print(df.loc[df['points'] == 50, 'auditor'])

I want to initialize a new column with None and conditionally update its values with a tuple, but I get the following error:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

My desired result is:

      month  points  points_h1  time  year  auditor
0       NaN      50        NaN  5:00  2010  (1,2)
1  february      25        NaN  6:00   NaN  None
2   january      90        NaN  9:00   NaN  None
3      june     NaN         20   NaN   NaN  None

How can I do this?


2 answers

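Because you cannot know in advance whether the condition matches one row or several, it is best to build a Series of tuples and repeat the tuple once for each row the condition matches: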

condition = df['points'] == 50
df.loc[condition, 'auditor'] = pd.Series([(1, 2)]).repeat(condition.sum()).values

print(df)

   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None

To see what I mean, suppose the second row also has points equal to 50:

d = [{'points': 50, 'time': '5:00', 'year': 2010},
 {'points': 50, 'time': '6:00', 'month': "february"},
 {'points': 90, 'time': '9:00', 'month': 'january'},
 {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
df['auditor'] = None
print(df,'\n\n')

condition = df['points'] == 50
df.loc[condition, 'auditor'] = pd.Series([(1, 2)]).repeat(condition.sum()).values
print(df)

   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN    None
1    50.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None 


   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    50.0  6:00     NaN  february        NaN  (1, 2)
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None
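
The repeat() step above is needed because pandas appears to treat a bare tuple on the right-hand side as a sequence of values to spread over the selected rows, not as one cell value. As a minimal alternative sketch (not the answer's method, and assuming a plain Python loop is fast enough for your data), the whole column can be built with a list comprehension, so each element is already either the tuple or None:

```python
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# One list element per row: the tuple where the condition holds, else None.
# NaN == 50 is False, so rows without a 'points' value get None.
df['auditor'] = [(1, 2) if p == 50 else None for p in df['points']]
print(df)
```

Because the list already has one entry per row, this behaves the same whether the condition matches zero, one, or many rows.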

You can also use np.where(), which is a handy conditional function:

df['auditor'] = np.where((df['points'] == 50), pd.Series([(1, 2)]), None)

Or in one line with .assign() when creating the DataFrame:

# The lambda receives the newly built DataFrame, so df does not need to exist beforehand
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))
df

Out[34]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None
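
The np.where() calls above wrap the tuple in a one-element Series for a reason: NumPy would otherwise convert a bare (1, 2) into a two-element integer array and try to broadcast it against the condition. The sketch below shows the same idea with an explicit one-element object array instead of pd.Series; the name cell is only illustrative and is not part of the original answer:

```python
import numpy as np
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# One-element object array whose single element is the tuple itself.
# Assigning by index stores (1, 2) as one object instead of unpacking it.
cell = np.empty(1, dtype=object)
cell[0] = (1, 2)

condition = (df['points'] == 50).to_numpy()

# The length-1 array broadcasts against the length-4 condition,
# so every matching row receives the same tuple.
df['auditor'] = np.where(condition, cell, None)
print(df['auditor'].tolist())
```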

Per your comment, if you want to define the conditions and results manually and then loop over them with np.where(), you can do this:

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# Manually set conditions and results
c1 = (df['points'] == 50)
r1 =  pd.Series([(1, 2)])
c2 = (df['points'] == 25)
r2 = pd.Series([(1, 3)])
conditions = [c1,c2]
results = [r1,r2]

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])
df

Out[39]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN  (1, 3)
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None

See Anky's comment. Instead of:

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])

you can use np.select to avoid the loop. This is more Pythonic and an efficient way to do it:

df['auditor'] = np.select(conditions,results,None)
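
For completeness, here is a self-contained sketch of the np.select version. It assumes one-element object arrays as the choices rather than pd.Series, and as_cell is a hypothetical helper introduced only for this illustration:

```python
import numpy as np
import pandas as pd


def as_cell(value):
    """Hypothetical helper: wrap any object in a 1-element object array."""
    cell = np.empty(1, dtype=object)
    cell[0] = value  # stored as a single object, not unpacked
    return cell


d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# Boolean masks, one per rule, paired by position with the results list.
conditions = [(df['points'] == 50).to_numpy(),
              (df['points'] == 25).to_numpy()]
results = [as_cell((1, 2)), as_cell((1, 3))]

# Rows matching no condition fall back to the default, None.
df['auditor'] = np.select(conditions, results, default=None)
print(df['auditor'].tolist())
```

When a row satisfies more than one condition, np.select uses the first matching entry in conditions, whereas the loop above lets the last matching condition win, so the order of the lists matters.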
