How do I conditionally copy values in a pandas DataFrame into columns whose names are partly given by a cell value?

Posted 2024-04-28 16:33:23


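How do I select rows so that the values of some columns are changed based on the contents of another column, using the value in a cell of each selected row to choose which column to fill?

I'm new to Python and don't know the Pythonic way of doing things. I have some columns from a data source and I am trying to convert them into a different format.

For example, in the rows below where ProvincePolled == 'Ontario', I want to copy the CandidateA column into 'Ontario CandidateA' and the CandidateB column into 'Ontario CandidateB'.

For the BC and Quebec rows, I likewise want to copy the CandidateA and CandidateB values into the columns named by concatenating the ProvincePolled cell with those column names.

Finally, rows where ProvincePolled == 'Canada' need the CandidateY result copied into every corresponding 'ProvinceX CandidateY' column (where ProvinceX is one of Ontario, BC, Quebec and Y is one of 'A', 'B').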

import pandas as pd

df = pd.DataFrame({'ProvincePolled': ['Ontario', 'Ontario', 'BC', 'Quebec', 'Canada'],
                   'CandidateA': [33.1, 31.3, 27.7, 33.3, 30.0],
                   'CandidateB': [12.1, 15.3, 28.7, 11.3, 18.0],
                   'Ontario CandidateA': [0.0, 0, 0, 0, 0],
                   'Ontario CandidateB': [0., 0, 0, 0, 0],
                   'BC CandidateA': [0., 0, 0, 0, 0],
                   'BC CandidateB': [0., 0, 0, 0, 0],
                   'Quebec CandidateA': [0., 0, 0, 0, 0],
                   'Quebec CandidateB': [0., 0, 0, 0, 0],
                   })
df

This gives:

  ProvincePolled  CandidateA  CandidateB  Ontario CandidateA  \
0        Ontario        33.1        12.1                 0.0
1        Ontario        31.3        15.3                 0.0
2             BC        27.7        28.7                 0.0
3         Quebec        33.3        11.3                 0.0
4         Canada        30.0        18.0                 0.0

   Ontario CandidateB  BC CandidateA  BC CandidateB  Quebec CandidateA  \
0                 0.0            0.0            0.0                0.0
1                 0.0            0.0            0.0                0.0
2                 0.0            0.0            0.0                0.0
3                 0.0            0.0            0.0                0.0
4                 0.0            0.0            0.0                0.0

   Quebec CandidateB
0                0.0
1                0.0
2                0.0
3                0.0
4                0.0

The following statement does not pick out the province column correctly:

df.loc[df['ProvincePolled'] != 'Canada', df['ProvincePolled'] + ' CandidateA'] = df.loc[df['ProvincePolled'] != 'Canada', 'CandidateA']

because it raises KeyError: "['Canada CandidateA'] not in index"
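
A likely reading of this error: the column indexer `df['ProvincePolled'] + ' CandidateA'` is built from every row, the 'Canada' row included, and `.loc` treats it as a flat list of column labels rather than one label per row. The sketch below is an illustration under that assumption and is not part of the original post; the `provinces` list and the loop are hypothetical. It collects the labels the statement asks for, then assigns one province at a time so that each target is a single existing column.

```python
import pandas as pd

df = pd.DataFrame({
    'ProvincePolled': ['Ontario', 'Ontario', 'BC', 'Quebec', 'Canada'],
    'CandidateA': [33.1, 31.3, 27.7, 33.3, 30.0],
    'CandidateB': [12.1, 15.3, 28.7, 11.3, 18.0],
})

# Hypothetical helper list: the provinces that have destination columns.
provinces = ['Ontario', 'BC', 'Quebec']
for province in provinces:
    for cand in ['CandidateA', 'CandidateB']:
        df[f'{province} {cand}'] = 0.0

# The labels requested by the failing statement: one per row, Canada included.
labels = df['ProvincePolled'] + ' CandidateA'
missing = [label for label in labels.unique() if label not in df.columns]

# Assign one province at a time: a row mask plus a single column label.
for province in provinces:
    mask = df['ProvincePolled'] == province
    df.loc[mask, f'{province} CandidateA'] = df.loc[mask, 'CandidateA']
```

Here `missing` holds the one label with no matching column, 'Canada CandidateA'. The loop only covers the non-'Canada' rows for CandidateA, which is what the failing statement was attempting.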

I also tried defining a function:

def fill_cols(row, cols, from_col):
    for col in cols:
        row[col] = from_col

df.loc[df['ProvincePolled'] != 'Canada'] = df.loc[df['ProvincePolled'] != 'Canada'].apply(
    lambda x: fill_cols(x, ['Ontario CandidateA', 'Quebec CandidateA', 'BC CandidateA'], x['CandidateA']))

but that did not work either, raising KeyError: ('CandidateA', 'occurred at index ProvincePolled')
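
The phrase 'occurred at index ProvincePolled' suggests that `apply` was handing the function whole columns: `DataFrame.apply` defaults to `axis=0`, so `x['CandidateA']` is looked up among the row labels. The function also returns nothing. A minimal sketch of a row-wise variant follows, assuming the destination columns already exist; `fill_row`, `provinces` and `candidates` are hypothetical names that do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'ProvincePolled': ['Ontario', 'Ontario', 'BC', 'Quebec', 'Canada'],
    'CandidateA': [33.1, 31.3, 27.7, 33.3, 30.0],
    'CandidateB': [12.1, 15.3, 28.7, 11.3, 18.0],
})

provinces = ['Ontario', 'BC', 'Quebec']
candidates = ['CandidateA', 'CandidateB']
for province in provinces:
    for cand in candidates:
        df[f'{province} {cand}'] = 0.0


def fill_row(row, provinces, candidates):
    """Return a copy of `row` with its province columns filled in."""
    row = row.copy()
    polled = row['ProvincePolled']
    # A 'Canada' row feeds every province; any other row feeds only its own.
    targets = provinces if polled == 'Canada' else [polled]
    for province in targets:
        for cand in candidates:
            row[f'{province} {cand}'] = row[cand]
    return row


# axis=1 passes each row (not each column) to the function.
result = df.apply(fill_row, axis=1, args=(provinces, candidates))
```

Row-wise `apply` is easy to follow but calls the function once per row, so it is usually slower than a column-wise approach on large frames.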


1 answer

Answer 1 · posted 2024-04-28 16:33:23

IIUC, this is just a simple pivot, an update, and a slice assignment:

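# Pivot the two candidate columns so each province gets its own column.
df1 = df[['ProvincePolled', 'CandidateA', 'CandidateB']]
df2 = df1.pivot(columns='ProvincePolled')
# Flatten ('CandidateA', 'Ontario') into 'Ontario CandidateA'.
df2.columns = df2.columns.map('{0[1]} {0[0]}'.format)
# Copy the non-NaN pivoted values into the matching columns of df.
df.update(df2)
# 'Canada' rows fill every province column for the same candidate.
df.loc[df.ProvincePolled.eq('Canada'),
       df.columns.str.contains(r'\w+ +CandidateA')] = df.loc[df.ProvincePolled.eq('Canada'), 'CandidateA']
df.loc[df.ProvincePolled.eq('Canada'),
       df.columns.str.contains(r'\w+ +CandidateB')] = df.loc[df.ProvincePolled.eq('Canada'), 'CandidateB']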

Out[173]:
  ProvincePolled  CandidateA  CandidateB  Ontario CandidateA  \
0        Ontario        33.1        12.1                33.1
1        Ontario        31.3        15.3                31.3
2             BC        27.7        28.7                 0.0
3         Quebec        33.3        11.3                 0.0
4         Canada        30.0        18.0                30.0

   Ontario CandidateB  BC CandidateA  BC CandidateB  Quebec CandidateA  \
0                12.1            0.0            0.0                0.0
1                15.3            0.0            0.0                0.0
2                 0.0           27.7           28.7                0.0
3                 0.0            0.0            0.0               33.3
4                18.0           30.0           18.0               30.0

   Quebec CandidateB
0                0.0
1                0.0
2                0.0
3               11.3
4               18.0
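
For comparison, the same table can be sketched without a pivot by building each destination column with `Series.where`. This is an alternative, not the answerer's method, and it assumes the province list is known in advance; `provinces` and `candidates` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'ProvincePolled': ['Ontario', 'Ontario', 'BC', 'Quebec', 'Canada'],
    'CandidateA': [33.1, 31.3, 27.7, 33.3, 30.0],
    'CandidateB': [12.1, 15.3, 28.7, 11.3, 18.0],
})

provinces = ['Ontario', 'BC', 'Quebec']
candidates = ['CandidateA', 'CandidateB']

for province in provinces:
    # Rows polled in this province, plus the national 'Canada' rows.
    mask = df['ProvincePolled'].isin([province, 'Canada'])
    for cand in candidates:
        # Keep the candidate's value where the mask holds, 0.0 elsewhere.
        df[f'{province} {cand}'] = df[cand].where(mask, 0.0)
```

Each destination column is computed in one vectorized step, so no regular expression over the column names is needed.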
