Divide a dataframe column by a specific cell

Posted 2024-05-16 05:02:02


I want to divide a dataframe column by a specific cell in the same dataframe.

I have a dataframe like this:

 date      type          score   
 20201101  experiment1   30      
 20201101  experiment2   20      
 20201101  baseline      10      
 20201102  experiment1   60      
 20201102  experiment2   50      
 20201102  baseline      10      

I want to compute a score ratio by dividing each score by the "baseline" score for the same date:

 date      type          score   score_ratio
 20201101  experiment1   30      3
 20201101  experiment2   20      2
 20201101  baseline      10      1
 20201102  experiment1   60      6
 20201102  experiment2   50      5
 20201102  baseline      10      1

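The score ratio for (date, type) = (20201101, experiment1) should be its score divided by the score of (20201101, baseline); in this case 30/10 = 3. Likewise, the score of (20201101, experiment2) should be divided by the same value, the score of (20201101, baseline). For a different date, e.g. (20201102, experiment1), the score should be divided by that date's baseline, (20201102, baseline).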
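The same rule written as a formula, for a date d and a type t (this notation is introduced here only for illustration):

```latex
\text{score\_ratio}(d, t) = \frac{\text{score}(d, t)}{\text{score}(d, \text{baseline})},
\qquad \text{e.g. } \frac{30}{10} = 3, \quad \frac{60}{10} = 6
```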

How can I add this column using dataframe operations?

So far I have this, but I'm not sure what expression to divide by: df['score_ratio'] = df['score'].div(...)
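One possible way to fill in the `div(...)` argument, offered as a sketch rather than the asker's code: build a date-to-baseline lookup Series from the baseline rows and `map` it onto every row's date, so each score is divided by the baseline of its own date. It assumes exactly one baseline row per date; with duplicates, `map` raises instead of guessing.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20201101] * 3 + [20201102] * 3,
    "type": ["experiment1", "experiment2", "baseline"] * 2,
    "score": [30, 20, 10, 60, 50, 10],
})

# Lookup Series: date -> baseline score (assumes one baseline row per date)
baseline = df.loc[df["type"].eq("baseline")].set_index("date")["score"]

# map() looks up each row's date in the lookup, so row order does not matter
df["score_ratio"] = df["score"].div(df["date"].map(baseline))
print(df)
```

Because the lookup is by label, this does not depend on the rows being sorted by date.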

Edit:

With the data and code below, I get the error ValueError: Length of values does not match length of index on the last line:

     ID    date        type          room    score         
0  id1     20201120    baseline      1     450.25    
0  id2     20201120    experiment1   1     -3637.24   
0  id3     20201121    baseline      1     200.00   
1  id4     20201121    experiment1   1     300.00   
2  id5     20201120    baseline      2     600.00   
3  id6     20201120    experiment1   2     800.00   


    _df = df.set_index('date', 'room')
    d = _df.query('type=="baseline"')
    print(_df['score'].div(d['score']).values)
    df['score_ratio'] = _df['score'].div(d['score']).values
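A likely cause of the error: `df.set_index('date', 'room')` does not build a `(date, room)` index, because the second positional argument is not treated as a second key (that needs a list, `['date', 'room']`). With `date` alone, the baseline rows contain 20201120 twice (rooms 1 and 2), so `div` pairs each 20201120 score with both baselines and the result has more rows than `df`. The sketch below instead merges on both keys, assuming exactly one baseline row per `(date, room)`; the column name `baseline_score` is made up for illustration.

```python
import pandas as pd

# The data from the edit (default RangeIndex instead of the 0, 0, 0, 1, 2, 3 shown)
df = pd.DataFrame({
    "ID": ["id1", "id2", "id3", "id4", "id5", "id6"],
    "date": [20201120, 20201120, 20201121, 20201121, 20201120, 20201120],
    "type": ["baseline", "experiment1", "baseline", "experiment1", "baseline", "experiment1"],
    "room": [1, 1, 1, 1, 2, 2],
    "score": [450.25, -3637.24, 200.00, 300.00, 600.00, 800.00],
})

keys = ["date", "room"]  # a baseline is defined per (date, room), not per date

# One row per (date, room) holding that group's baseline score
baseline = (
    df.loc[df["type"].eq("baseline"), keys + ["score"]]
    .rename(columns={"score": "baseline_score"})
)

# validate="many_to_one" raises if some (date, room) has more than one baseline;
# a left merge keeps df's rows in their original order
out = df.merge(baseline, on=keys, how="left", validate="many_to_one")

# Positional assignment is safe here because out has the same row order as df
df["score_ratio"] = out["score"].div(out["baseline_score"]).to_numpy()
print(df)
```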
            

3 Answers

Set the `date` column as the index, filter the rows where `type` is `baseline`, then use `Series.div`:

_df = df.set_index('date')
d = _df.query('type=="baseline"') # same as _df.loc[_df['type'].eq('baseline')]
df['score_ratio'] = _df['score'].div(d['score']).values
df
       date         type  score  score_ratio
0  20201101  experiment1     30          3.0
1  20201101  experiment2     20          2.0
2  20201101     baseline     10          1.0
3  20201102  experiment1     60          6.0
4  20201102  experiment2     50          5.0
5  20201102     baseline     10          1.0
import pandas as pd

# Put all rows whose type is baseline into a new dataframe and merge it into the main df
g = pd.merge(df, df[df.type.eq('baseline')].drop(columns='type'), how='left', on='date', suffixes=('', '_right'))

# Calculate score_ratio and drop the extra column acquired during the merge
# (astype(int) matches the expected output but truncates non-integer ratios)
df = g.assign(score_ratio=g.score.div(g.score_right).astype(int)).drop(columns=['score_right'])

print(df)

     date         type  score    score_ratio
0  20201101  experiment1     30            3
1  20201101  experiment2     20            2
2  20201101     baseline     10            1
3  20201102  experiment1     60            6
4  20201102  experiment2     50            5
5  20201102     baseline     10            1

How it works:

#New dataframe with baselines only
df1=df[df.type.eq('baseline')].drop(columns='type')

#Modified original dataframe with baselines added
g=pd.merge(df, df1,how='left', on='date', suffixes=('', '_right'))

#new column called score_ratio
g=g.assign(score_ratio=g.score.div(g.score_right).astype(int))

#drop column called score_right which was acquired during merge
g=g.drop(columns=['score_right'])
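To see what the merge step actually produces, here is a runnable version of the same three steps on the question's sample data; the `print` calls are only for inspection. It leaves out `astype(int)`, assuming fractional ratios should be kept (with the cast, a ratio such as 25/10 would become 2).

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20201101] * 3 + [20201102] * 3,
    "type": ["experiment1", "experiment2", "baseline"] * 2,
    "score": [30, 20, 10, 60, 50, 10],
})

# Step 1: baselines only, one row per date (columns: date, score)
df1 = df[df.type.eq("baseline")].drop(columns="type")

# Step 2: left merge on date; the overlapping 'score' column from df1
# gets the '_right' suffix, so every row carries its date's baseline
merged = pd.merge(df, df1, how="left", on="date", suffixes=("", "_right"))
print(merged)

# Step 3: float ratio, then drop the helper column
g = merged.assign(score_ratio=merged.score.div(merged.score_right)).drop(columns=["score_right"])
print(g)
```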

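Just 3 simple lines of code, with comments (note that this sets the index of `df` in place and replaces `score` with the ratio rather than adding a `score_ratio` column):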

# set index to ['date', 'type']
df.set_index(['date', 'type'], inplace=True)

# helper: values of score at index 'baseline'
s = df.xs('baseline', level=1)  

# divide df by the baseline frame, matching on index level 0 (date), then reset the index
df.div(s, level=0).reset_index()


      date         type  score
0  20201101  experiment1    3.0
1  20201101  experiment2    2.0
2  20201101     baseline    1.0
3  20201102  experiment1    6.0
4  20201102  experiment2    5.0
5  20201102     baseline    1.0
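The same `xs` idea can add a `score_ratio` column instead of overwriting `score`, without modifying `df` in place. A sketch, assuming the `(date, type)` pairs are unique: it divides the `score` Series by the per-date baseline Series with `level="date"`, which broadcasts each baseline across all rows of its date. The names `indexed`, `base`, and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20201101] * 3 + [20201102] * 3,
    "type": ["experiment1", "experiment2", "baseline"] * 2,
    "score": [30, 20, 10, 60, 50, 10],
})

# Work on an indexed copy; df itself keeps its RangeIndex and columns
indexed = df.set_index(["date", "type"])

# Baseline score per date: a Series indexed by date only
base = indexed["score"].xs("baseline", level="type")

# Match base's labels against the 'date' level of the MultiIndex
ratio = indexed["score"].div(base, level="date")

# assign aligns on the (date, type) labels; reset_index turns them back into columns
out = indexed.assign(score_ratio=ratio).reset_index()
print(out)
```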
