Correctly modifying a slice of a DataFrame

Posted 2024-04-20 08:02:44


I'm trying to modify a set of columns for a subset of rows, and of course I get the following warning:

A value is trying to be set on a copy of a slice from a DataFrame

I've seen a similar question here, but I still can't get my head around it.

So, given the following example code:

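import pandas as pd
from random import random as rd
# 20 rows: a random group label ("a" or "b") plus four random float columns
ex = pd.DataFrame([{"group": ["a","b"][int(round(rd()))], "colA": rd()*10, "colB": rd()*10, "colC": rd()*10, "colD": rd()*10} for _ in range(20)])
# every column except the group label
cols = [col for col in ex.columns if col != "group"]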

I only want to modify the rows that belong to group a, and only the columns listed in cols. Intuitively I would try this (and get the warning):

ex[ex["group"]=="a"][cols] = ex[ex["group"]=="a"][cols]/ex.loc[0,cols]

The columns match in number and carry the same labels, so I wonder whether I have to go cell by cell, like this:

# .ix was removed in pandas 1.0; .loc is the label-based equivalent
for idx in ex[ex["group"]=="a"].index:
    for col in cols:
        ex.loc[idx, col]=ex.loc[idx, col]/ex.loc[0,col]

This works, of course, but it feels like a step backwards. So what is the correct way to do this?
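
For context, the warning comes from the chained indexing in the first attempt: `ex[ex["group"]=="a"]` builds a new object, and the following `[cols] = ...` writes into that temporary object rather than into `ex`. A minimal sketch on a small hand-made frame (the data and the names `before` and `unchanged` are illustrative assumptions, not from the post):

```python
import warnings

import pandas as pd

# Small deterministic stand-in for the random frame in the question.
ex = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "colA": [2.0, 4.0, 6.0, 8.0],
    "colB": [4.0, 3.0, 10.0, 7.0],
})
cols = [col for col in ex.columns if col != "group"]
before = ex.copy()

with warnings.catch_warnings():
    # Silence the chained-assignment warning so the effect is easy to see.
    warnings.simplefilter("ignore")
    # Chained indexing: the boolean selection returns a new object,
    # and the assignment lands on that object, not on ex.
    ex[ex["group"] == "a"][cols] = 0.0

# The original frame still holds its old values.
unchanged = ex.equals(before)
print(unchanged)
```

So the assignment is silently lost, which is what the warning is trying to flag.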


1 answer
User
#1 · Posted 2024-04-20 08:02:44

IIUC, you can do this in one step with .loc, passing a boolean condition for the rows and the list of columns:

In [110]:
import pandas as pd
from random import random as rd
ex= pd.DataFrame([{"group": ["a","b"][int(round(rd()))], "colA": rd()*10, "colB": rd()*10, "colC": rd()*10,  "colD": rd()*10} for _ in range(20)])
cols = [col for col in ex.columns if col != "group"]
ex

Out[110]:
        colA      colB      colC      colD group
0   5.895114  3.961007  0.589091  9.846131     a
1   1.789049  7.532745  2.767378  9.144689     b
2   1.218778  2.715299  3.626688  6.516540     a
3   9.327049  3.207037  4.513850  1.910565     b
4   1.822876  0.049689  0.794706  8.463579     a
5   1.451741  6.045066  6.575130  4.882635     b
6   6.741825  4.253489  2.162466  1.050275     a
7   5.186613  3.401384  1.055468  4.060071     a
8   0.921352  8.076272  6.727293  3.219364     a
9   3.209232  8.883085  9.696195  4.089006     b
10  0.970030  6.412611  5.377420  5.475744     b
11  7.905807  4.576925  6.991989  2.974597     b
12  4.907642  7.123328  9.851058  2.337944     b
13  1.191606  2.636071  5.740342  3.301008     b
14  1.454777  3.086801  3.573110  1.402692     b
15  3.253882  1.853393  5.156287  8.268881     b
16  4.779060  4.689739  1.228976  6.339238     b
17  7.950160  4.973974  4.304821  4.492152     b
18  0.581628  6.860053  2.974577  6.542594     a
19  6.872025  9.216597  0.936447  5.518941     b

In [111]:    
ex.loc[ex['group']=='a', cols] /= ex.iloc[0][cols]
ex

Out[111]:
        colA      colB       colC      colD group
0   1.000000  1.000000   1.000000  1.000000     a
1   1.789049  7.532745   2.767378  9.144689     b
2   0.206744  0.685507   6.156417  0.661838     a
3   9.327049  3.207037   4.513850  1.910565     b
4   0.309218  0.012545   1.349039  0.859584     a
5   1.451741  6.045066   6.575130  4.882635     b
6   1.143629  1.073840   3.670853  0.106669     a
7   0.879816  0.858717   1.791690  0.412352     a
8   0.156291  2.038944  11.419789  0.326967     a
9   3.209232  8.883085   9.696195  4.089006     b
10  0.970030  6.412611   5.377420  5.475744     b
11  7.905807  4.576925   6.991989  2.974597     b
12  4.907642  7.123328   9.851058  2.337944     b
13  1.191606  2.636071   5.740342  3.301008     b
14  1.454777  3.086801   3.573110  1.402692     b
15  3.253882  1.853393   5.156287  8.268881     b
16  4.779060  4.689739   1.228976  6.339238     b
17  7.950160  4.973974   4.304821  4.492152     b
18  0.098663  1.731896   5.049437  0.664484     a
19  6.872025  9.216597   0.936447  5.518941     b
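
Because the frame above is random, the numbers change on every run. The same `.loc` pattern can be checked by hand on a small fixed frame. This is only a sketch: the data and the names `mask` and `divisor` are made up for illustration, the divisor is taken from `ex[cols]` so that it stays a float Series, and the assignment is written out in full instead of using `/=`.

```python
import pandas as pd

# Hand-made frame: rows 0 and 2 are in group "a".
ex = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "colA": [2.0, 4.0, 6.0, 8.0],
    "colB": [4.0, 3.0, 10.0, 7.0],
})
cols = [col for col in ex.columns if col != "group"]

mask = ex["group"] == "a"      # boolean row selector
divisor = ex[cols].iloc[0]     # first row, numeric columns only

# One indexing step on ex itself: rows by mask, columns by list.
# The right-hand side is computed in full before anything is written back.
ex.loc[mask, cols] = ex.loc[mask, cols] / divisor

print(ex)
# colA -> [1.0, 4.0, 3.0, 8.0]
# colB -> [1.0, 3.0, 2.5, 7.0]
```

Only the group "a" rows change; the group "b" rows keep their original values, and the division is aligned on the column labels.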

Timing

In [112]:
%%timeit
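# the question's loop as originally timed; .ix was removed in pandas 1.0, use .loc on current versions
for idx in ex[ex["group"]=="a"].index:
    for col in cols:
        ex.ix[idx, col]=ex.ix[idx, col]/ex.ix[0,col]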
100 loops, best of 3: 11 ms per loop

In [113]:
%timeit ex.loc[ex['group']=='a', cols] /= ex.iloc[0][cols]
100 loops, best of 3: 5.3 ms per loop

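So on your small sample my approach is a little over 2x faster (5.3 ms vs 11 ms per loop), and I'd expect it to scale better on larger datasets.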
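
To try the comparison outside IPython, something along these lines should work with the standard `timeit` module. It is a rough sketch under several assumptions: the frame size `n`, the fixed seed, and the helper names `with_loop` and `with_loc` are invented here, and `.loc` stands in for `.ix`. One detail worth noting: the question's loop re-reads row 0 on every iteration, so if row 0 is itself in group a it gets overwritten on the first pass and later rows are divided by the new values. The loop below snapshots the reference row first so that both versions compute the same thing.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # hypothetical size, larger than the 20-row sample
base = pd.DataFrame(
    rng.uniform(1.0, 10.0, size=(n, 4)),
    columns=["colA", "colB", "colC", "colD"],
)
base["group"] = rng.choice(["a", "b"], size=n)
cols = [col for col in base.columns if col != "group"]


def with_loop(df):
    """Cell-by-cell version, dividing by a snapshot of row 0."""
    out = df.copy()
    ref = out[cols].iloc[0]  # snapshot taken before any cell is modified
    for idx in out[out["group"] == "a"].index:
        for col in cols:
            out.loc[idx, col] = out.loc[idx, col] / ref[col]
    return out


def with_loc(df):
    """Single .loc assignment with a boolean mask and a column list."""
    out = df.copy()
    mask = out["group"] == "a"
    out.loc[mask, cols] = out.loc[mask, cols] / out[cols].iloc[0]
    return out


# Both versions should agree before their speed is compared.
same = np.allclose(
    with_loop(base)[cols].to_numpy(), with_loc(base)[cols].to_numpy()
)

loop_s = timeit.timeit(lambda: with_loop(base), number=1)
loc_s = timeit.timeit(lambda: with_loc(base), number=5) / 5
print(f"same result: {same}; loop {loop_s:.4f} s, .loc {loc_s:.4f} s per call")
```

The actual timings depend on the machine and the pandas version, so no numbers are quoted here.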
