Pandas: new column based on values in other columns

Posted 2024-05-14 20:51:27


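I have a df filled with XY coordinates for different subjects. I want to create new columns that pick out the XY coordinates of a specified subject.

This happens whenever a subject's name appears in the 'Person' column: the new columns return that subject's XY coordinates at that index.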

import pandas as pd
import numpy as np
import random

AA = 10, 20

k = 5
N = 10

df = pd.DataFrame({
    'John Doe_X' : np.random.uniform(k, k + 100 , size=N),
    'John Doe_Y' : np.random.uniform(k, k + 100 , size=N),
    'Kevin Lee_X' : np.random.uniform(k, k + 100 , size=N),
    'Kevin Lee_Y' : np.random.uniform(k, k + 100 , size=N),   
    'Liam Smith_X' : np.random.uniform(k, k + -100 , size=N),
    'Liam Smith_Y' : np.random.uniform(k, k + 100 , size=N),
    'Event' : ['AA', 'nan', 'BB', 'nan', 'nan', 'CC', 'nan','CC', 'DD','nan'],                                 
    'Person' : ['nan','nan','John Doe','John Doe','nan','Kevin Lee','nan','Liam Smith','John Doe','John Doe']})


df['X'] = df.apply(lambda row: row.get(row['Person']+'_X') if pd.notnull(row['Person']) else np.nan, axis=1)
df['Y'] = df.apply(lambda row: row.get(row['Person']+'_Y') if pd.notnull(row['Person']) else np.nan, axis=1)

Output:

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   75.047164   19.281168    28.064313    87.184248    -76.148559   
1   nan   50.642782   68.308319    46.088057    64.132263    -83.109383   
2    BB    9.965115   77.950894    48.864693     8.613132      0.106708   
3   nan   44.726136   58.751520    69.904076    40.818433    -87.656064   
4   nan  101.501119   99.156872   101.976300    93.539749    -57.026015   
5    CC   87.778446   65.814911     7.302116    40.577156    -28.703879   
6   nan   99.682139   91.715231    88.029451    82.309191    -66.444582   
7    CC   38.248267   38.648960    76.065297    67.322639    -34.754868   
8    DD   69.429353   61.252800    83.024358    58.038962    -62.001353   
9   nan    9.522023   73.009883    41.873986     8.677565    -20.389939   

   Liam Smith_Y      Person          X          Y  
0     18.420494         nan        NaN        NaN  
1     33.206289         nan        NaN        NaN  
2     73.833204    John Doe   9.965115  77.950894  
3     39.652071    John Doe  44.726136  58.751520  
4     88.176561         nan        NaN        NaN  
5     53.776995   Kevin Lee   7.302116  40.577156  
6     95.025923         nan        NaN        NaN  
7     26.851864  Liam Smith -34.754868  26.851864  
8    102.771046    John Doe  69.429353  61.252800  
9     28.633231    John Doe   9.522023  73.009883

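I now want to use the 'Event' column to refine the new ['X','Y'] columns. Specifically, when the value 'AA' appears in the 'Event' column, I want to return the coordinates of AA (10, 20). I would also like to keep those same coordinates until the next coordinate appears.

So the output would look like: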

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   75.047164   19.281168    28.064313    87.184248    -76.148559   
1   nan   50.642782   68.308319    46.088057    64.132263    -83.109383   
2    BB    9.965115   77.950894    48.864693     8.613132      0.106708   
3   nan   44.726136   58.751520    69.904076    40.818433    -87.656064   
4   nan  101.501119   99.156872   101.976300    93.539749    -57.026015   
5    CC   87.778446   65.814911     7.302116    40.577156    -28.703879   
6   nan   99.682139   91.715231    88.029451    82.309191    -66.444582   
7    CC   38.248267   38.648960    76.065297    67.322639    -34.754868   
8    DD   69.429353   61.252800    83.024358    58.038962    -62.001353   
9   nan    9.522023   73.009883    41.873986     8.677565    -20.389939   

   Liam Smith_Y      Person          X          Y  
0     18.420494         nan         10         20  
1     33.206289         nan         10         20  
2     73.833204    John Doe   9.965115  77.950894  
3     39.652071    John Doe  44.726136  58.751520  
4     88.176561         nan        NaN        NaN  
5     53.776995   Kevin Lee   7.302116  40.577156  
6     95.025923         nan        NaN        NaN  
7     26.851864  Liam Smith -34.754868  26.851864  
8    102.771046    John Doe  69.429353  61.252800  
9     28.633231    John Doe   9.522023  73.009883 

I tried writing something like this:

for value in df['Event']:
    if value == 'AA' :
        df['X', 'Y'] = AA

But I get a ValueError: Length of values does not match length of index


2 Answers

Your code has a few errors (one of them is mixing up Person and Player). I assume that is a copy-paste mistake.
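One more thing worth checking, as far as I can tell from the posted code: the 'Person' column holds the string 'nan', not a real missing value, so `pd.notnull(row['Person'])` is true on every row. The lookup only yields NaN because `row.get('nan_X')` finds no such column. Here is a minimal sketch on a made-up three-row frame (the helper `lookup` and the frame `clean` are names I introduced for illustration) that turns the placeholder into a real NaN first:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; 'nan' is a plain string, as in the question.
df = pd.DataFrame({
    'John Doe_X': [1.0, 2.0, 3.0],
    'John Doe_Y': [4.0, 5.0, 6.0],
    'Person': ['nan', 'John Doe', 'nan'],
})

# The string 'nan' is not a missing value, so no row counts as null here.
all_not_null = bool(pd.notnull(df['Person']).all())
print(all_not_null)

# Keep real names, turn the 'nan' placeholder into an actual NaN.
clean = df.assign(Person=df['Person'].where(df['Person'] != 'nan'))

def lookup(row, suffix):
    """Return the named person's coordinate, or NaN when no person is set."""
    name = row['Person']
    return row[name + suffix] if pd.notnull(name) else np.nan

clean['X'] = clean.apply(lambda row: lookup(row, '_X'), axis=1)
clean['Y'] = clean.apply(lambda row: lookup(row, '_Y'), axis=1)
print(clean)
```

With real NaN values in 'Person', the null check in the lookup does what it appears to do.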

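However, your problem is easy to solve by building a boolean mask and using df.loc to assign the tuple AA to the subset that the mask selects: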

m = df['Event'] == 'AA'
df.loc[m, ['X','Y']] = AA
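For context on the ValueError: `df['X', 'Y']` is treated as a single column labelled by the tuple `('X', 'Y')`, so pandas tries to fit the two values of AA into every row of that one column and the lengths do not match. A small sketch on a made-up three-row frame (the values and the `raised` flag are mine, for illustration only) shows the failure next to the mask-based assignment:

```python
import numpy as np
import pandas as pd

AA = 10, 20

# Toy frame with made-up values.
df = pd.DataFrame({
    'Event': ['AA', 'nan', 'BB'],
    'X': [np.nan, np.nan, 1.5],
    'Y': [np.nan, np.nan, 2.5],
})

# One column keyed by the tuple ('X', 'Y'): 2 values cannot fill 3 rows.
try:
    df['X', 'Y'] = AA
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Boolean mask plus .loc: the (X, Y) pair is written to each matching row.
m = df['Event'] == 'AA'
df.loc[m, ['X', 'Y']] = AA
print(df)
```

Note that this only touches the rows where 'Event' is exactly 'AA'; the rows after it are left as they were.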

If you want to iterate over the rows instead, you can try:

# iterate through rows
for index, row in df.iterrows():
    # check Event value for the row
    if row['Event'] == 'AA' :
        # update dataframe
        df.loc[index, ['X', 'Y']] = AA

print(df)

Result:

  Event  John Doe_X  John Doe_Y  Kevin Lee_X  Kevin Lee_Y  Liam Smith_X  \
0    AA   12.603084   81.636376    25.997186    76.733337    -17.683132   
1   nan  104.652839  104.064767    56.762357    83.599629    -34.714117   
2    BB   69.724434   33.324135    98.452840    57.407782     -8.479175   
3   nan   16.361719   51.290716    41.929234    46.494053    -81.882100   
4   nan   30.874579   34.683986    95.434111    80.343098    -62.448286   
5    CC   77.619875   70.164773     7.385376    40.142712    -55.590472   
6   nan   31.214066   54.081010    36.249414    34.218611    -21.754019   
7    CC   91.487647   28.307019    71.235864    48.915612    -37.196812   
8    DD   45.036216   61.655465    50.231592    29.511502     -4.583804   
9   nan   95.249002   25.649100    31.959114    10.234085    -93.106746   
X   NaN         NaN         NaN          NaN          NaN           NaN   

   Liam Smith_Y      Person          X           Y  
0     86.267909         nan  10.000000   20.000000  
1     43.090388         nan        NaN         NaN  
2     56.330139    John Doe  69.724434   33.324135  
3     65.648633    John Doe  16.361719   51.290716  
4     16.349304         nan        NaN         NaN  
5      5.528887   Kevin Lee   7.385376   40.142712  
6     75.717007         nan        NaN         NaN  
7    100.925457  Liam Smith -37.196812  100.925457  
8     87.256541    John Doe  45.036216   61.655465  
9     35.361163    John Doe  95.249002   25.649100  
X           NaN         NaN        NaN         NaN  
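The result above fills only the 'AA' row itself, while the question also asks to keep (10, 20) until the next coordinate appears. One possible sketch, not part of the original answer, is shown below. It assumes that a "coordinate" is any row with either an 'AA' event or a named person, and that 'nan' is stored as a string as in the question; the names `block` and `in_aa_block` and the toy values are mine.

```python
import numpy as np
import pandas as pd

AA = 10, 20

# Toy frame mirroring the first rows of the question (values are made up).
df = pd.DataFrame({
    'Event':  ['AA', 'nan', 'BB', 'nan', 'nan'],
    'Person': ['nan', 'nan', 'John Doe', 'John Doe', 'nan'],
    'X': [np.nan, np.nan, 9.9, 44.7, np.nan],
    'Y': [np.nan, np.nan, 77.9, 58.7, np.nan],
})

is_aa = df['Event'].eq('AA')
has_person = df['Person'].ne('nan')

# A new block starts at every row that brings its own coordinates.
block = (is_aa | has_person).cumsum()

# True for every row of a block that was opened by an 'AA' event.
in_aa_block = is_aa.groupby(block).transform('any')

# Write the AA coordinates to the AA row and the rows that follow it,
# stopping at the next row that has a person of its own.
df.loc[in_aa_block, 'X'] = AA[0]
df.loc[in_aa_block, 'Y'] = AA[1]
print(df)
```

Rows that follow a person's coordinates (such as the last row here) stay NaN, which matches the expected output in the question.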
