多准则选择不同结构的旧数据帧填充新数据帧

2024-04-20 14:56:48 发布

您现在位置:Python中文网/ 问答频道 /正文

我用熊猫做数据分析。我有这样一个输入文件:

   VEH  SEC POS   ACCELL    SPEED
    2   8.4 36.51 -0.2929   27.39
    3   8.4 23.57 -0.7381   33.09
    4   8.4 6.18   0.6164   38.8
    1   8.5 47.76  0        25.57

我需要重新组织数据,以便行是来自SEC的唯一(有序)值,作为第一列,然后其他列将是VEH1\u POS、VEH1\u SPEED、VEH1\u ACCELL、VEH2\u POS、VEH2\u SPEED、VEH2\u ACCELL等:

TIME VEH1_POS  VEH1_SPEED  VEH1_ACCEL  VEH2_POS, VEH2_SPEED, etc.
0.1   6.2        3.7         0.0        7.5       2.1    
0.2   6.8        3.2        -0.5        8.3       2.1
etc.

因此,例如,新数据帧中每行的VEH1_POS值将通过使用SEC值与新数据帧中该行的时间值匹配且VEH值==1的行从原始数据帧中的POS列中选择值来填充

要在新数据框中设置行,我将执行以下操作:

start = inputdf['SIMSEC'].min()
end = inputdf['SIMSEC'].max()
time_steps = frange(start, end, 0.1)
outputdf['TIME'] = time_steps

但我不知道如何从输入数据框中选择正确的值,并创建新数据框的其余部分以供进一步分析。还要注意的是,输入文件不会每秒钟都有每辆车的数据(时间戳)。所以解决方案也需要处理这个问题。我最好的猜测是:

outputdf['veh1_pos'] = np.where((inputdf['VEH NO'] == 1) & (inputdf['SIMSEC'] == row['Time Step']))

但这行不通


Tags: 文件数据postime时间etcsecstart
1条回答
网友
1楼 · 发布于 2024-04-20 14:56:48
import pandas as pd

# your data
# ==========================
print(df)

Out[272]: 
   VEH  SEC    POS  ACCELL  SPEED
0    2  8.4  36.51 -0.2929  27.39
1    3  8.4  23.57 -0.7381  33.09
2    4  8.4   6.18  0.6164  38.80
3    1  8.5  47.76  0.0000  25.57

# reshaping
# ==========================
result = df.set_index(['SEC','VEH']).unstack()

Out[278]: 
       POS                     ACCELL                          SPEED                    
VEH      1      2      3     4      1       2       3       4      1      2      3     4
SEC                                                                                     
8.4    NaN  36.51  23.57  6.18    NaN -0.2929 -0.7381  0.6164    NaN  27.39  33.09  38.8
8.5  47.76    NaN    NaN   NaN      0     NaN     NaN     NaN  25.57    NaN    NaN   NaN

所以在这里,列有多级索引,其中第一级是POS, ACCELL, SPEED,第二级是VEH=1,2,3,4

# if you want to rename the column
temp_z = result.columns.get_level_values(0)
temp_y = result.columns.get_level_values(1)
temp_x = ['VEH'] * len(temp_y)

result.columns = ['{}{}_{}'.format(x,y,z) for x,y,z in zip(temp_x, temp_y, temp_z)]

Out[298]: 
     VEH1_POS  VEH2_POS  VEH3_POS  VEH4_POS  VEH1_ACCELL  VEH2_ACCELL  VEH3_ACCELL  VEH4_ACCELL  VEH1_SPEED  VEH2_SPEED  VEH3_SPEED  VEH4_SPEED
SEC                                                                                                                                            
8.4       NaN     36.51     23.57      6.18          NaN      -0.2929      -0.7381       0.6164         NaN       27.39       33.09        38.8
8.5     47.76       NaN       NaN       NaN            0          NaN          NaN          NaN       25.57         NaN         NaN         NaN

相关问题 更多 >