按行条件操作

2024-06-16 14:06:57 发布

您现在位置:Python中文网/ 问答频道 /正文

我被从回声测深仪收集的测深数据困住了。 看起来是这样的:

ID No Time Lat Lon Alt East North Count Fix
LL 0 589105179.00 24.156741 -110.321346 -31.50 4898039.453 -3406895.053 9 2 
ED 0 1.12 0.00
ED 0 1.53 0.00
ED 0 1.60 0.00
ED 0 1.08 0.00
ED 0 1.51 0.00
ED 0 1.06 0.00
LL 0 589105180.00 24.156741 -110.321346 -31.50 4898039.836 -3406894.045 9 2
ED 0 1.06 0.00
ED 0 1.12 0.00
ED 0 0.98 0.00
ED 0 0.96 0.00
ED 0 0.91 0.00
ED 0 0.90 0.00
LL 0 589105181.00 24.156741 -110.321346 -31.50 4898039.433 -3406894.003 9 2
ED 0 1.04 0.00
ED 0 1.04 0.00
ED 0 0.93 0.00
ED 0 0.99 0.00
ED 0 0.99 0.00
ED 0 1.01 0.00
LL 0 589105182.00 24.156741 -110.321346 -31.51 4898038.460 -3406894.841 9 2
ED 0 0.99 0.00
ED 0 0.96 0.00
ED 0 0.96 0.00
ED 0 0.96 0.00
ED 0 0.98 0.00
ED 0 0.98 0.00
LL 0 589105183.00 24.156741 -110.321346 -31.51 4898039.804 -3406894.107 9 2
ED 0 1.01 0.00
ED 0 1.01 0.00
ED 0 0.91 0.00
ED 0 1.04 0.00
ED 0 1.04 0.00
ED 0 0.96 0.00

每个LL行给出下一个ED行水深测量的时间(2000年以来的秒数)、坐标、方向等。你知道吗

我们需要计算每个ED度量的平均值并将其分配到LL行。问题是,在完整的文件中,ED度量值并不总是6,有时是5或4。你知道吗

到目前为止,我已经做到了:

data = pd.read_csv('Echosounder.txt', sep = '\t')    
LLs = data[data['ID'] == 'LL']    
EDs = data[data['ID'] == 'ED']

我喜欢的是它尊重索引顺序。 我注意到有不同数量的ED度量,因为这样做之后:

EDs.groupby(np.arange(len(EDs))//6).mean()

最后的LL行没有水深测量值。你知道吗

请帮忙。你知道吗


Tags: 数据noiddatatime度量altlon
3条回答

戴兹有一个很好的答案。或者,如果您不想假定时间是唯一的,您可以以类似的方式使用索引。你知道吗

data['dummy'] = np.nan
data.loc[data['ID']=='LL', 'dummy'] = data.loc[data['ID']=='LL', 'dummy'].index
data['dummy'].ffill(axis=0, inplace=True)


LLs.set_index('dummy')\
     .join(data[data['ID']=='ED']\
     .groupby('dummy').mean()[['No','Time','Lat']], rsuffix='_mean')

另一种方法是迭代每个项并将时间戳添加到LD行。你知道吗

import pandas as pd
df = pd.read_csv('data.csv', sep='\t', index_col=False)

df.head()


ID  No  Time    Lat Lon Alt East    North   Count   Fix timestamp   ed_value
0   LL  0   5.891052e+08    24.156741   -110.321346 -31.5   4898039.453 -3406895.053    9.0 2.0 589105179.0 NaN
1   ED  0   1.120000e+00    0.000000    NaN NaN NaN NaN NaN NaN 589105179.0 1.12
2   ED  0   1.530000e+00    0.000000    NaN NaN NaN NaN NaN NaN 589105179.0 1.53
3   ED  0   1.600000e+00    0.000000    NaN NaN NaN NaN NaN NaN 589105179.0 1.60
4   ED  0   1.080000e+00    0.000000    NaN NaN NaN NaN NaN NaN 589105179.0 1.08

LLs = df[df['ID'] == 'LL']    
EDs = df[df['ID'] == 'ED']

for x in df.iterrows():
    if x[1]['ID'] == 'LL':
        timestamp = x[1]['Time']
    elif x[1]['ID'] == 'ED':
        df.loc[x[0], 'ed_value'] = x[1]['Time']
    df.loc[x[0], 'timestamp'] = timestamp

df.groupby('timestamp').mean()

    No  Time    Lat Lon Alt East    North   Count   Fix ed_value
timestamp                                       
589105179.0 0   8.415788e+07    3.450963    -110.321346 -31.50  4898039.453 -3406895.053    9.0 2.0 1.316667
589105180.0 0   8.415788e+07    3.450963    -110.321346 -31.50  4898039.836 -3406894.045    9.0 2.0 0.988333
589105181.0 0   8.415788e+07    3.450963    -110.321346 -31.50  4898039.433 -3406894.003    9.0 2.0 1.000000
589105182.0 0   8.415788e+07    3.450963    -110.321346 -31.51  4898038.460 -3406894.841    9.0 2.0 0.971667
589105183.0 0   8.415788e+07    3.450963    -110.321346 -31.51  4898039.804 -3406894.107    9.0 2.0 0.995000

看起来每行的时间都是唯一的。可以将其用作分组键。首先,为所有LL行创建一个与时间相等的新分组列:

data.loc[data['ID']=='LL', 'key'] = data['Time']

将每个键的最新值传播到ED行:

data['key'].ffill(inplace=True)

按新键分组并将结果与LLs数据帧连接:

LLs.set_index('Time')\
   .join(data[data['ID']=='ED']\
         .groupby('key').mean()[['No','Time','Lat']], rsuffix='_mean')
#             ID  No        Lat    ...     No_mean      Time  Lat_mean
#Time                              ...                                
#589105179.0  LL   0  24.156741    ...           0  1.316667       0.0
#589105180.0  LL   0  24.156741    ...           0  0.988333       0.0
#589105181.0  LL   0  24.156741    ...           0  1.000000       0.0
#589105182.0  LL   0  24.156741    ...           0  0.971667       0.0
#589105183.0  LL   0  24.156741    ...           0  0.995000       0.0

最后三列是方法。你知道吗

相关问题 更多 >