Using .iterrows() to iterate over the rows of a pandas DataFrame more cleanly, while tracking the rows between specific values

    Time  Var1  EvntType  Var2
0     15     1         2    17
1     19     1         1    45
2     21     6         2    43
3     23     3         2    65
4     25     0         2    76  #this one should be skipped
5     26     2         2    35
6     28     3         2    25
7     31     5         1    16
8     33     1         2    25
9     36     5         1    36
10    39     1         2    21

i = 0
eventCounter = 0
lastStartTime = 0
length = data[data['EvntType']==1].shape[0]
results = np.zeros((length,3),dtype=int)
for row in data[data['Var1'] > 0].iterrows():
    myRow = row[1]
    if myRow['EvntType'] == 1:
        results[i,0] = lastStartTime
        results[i,1] = myRow['Time'] - lastStartTime
        results[i,2] = eventCounter
        lastStartTime = myRow['Time']
        eventCounter = 0
        i += 1
    else:
        eventCounter += 1

1 Answer

User

#1 · Posted 2024-04-25 06:12:30

You can drop the rows where Var1 equals 0 with:

df = df.loc[df['Var1'] != 0]

Then create a boolean mask that is True where EvntType is 1:

mask = df['EvntType']==1

Look up the Time values for the rows where mask is True:

times = df.loc[mask, 'Time']
# 1    19
# 7    31
# 9    36
# Name: Time, dtype: int64

You can also find the ordinal (positional) indices where mask is True:

idx = np.flatnonzero(mask)
# array([1, 6, 8])
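
Note that these positions (1, 6, 8) differ from the index labels (1, 7, 9) because one row was dropped earlier. A minimal sketch of that distinction, using a made-up toy Series (the values and labels here are assumptions for illustration, not the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: label 2 is missing, as if a row had been filtered out
s = pd.Series([2, 1, 2, 1], index=[0, 1, 3, 4])
mask = s == 1

labels = mask[mask].index.to_numpy()          # index labels where mask is True
positions = np.flatnonzero(mask.to_numpy())   # 0-based positions where mask is True

print(labels)     # [1 4]
print(positions)  # [1 3]
```

Counting rows between events needs positions rather than labels, since labels keep the gaps left by removed rows.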

start_time is all the values in times[:-1].

In [56]: times[:-1]
Out[56]: 
1    19
7    31
Name: Time, dtype: int64

time_inbetween is the difference between consecutive times, np.diff(times):

In [55]: np.diff(times)
Out[55]: array([12,  5])

event_count is the difference between consecutive values in idx, minus 1.

In [57]: np.diff(idx)-1
Out[57]: array([4, 1])
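
One way to check that np.diff(idx)-1 really counts the rows strictly between consecutive EvntType == 1 rows is to count them directly by slicing. The sketch below writes the mask out by hand (an assumption: it is the filtered frame's mask as a plain array) and compares both approaches; the names via_diff and via_slices are illustrative only:

```python
import numpy as np

# The boolean mask of the filtered frame, written out by hand for illustration
mask = np.array([False, True, False, False, False, False, True, False, True, False])

idx = np.flatnonzero(mask)     # positions of the True entries
via_diff = np.diff(idx) - 1    # gap between consecutive positions, minus the endpoint

# Direct count of the False entries strictly between each pair of True entries
via_slices = [int((~mask[a + 1:b]).sum()) for a, b in zip(idx[:-1], idx[1:])]

print(via_diff.tolist())  # [4, 1]
print(via_slices)         # [4, 1]
```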

import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})

# Remove rows where Var1 equals 0
df = df.loc[df['Var1'] != 0]

mask = df['EvntType']==1
times = df.loc[mask, 'Time']
idx = np.flatnonzero(mask)

result = pd.DataFrame(
    {'start_time': times[:-1],
     'time_inbetween': np.diff(times),
     'event_count': np.diff(idx)-1})

print(result)

yields

   event_count  start_time  time_inbetween
1            4          19              12
7            1          31               5
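
This result has one row per pair of consecutive EvntType == 1 rows, whereas the loop in the question also records a first row measured from lastStartTime = 0. If that first row is wanted, one possible sketch is to prepend sentinel values via the prepend argument of np.diff. The name result_with_first is made up here, and treating time 0 and position -1 as the starting point is an assumption taken from the question's initial values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1]})
df = df.loc[df['Var1'] != 0]

mask = (df['EvntType'] == 1).to_numpy()
times = df.loc[mask, 'Time'].to_numpy()
idx = np.flatnonzero(mask)

result_with_first = pd.DataFrame({
    # Assumed start: time 0, mirroring lastStartTime = 0 in the question
    'start_time': np.concatenate(([0], times[:-1])),
    'time_inbetween': np.diff(times, prepend=0),
    # Position -1 stands for "before the first row", so rows 0..idx[0]-1 are counted
    'event_count': np.diff(idx, prepend=-1) - 1,
})
print(result_with_first)
```

For the sample data this gives three rows, matching what the question's loop writes into results.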
