Change the value of the next observation based on a per-group condition

Posted 2024-04-27 10:26:50


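I have a dataset of individual trips (trips_data). Each observation is one trip, with a trip start time (strttime), a trip end time (endtime), and the person who made the trip (clepersonne). For some people, a trip's end time is later than the start time of their next trip. Here is an example, with times in hhmm format: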

       TRIPID clepersonne  strttime  endtime
90  100010413    10001041      1600     1614
91  100010414    10001041      1615     1648
92  100010415    10001041      1645     1726
93  100010416    10001041      1930     1954
94  100010621    10001062       900      921
95  100010622    10001062      1000     1013

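For the same person (10001041), trip 100010414 ends at 1648, which is later than the start time of the next trip, 100010415 (1645). I want to correct this inconsistency by moving the trip's endtime back to the start time of the next trip. For this example, the result I want is: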

       TRIPID clepersonne  strttime  endtime
90  100010413    10001041      1600     1614
91  100010414    10001041      1615     *1645*
92  100010415    10001041      1645     1726
93  100010416    10001041      1930     1954
94  100010621    10001062       900      921
95  100010622    10001062      1000     1013

I tried this:

    trips_data = trips_data.sort_index() # To iterate each value
    for i in range(0, len(trips_data.index)) :
        trips_data['endtime'] = np.where((trips_data.strttime[i+1]<trips_data.endtime[i]) & (trips_data.clepersonne[i+1] == trips_data.clepersonne[i]), trips_data.strttime[i+1], trips_data['endtime'] ) 

But I got this error:

Traceback (most recent call last):
  File "C:\Users\Utilisateur\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-23-64092a4318fd>", line 3, in <module>
    trips_data['endtime'] = np.where((trips_data.strttime[i+1]<trips_data.endtime[i]) & (trips_data.clepersonne[i+1] == trips_data.clepersonne[i]), trips_data.strttime[i+1], trips_data['endtime'] )
  File "C:\Users\Utilisateur\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\series.py", line 1071, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\Utilisateur\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 122

Can you help me? Thanks.


1 answer
User
#1 · Posted 2024-04-27 10:26:50

Use:

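# df is the trips DataFrame (trips_data in the question)
# Start time of the same person's next trip (NaN for each person's last trip)
next_start = df.groupby('clepersonne')['strttime'].shift(-1)
# True where a trip ends after the same person's next trip starts
mask = df['endtime'].sub(next_start) > 0
# Replace only those end times with the next trip's start time
df['endtime'] = df['endtime'].mask(mask, next_start)
print(df)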

       TRIPID  clepersonne  strttime  endtime
90  100010413     10001041      1600     1614
91  100010414     10001041      1615     1645
92  100010415     10001041      1645     1726
93  100010416     10001041      1930     1954
94  100010621     10001062       900      921
95  100010622     10001062      1000     1013
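
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the six sample rows from the question. The variable names `trips` and `overlaps` are my own, and the `sort_values` call is an assumption that "next row" should mean "next trip by start time" within each person. It assigns through `.loc` instead of `mask` so that `endtime` stays an integer column even though the shifted series contains NaN.

```python
import pandas as pd

# Sample rows copied from the question; `trips` is a hypothetical name
trips = pd.DataFrame(
    {
        "TRIPID": [100010413, 100010414, 100010415, 100010416, 100010621, 100010622],
        "clepersonne": [10001041, 10001041, 10001041, 10001041, 10001062, 10001062],
        "strttime": [1600, 1615, 1645, 1930, 900, 1000],
        "endtime": [1614, 1648, 1726, 1954, 921, 1013],
    },
    index=[90, 91, 92, 93, 94, 95],
)

# Assumption: order trips chronologically within each person
trips = trips.sort_values(["clepersonne", "strttime"])

# Start time of the same person's next trip; NaN for each person's last trip
next_start = trips.groupby("clepersonne")["strttime"].shift(-1)

# Comparisons against NaN are False, so last trips are never flagged
overlaps = trips["endtime"] > next_start

# Overwrite only the overlapping rows; those rows have no NaN, so the cast is safe
trips.loc[overlaps, "endtime"] = next_start[overlaps].astype("int64")

print(trips)
```

As for the `KeyError: 122` in the question, it most likely comes from `trips_data.strttime[i+1]` being a lookup by index label rather than by position: the loop counts 0 to `len - 1`, but the index labels need not include every one of those integers plus one. The grouped `shift(-1)` avoids that because it never looks rows up by label.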
