Selecting rows in a DataFrame that are not in a Series

Posted 2024-06-16 12:02:30


I have a DataFrame named trips that contains the following:

  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
...

I also have a Series named invalidTrips that contains the following:

trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
...

How do I select all rows in trips whose trip_id does not match any trip_id in invalidTrips?

Edit: I now have this code:

# Grab the number of trips made outside min and max hour.
tooEarly = stopTimes['arrival_time'] < base_mintime
tooLate = stopTimes['departure_time'] > base_maxtime
invalidTrips = stopTimes[(tooEarly | tooLate)].groupby('trip_id').size()

# Filter out the invalid trips.
print(invalidTrips.size)
print(trips.size)
in_validTrips = ~trips.trip_id.isin(invalidTrips)
validTrips = trips[in_validTrips][['route_id', 'service_id', 'shape_id']]
print(validTrips.size)

For whatever reason, although invalidTrips.size varies with base_mintime and base_maxtime, validTrips.size stays the same, even though I would expect it to move inversely to invalidTrips.size. Why is that?

(For further background, all of this is extracted from GTFS data.)


1 answer
User
Answer 1 · Posted 2024-06-16 12:02:30

Update:

Try the isin() method together with the ~ operator.

Per @EdChum's correction in the comments, if invalidTrips is a Series:

trips[~trips.trip_id.isin(invalidTrips.index)]
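
Why the `.index` matters can be shown with a small sketch. It assumes invalidTrips comes from groupby('trip_id').size(), as in the question, so its values are row counts and the trip IDs sit in its index. isin() compares against the values it is handed, so passing the Series itself tests trip IDs against counts and matches nothing. The toy data and the snake_case names below are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only.
trips = pd.DataFrame({
    "route_id": ["BX12", "BX12", "BX12"],
    "trip_id": ["t1", "t2", "t3"],
})
stop_times = pd.DataFrame({"trip_id": ["t2", "t2", "t9"]})

# groupby(...).size() returns a Series: index = trip_id, values = row counts.
invalid_trips = stop_times.groupby("trip_id").size()

# Passing the Series compares trip IDs against the counts, so nothing matches.
mask_values = trips["trip_id"].isin(invalid_trips)

# Passing the index compares trip IDs against trip IDs.
mask_index = trips["trip_id"].isin(invalid_trips.index)

valid_trips = trips[~mask_index]
print(mask_values.tolist())             # [False, False, False]
print(mask_index.tolist())              # [False, True, False]
print(valid_trips["trip_id"].tolist())  # ['t1', 't3']
```

With the all-False mask, ~mask keeps every row, which would explain why validTrips.size never changed in the question.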

Test:

In [39]: invalidTrips
Out[39]:
trip_id
11760139-BPPB6-BP_B6-Weekday-10         16
11760139-BPPB6-BP_B6-Weekday-10-SDon    16
11760140-BPPB6-BP_B6-Weekday-10         19
11760140-BPPB6-BP_B6-Weekday-10-SDon    19
11760141-BPPB6-BP_B6-Weekday-10         16
GH_B6-Weekday-017000_BX12_1             11         # <  I've added it intentionally
Name: val, dtype: int64

In [40]: trips
Out[40]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
3     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-017000_BX12_1  # <  exclude this row 
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1

In [41]: trips[~trips.trip_id.isin(invalidTrips.index)]
Out[41]:
  route_id     service_id  shape_id                      trip_id
0     BX12  GH_B6-Weekday  BX120805  GH_B6-Weekday-004000_BX12_1
1     BX12  GH_B6-Weekday  BX120809  GH_B6-Weekday-009000_BX12_1
2     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-013000_BX12_1
4     BX12  GH_B6-Weekday  BX120792  GH_B6-Weekday-021000_BX12_1
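
When checking the result the way the question does, it may help to keep in mind that DataFrame.size counts cells (rows times columns), whereas len() counts rows. A minimal sketch on invented toy data, comparing row counts before and after the filter:

```python
import pandas as pd

# Toy data, invented for illustration only.
trips = pd.DataFrame({
    "route_id": ["BX12"] * 4,
    "service_id": ["GH_B6-Weekday"] * 4,
    "shape_id": ["s1", "s2", "s3", "s4"],
    "trip_id": ["t1", "t2", "t3", "t4"],
})
invalid_trips = pd.Series(
    [16, 11], index=pd.Index(["t2", "t9"], name="trip_id")
)

valid_trips = trips[~trips["trip_id"].isin(invalid_trips.index)]

# DataFrame.size is rows * columns; len() is the number of rows.
print(trips.size, len(trips))              # 16 4
print(valid_trips.size, len(valid_trips))  # 12 3

# Only trip IDs present in both objects are dropped ("t9" is not in trips).
dropped = len(trips) - len(valid_trips)
print(dropped)                             # 1
```

The number of rows removed equals the number of trip IDs that appear in both objects, so it need not equal invalidTrips.size.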
