Python pandas: filtering a DataFrame by another Series on multiple columns

most_liquid_contracts.head(20)
Out[32]:
2007-04-26    706
2007-04-27    706
2007-04-29    706
2007-04-30    706
2007-05-01    706
2007-05-02    706
2007-05-03    706
2007-05-04    706
2007-05-06    706
2007-05-07    706
2007-05-08    706
2007-05-09    706
2007-05-10    706
2007-05-11    706
2007-05-13    706
2007-05-14    706
2007-05-15    706
2007-05-16    706
2007-05-17    706
2007-05-18    706
dtype: int64

df.head(20).to_string
Out[40]:
<bound method DataFrame.to_string of                            delivery  volume
2007-04-27 11:55:00+01:00       705       1
2007-04-27 13:46:00+01:00       705       1
2007-04-27 14:15:00+01:00       705       1
2007-04-27 14:33:00+01:00       705       1
2007-04-27 14:35:00+01:00       705       1
2007-04-27 17:05:00+01:00       705      16
2007-04-27 17:07:00+01:00       705       1
2007-04-27 17:12:00+01:00       705       1
2007-04-27 17:46:00+01:00       705       1
2007-04-27 18:25:00+01:00       705       2
2007-04-26 23:00:00+01:00       706      10
2007-04-26 23:01:00+01:00       706      12
2007-04-26 23:02:00+01:00       706       1
2007-04-26 23:05:00+01:00       706      21
2007-04-26 23:06:00+01:00       706      10
2007-04-26 23:07:00+01:00       706      19
2007-04-26 23:08:00+01:00       706       1
2007-04-26 23:13:00+01:00       706      10
2007-04-26 23:14:00+01:00       706      62
2007-04-26 23:15:00+01:00       706       3>

# ATTEMPT 1
most_liquid_contracts.index = pd.to_datetime(most_liquid_contracts.index, unit='d')
df['days'] = pd.to_datetime(df.index.date, unit='d')
mlc = most_liquid_contracts.to_frame(name='delivery')
mlc['days'] = mlc.index.date
data = pd.merge(mlc, df, on=['delivery', 'days'], left_index=True)

# ATTEMPT 2
liquid = pd.merge(mlc, df, on='delivery', how='inner', left_index=True)
# this gets me closer (i.e. retains granularity), but somehow seems to be an
# outer join? it includes the union but not the intersection. this should be a
# subset of df, but instead has about x50 the rows, at around 195B. df
# originally has 4B

2 answers

User

Answer 1 · edited 2024-05-23 18:37:23

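The tricky part is combining two Series/DataFrames whose indexes have different datetime resolutions. Once you have combined them sensibly, you can filter as usual.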

# Make sure your series has a name
# Make sure the index is pure dates, not date 00:00:00
most_liquid_contracts.name = 'most'
most_liquid_contracts.index = most_liquid_contracts.index.date

data = df
data['day'] = data.index.date
combined = data.join(most_liquid_contracts, on='day', how='left')

Now you can do something like

combined[combined['delivery'] == combined['most']]

This yields the rows of data (df) where data.delivery equals the value in most_liquid_contracts for that day.
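As a runnable sketch of the approach in this answer, the snippet below builds small made-up sample data (the values, the dates, and the names `daily` and `filtered` are assumptions, not taken from the question) and applies the join-by-day filter end to end:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's: a per-day Series naming
# the most liquid contract, and a minute-level DataFrame of trades.
most_liquid_contracts = pd.Series(
    [706, 706, 707],
    index=pd.to_datetime(["2007-04-26", "2007-04-27", "2007-04-28"]),
    name="most",
)
df = pd.DataFrame(
    {"delivery": [706, 705, 706, 706, 707], "volume": [10, 1, 16, 3, 5]},
    index=pd.to_datetime(
        [
            "2007-04-26 23:00",
            "2007-04-27 11:55",
            "2007-04-27 17:05",
            "2007-04-28 09:00",
            "2007-04-28 09:01",
        ]
    ),
)

# Reduce both sides to plain calendar dates so the join keys are comparable.
daily = most_liquid_contracts.copy()
daily.index = daily.index.date

data = df.copy()  # work on a copy so df itself is left unchanged
data["day"] = data.index.date

# Left join: every trade row picks up the most liquid contract of its own day.
combined = data.join(daily, on="day", how="left")

# Keep only trades in the contract that was most liquid on that day.
filtered = combined[combined["delivery"] == combined["most"]]
print(filtered[["delivery", "volume"]])
```

Because `daily` has one entry per day, the left join keeps exactly as many rows as `df`. Merging on `delivery` alone, by contrast, pairs each trade with every day that shares its contract number, which is why the row count can multiply.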

User

Answer 2 · edited 2024-05-23 18:37:23

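Assuming I have understood you correctly, most_liquid_contracts is a Series of N integers holding the N largest delivery numbers, and you want to filter df so that it only includes rows whose delivery number is high enough to appear on that list. In that case you can simply drop everything below the smallest value in the Series.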

threshold = min(most_liquid_contracts)
filtered = df[df['delivery'] >= threshold]
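A minimal sketch of the threshold filter, using made-up values (the sample data is an assumption, and it only applies under this answer's reading of the question):

```python
import pandas as pd

# Hypothetical data: the Series lists the delivery numbers considered liquid.
most_liquid_contracts = pd.Series([706, 706, 707])
df = pd.DataFrame(
    {"delivery": [705, 706, 707, 704], "volume": [1, 10, 5, 2]}
)

# The smallest listed delivery number acts as the cut-off.
threshold = most_liquid_contracts.min()

# Boolean mask keeps rows at or above the cut-off, whatever their date.
filtered = df[df["delivery"] >= threshold]
print(filtered)
```

Note that this comparison ignores dates entirely: a row is kept if its delivery number is at least the threshold, regardless of which contract was most liquid on that particular day.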
