Using merge to keep the n rows after and the k rows before

events_df
        event ticker
0  01-01-2019   MSFT
1  12-12-2018   MSFT
2  12-11-2018   MSFT
3  02-03-2019   AAPL
4  12-12-2018   AAPL
5  12-11-2018   AAPL
6  01-01-2019   AAPL

prices_df
         date   tic  price
0  01-01-2019  MSFT    1.0
1  02-01-2019  MSFT    1.1
2  03-01-2019  MSFT    1.2
3  04-01-2019  MSFT    1.3
4  05-01-2019  MSFT    1.4
5  01-01-2019  AAPL    2.0
6  02-01-2019  AAPL    2.1
7  03-01-2019  AAPL    2.2
8  04-01-2019  AAPL    2.3
9  05-01-2019  AAPL    2.4
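For anyone who wants to reproduce the question, the two frames can be rebuilt roughly as follows. This is a sketch that assumes the dates are stored as plain strings, which the question does not state.

```python
import pandas as pd

# Event dates per ticker (dates kept as strings; this is an assumption).
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "12-11-2018", "02-03-2019",
              "12-12-2018", "12-11-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "MSFT", "AAPL", "AAPL", "AAPL", "AAPL"],
})

# Five price rows per ticker, in date order.
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
prices_df = pd.DataFrame({
    "date": dates * 2,
    "tic": ["MSFT"] * 5 + ["AAPL"] * 5,
    "price": [1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4],
})

print(events_df)
print(prices_df)
```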

2 Answers

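User

Answer 1 · edited 2024-04-23 20:37:06

Your approach looks fine. You only need to select the columns you want, because after the merge the result contains every column from both dataframes. So: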

merged = events_df.merge(prices_df, left_on=['ticker', 'event'], right_on=['tic', 'date'])
merged = merged[['date', 'ticker', 'price']]

Then filter it so that the price is less than 3 (or n, if needed):

n = 3
merged = merged[merged['price'] < n]
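One way to avoid carrying duplicate key columns through the merge is to rename the price columns first, so that both frames share the same key names. The snippet below is a minimal sketch on a shortened copy of the sample data; the variable name aligned is only illustrative.

```python
import pandas as pd

# Shortened sample data (a subset of the frames in the question).
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "AAPL"],
})
prices_df = pd.DataFrame({
    "date": ["01-01-2019", "02-01-2019", "01-01-2019", "02-01-2019"],
    "tic": ["MSFT", "MSFT", "AAPL", "AAPL"],
    "price": [1.0, 1.1, 2.0, 2.1],
})

# Rename the price columns to match the event columns, so the merge
# keeps a single copy of each key column.
aligned = prices_df.rename(columns={"tic": "ticker", "date": "event"})
merged = events_df.merge(aligned, on=["ticker", "event"], how="inner")

# Keep only the rows whose price is below n.
n = 3
merged = merged[merged["price"] < n]
print(merged)
```

With shared key names the result has only the event, ticker and price columns, so no column selection step is needed afterwards.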

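User

Answer 2 · edited 2024-04-23 20:37:06

Use:

import numpy as np
import pandas as pd

# sample data changed to make the example more general (extra row before the first event)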
print (prices_df)
          date   tic  price
0   01-01-2018  MSFT    1.0
1   01-01-2019  MSFT    1.0
2   02-01-2019  MSFT    1.1
3   03-01-2019  MSFT    1.2
4   04-01-2019  MSFT    1.3
5   05-01-2019  MSFT    1.4
6   01-01-2019  AAPL    2.0
7   02-01-2019  AAPL    2.1
8   03-01-2019  AAPL    2.2
9   04-01-2019  AAPL    2.3
10  05-01-2019  AAPL    2.4

# n rows after each event (down), k rows before it (up)
n = 2
k = 1
# expose the original index as a column with reset_index so the merge does not lose it
idx = events_df.merge(prices_df.rename_axis('idx').reset_index(),
                         left_on=['ticker','event'],
                         right_on=['tic','date'])['idx']

print (idx)
0    1
1    6
Name: idx, dtype: int64

# build group ids by matching against the original index; [::-1] reverses the order
s1 = prices_df.index.isin(idx).cumsum()
s2 = prices_df.index.isin(idx)[::-1].cumsum()

# replace the first and last groups (id 0) with NaN
up = np.where(s1 != 0, s1, np.nan)
lo = np.where(s2[::-1] != 0, s2[::-1] , np.nan)

# per-group counters compared with le (<=); rows in the NaN groups (first, last) are excluded
prices_df['um'] = prices_df.groupby(up).cumcount().le(n) & ~np.isnan(up)
prices_df['lm'] = prices_df.groupby(lo).cumcount(ascending=False).le(k) & ~np.isnan(lo)
print (prices_df)
          date   tic  price     um     lm
0   01-01-2018  MSFT    1.0  False   True
1   01-01-2019  MSFT    1.0   True   True
2   02-01-2019  MSFT    1.1   True  False
3   03-01-2019  MSFT    1.2   True  False
4   04-01-2019  MSFT    1.3  False  False
5   05-01-2019  MSFT    1.4  False   True
6   01-01-2019  AAPL    2.0   True   True
7   02-01-2019  AAPL    2.1   True  False
8   03-01-2019  AAPL    2.2   True  False
9   04-01-2019  AAPL    2.3  False  False
10  05-01-2019  AAPL    2.4  False  False

#filter by boolean indexing
mask = prices_df['um'] | prices_df['lm'] 
prices_df = prices_df[mask]
print (prices_df)
         date   tic  price     um     lm
0  01-01-2018  MSFT    1.0  False   True
1  01-01-2019  MSFT    1.0   True   True
2  02-01-2019  MSFT    1.1   True  False
3  03-01-2019  MSFT    1.2   True  False
5  05-01-2019  MSFT    1.4  False   True
6  01-01-2019  AAPL    2.0   True   True
7  02-01-2019  AAPL    2.1   True  False
8  03-01-2019  AAPL    2.2   True  False
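Note that the masks above are built from row positions in prices_df regardless of ticker, which is why the MSFT row at index 5 is kept as the row before the AAPL event. If the window should stay inside a single ticker, a position-based variant such as the sketch below may help. The helper name window_around_events is hypothetical, and the sketch assumes prices_df is sorted so that each ticker's rows are contiguous and in date order.

```python
import numpy as np
import pandas as pd

def window_around_events(prices, events, n=2, k=1):
    """Keep each event row plus up to n rows after and k rows before it,
    without crossing into another ticker (hypothetical helper)."""
    # Flag the price rows that match an event on (ticker, date).
    keys = (events.rename(columns={"ticker": "tic", "event": "date"})
                  [["tic", "date"]].drop_duplicates())
    flagged = prices.merge(keys, on=["tic", "date"], how="left", indicator=True)
    is_event = (flagged["_merge"] == "both").to_numpy()

    tic = prices["tic"].to_numpy()
    keep = np.zeros(len(prices), dtype=bool)
    for pos in np.flatnonzero(is_event):
        lo = max(pos - k, 0)
        hi = min(pos + n, len(prices) - 1)
        window = np.arange(lo, hi + 1)
        # Drop window rows that belong to a different ticker.
        keep[window[tic[window] == tic[pos]]] = True
    return prices[keep]

events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "AAPL"],
})
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
prices_df = pd.DataFrame({
    "date": ["01-01-2018"] + dates + dates,
    "tic": ["MSFT"] * 6 + ["AAPL"] * 5,
    "price": [1.0, 1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4],
})

result = window_around_events(prices_df, events_df, n=2, k=1)
print(result)
```

On this sample the variant keeps rows 0 to 3 for MSFT and rows 6 to 8 for AAPL, so row 5 is dropped because it belongs to a different ticker than the event at row 6.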
