Using merge to keep the n rows after and the k rows before a match

Posted 2024-04-23 20:37:06


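I am merging two DataFrames and, once a match is found, want to access the n rows of a column that follow it.

A match exists where events_df['event'] equals prices_df['date']

and, at the same time,

events_df['ticker'] equals prices_df['tic'].

I want to keep the first n values of prices_df['price'] starting at the matched row.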

events_df

  event ticker
0 01-01-2019  MSFT 
1 12-12-2018  MSFT 
2 12-11-2018  MSFT   
3 02-03-2019  AAPL 
4 12-12-2018  AAPL 
5 12-11-2018  AAPL 
6 01-01-2019  AAPL 


prices_df

  date tic price 
0 01-01-2019 MSFT 1.0
1 02-01-2019 MSFT 1.1
2 03-01-2019 MSFT 1.2
3 04-01-2019 MSFT 1.3
4 05-01-2019 MSFT 1.4 
5 01-01-2019 AAPL 2.0
6 02-01-2019 AAPL 2.1
7 03-01-2019 AAPL 2.2
8 04-01-2019 AAPL 2.3
9 05-01-2019 AAPL 2.4

I have already tried merging:

merged = events_df.merge(prices_df,left_on=['ticker','event'],right_on=['tic','date'])

Expected output for n=4 (from the matches at events_df['event'] indices 0 and 6):

  date ticker price
0 01-01-2019 MSFT 1.0
1 02-01-2019 MSFT 1.1
2 03-01-2019 MSFT 1.2
3 04-01-2019 MSFT 1.3
4 01-01-2019 AAPL 2.0
5 02-01-2019 AAPL 2.1
6 03-01-2019 AAPL 2.2
7 04-01-2019 AAPL 2.3
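
To make the problem reproducible, the two frames above can be rebuilt as in the minimal sketch below. The construction code is an addition for illustration and not part of the original post. It shows that the merge on its own returns only the two matched rows, not the rows that follow them.

```python
import pandas as pd

# Sample data copied from the tables above
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "12-11-2018", "02-03-2019",
              "12-12-2018", "12-11-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "MSFT", "AAPL", "AAPL", "AAPL", "AAPL"],
})
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
prices_df = pd.DataFrame({
    "date": dates * 2,
    "tic": ["MSFT"] * 5 + ["AAPL"] * 5,
    "price": [1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4],
})

# Inner merge on (ticker, date): keeps only the rows where both keys match
merged = events_df.merge(prices_df,
                         left_on=["ticker", "event"],
                         right_on=["tic", "date"])
print(merged)  # two rows: MSFT 01-01-2019 (1.0) and AAPL 01-01-2019 (2.0)
```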

2 answers

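Your merge looks fine. You only need to select the columns you want from it, because after merging the result contains all columns from both DataFrames. So: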

merged = events_df.merge(prices_df, left_on=['ticker', 'event'], right_on=['tic', 'date'])
merged = merged[['date', 'ticker', 'price']]

Then you have to filter it so that the price is less than 3 (or n, if you prefer):

n = 3
merged = merged[merged['price'] < n]

Use:

import numpy as np

# sample data changed to make the example more general
print (prices_df)
          date   tic  price
0   01-01-2018  MSFT    1.0
1   01-01-2019  MSFT    1.0
2   02-01-2019  MSFT    1.1
3   03-01-2019  MSFT    1.2
4   04-01-2019  MSFT    1.3
5   05-01-2019  MSFT    1.4
6   01-01-2019  AAPL    2.0
7   02-01-2019  AAPL    2.1
8   03-01-2019  AAPL    2.2
9   04-01-2019  AAPL    2.3
10  05-01-2019  AAPL    2.4

# n rows after the match, k rows before it
n = 2 
k = 1
# move the original index into a column with reset_index so the merge does not lose it
idx = events_df.merge(prices_df.rename_axis('idx').reset_index(),
                         left_on=['ticker','event'],
                         right_on=['tic','date'])['idx']

print (idx)
0    1
1    6
Name: idx, dtype: int64

# build groups from the matched original index; [::-1] reverses the order
s1 = prices_df.index.isin(idx).cumsum()
s2 = prices_df.index.isin(idx)[::-1].cumsum()

# replace the first and last groups (value 0) with NaN
up = np.where(s1 != 0, s1, np.nan)
lo = np.where(s2[::-1] != 0, s2[::-1] , np.nan)

# per-group counters compared with le (<=); rows in the NaN groups (first, last) are excluded
prices_df['um'] = prices_df.groupby(up).cumcount().le(n) & ~np.isnan(up)
prices_df['lm'] = prices_df.groupby(lo).cumcount(ascending=False).le(k) & ~np.isnan(lo)
print (prices_df)
          date   tic  price     um     lm
0   01-01-2018  MSFT    1.0  False   True
1   01-01-2019  MSFT    1.0   True   True
2   02-01-2019  MSFT    1.1   True  False
3   03-01-2019  MSFT    1.2   True  False
4   04-01-2019  MSFT    1.3  False  False
5   05-01-2019  MSFT    1.4  False   True
6   01-01-2019  AAPL    2.0   True   True
7   02-01-2019  AAPL    2.1   True  False
8   03-01-2019  AAPL    2.2   True  False
9   04-01-2019  AAPL    2.3  False  False
10  05-01-2019  AAPL    2.4  False  False

#filter by boolean indexing
mask = prices_df['um'] | prices_df['lm'] 
prices_df = prices_df[mask]
print (prices_df)
         date   tic  price     um     lm
0  01-01-2018  MSFT    1.0  False   True
1  01-01-2019  MSFT    1.0   True   True
2  02-01-2019  MSFT    1.1   True  False
3  03-01-2019  MSFT    1.2   True  False
5  05-01-2019  MSFT    1.4  False   True
6  01-01-2019  AAPL    2.0   True   True
7  02-01-2019  AAPL    2.1   True  False
8  03-01-2019  AAPL    2.2   True  False
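
Note that the positional approach above treats prices_df as one sequence, so the k rows before the AAPL match spill into the trailing MSFT row (index 5 in the output). If the window should stay inside a single ticker, a variation along the following lines may help. This is only a sketch: `window_around_matches`, `n_after` and `k_before` are hypothetical names introduced here, and it assumes the rows of each ticker in prices_df are already in chronological order.

```python
import pandas as pd

def window_around_matches(prices, events, n_after, k_before):
    """Per ticker, keep each matched row plus k_before rows before
    and n_after rows after it (hypothetical helper, not from the answers)."""
    prices = prices.reset_index(drop=True)
    # position of every row inside its own ticker group: 0, 1, 2, ...
    pos = prices.groupby("tic").cumcount()
    # rows of prices that match an event on (ticker, date)
    keys = events[["ticker", "event"]].drop_duplicates()
    hits = prices.assign(pos=pos).merge(
        keys, left_on=["tic", "date"], right_on=["ticker", "event"])
    keep = pd.Series(False, index=prices.index)
    for tic, p in zip(hits["tic"], hits["pos"]):
        # between() is inclusive on both ends, so the matched row is kept too
        keep |= (prices["tic"] == tic) & pos.between(p - k_before, p + n_after)
    return prices[keep]

# Sample data from the question
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "12-11-2018", "02-03-2019",
              "12-12-2018", "12-11-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "MSFT", "AAPL", "AAPL", "AAPL", "AAPL"],
})
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
prices_df = pd.DataFrame({
    "date": dates * 2,
    "tic": ["MSFT"] * 5 + ["AAPL"] * 5,
    "price": [1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4],
})

# The matched row plus the 3 rows after it gives the 4 rows per ticker
# shown in the question's expected output
result = window_around_matches(prices_df, events_df, n_after=3, k_before=0)
print(result)
```

Because the window is expressed as positions within each ticker group, a match on the first row of a ticker simply has no earlier rows to include, rather than borrowing them from the preceding ticker.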
