Pandas: how to add a column whose values depend on subsequent rows

Posted 2024-04-29 12:09:05

I have the following DataFrame:

timestamp   close   open    high    low volume  datetime    candlestick_pattern candlestick_match_count SYMBOL
0   510037200   0.095486    0.088542    0.103299    0.088542    1857052800  1986-03-01 05:00:00 NO_PATTERN  0.0 MSFT
1   512715600   0.111979    0.095486    0.121962    0.092882    547142400   1986-04-01 05:00:00 NO_PATTERN  0.0 MSFT
2   515304000   0.121528    0.111979    0.123264    0.107639    333446400   1986-05-01 04:00:00 NO_PATTERN  0.0 MSFT
3   517982400   0.106771    0.121528    0.121528    0.102431    293788800   1986-06-01 04:00:00 NO_PATTERN  0.0 MSFT
4   520574400   0.098958    0.106771    0.109375    0.090278    460800000   1986-07-01 04:00:00 NO_PATTERN  0.0 MSFT
5   523252800   0.098958    0.098958    0.106771    0.094618    266256000   1986-08-01 04:00:00 NO_PATTERN  0.0 MSFT
6   525931200   0.098090    0.098958    0.109375    0.091146    285120000   1986-09-01 04:00:00 NO_PATTERN  0.0 MSFT
7   528523200   0.134549    0.098090    0.139757    0.095486    1565712000  1986-10-01 04:00:00 NO_PATTERN  0.0 MSFT
8   531205200   0.172743    0.135417    0.177951    0.135417    1053964800  1986-11-01 05:00:00 NO_PATTERN  0.0 MSFT
9   533797200   0.167535    0.172743    0.173611    0.157118    721526400   1986-12-01 05:00:00 NO_PATTERN  0.0 MSFT
10  536475600   0.253906    0.167535    0.269097    0.164931    1961596800  1987-01-01 05:00:00 CDLHIKKAKE_Bear 1.0 MSFT
11  539154000   0.266493    0.253906    0.276042    0.230903    1834272000  1987-02-01 05:00:00 NO_PATTERN  0.0 MSFT
12  541573200   0.335938    0.266493    0.343750    0.255208    1783324800  1987-03-01 05:00:00 NO_PATTERN  0.0 MSFT
13  544251600   0.361111    0.335938    0.378472    0.314236    2118585600  1987-04-01 05:00:00 NO_PATTERN  0.0 MSFT
14  546840000   0.400174    0.361111    0.445312    0.356771    2192054400  1987-05-01 04:00:00 CDLBELTHOLD_Bull    1.0 MSFT
15  549518400   0.354167    0.400174    0.400174    0.334201    1729353600  1987-06-01 04:00:00 CDLBELTHOLD_Bear    1.0 MSFT
16  552110400   0.326389    0.353299    0.367188    0.307292    1813536000  1987-07-01 04:00:00 NO_PATTERN  0.0 MSFT
17  554788800   0.412326    0.326389    0.421007    0.316840    2034921600  1987-08-01 04:00:00 CDLMORNINGSTAR_Bull 2.0 MSFT
18  557467200   0.460069    0.409722    0.462674    0.368056    1554902400  1987-09-01 04:00:00 CDLCLOSINGMARUBOZU_Bull 1.0 MSFT
19  560059200   0.345486    0.460069    0.550347    0.258681    3567219200  1987-10-01 04:00:00 NO_PATTERN  0.0 MSFT
20  562741200   0.310764    0.342014    0.361111    0.281250    1550172800  1987-11-01 05:00:00 NO_PATTERN  0.0 MSFT
21  565333200   0.376736    0.309028    0.388889    0.277778    1780182400  1987-12-01 05:00:00 CDLENGULFING_Bull   1.0 MSFT
22  568011600   0.387153    0.380208    0.428819    0.364583    1966780800  1988-01-01 05:00:00 CDL3OUTSIDE_Bull    7.0 MSFT

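I need to add two columns. The first, "increased by 5% in the next 3 rows", should be 0 or 1 depending on whether the close column rises by 5% over the next 3 rows. The code should apply only to rows whose pattern contains the word "BULL".

The second column should be the actual percentage increase over the next 3 rows.

Obviously, this cannot be computed for the last 3 rows of the DataFrame, so the code needs to handle that case.

Update:

  1. In the future, the DataFrame will contain all stocks, not just MSFT
  2. The 5% is measured from the current row to the later row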

1 answer
Answer 1 · posted 2024-04-29 12:09:05

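The key pieces of the solution are:

  • .shift(-3) to look 3 rows ahead
  • .groupby('SYMBOL')['close'].transform() in combination with it, so that the look-ahead for each row stays within that row's own stock (SYMBOL group)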
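To illustrate why the grouping matters, here is a minimal sketch on made-up data (the symbols AAA and BBB and all values are hypothetical, not from the question). It compares a grouped shift with a plain shift over the whole column:

```python
import pandas as pd

# Hypothetical toy data: two symbols, five rows each.
toy = pd.DataFrame({
    "SYMBOL": ["AAA"] * 5 + ["BBB"] * 5,
    "close": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# Shift inside each SYMBOL group: the last 3 rows of every symbol become NaN.
toy["close_3_ahead"] = (
    toy.groupby("SYMBOL")["close"].transform(lambda s: s.shift(-3))
)

# Plain shift for comparison: it ignores the grouping, so the last
# AAA rows pick up BBB prices instead of NaN.
toy["naive_3_ahead"] = toy["close"].shift(-3)

print(toy)
```

With the grouped version, rows that have no value 3 rows ahead within their own symbol get NaN, which is what the question needs for the last 3 rows of each stock.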

Documentation for .shift():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html

Documentation for .groupby().transform():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html

# `df` is the DataFrame from the question
# taking a subset just to keep a better overview of what's happening
# the sorting is important for the look-ahead calculations below
df_subset = df[
    ['SYMBOL', 'datetime', 
     'close', 'candlestick_pattern']
].sort_values(by=['SYMBOL', 'datetime'])

# check what the future value will be within the SYMBOL group. Use shift(-3) to look 3 ahead.
df_subset['shift_3_rows_ahead'] = (df_subset
    .groupby('SYMBOL')['close']
    .transform(lambda x: x.shift(-3)))

# calculate percentage increase
df_subset['perc_increase'] = (
    (df_subset['shift_3_rows_ahead'] 
    / df_subset['close'] - 1) * 100)

# check if percentage increase is > 5 percent
df_subset['increase_larger_than_5_percent'] = df_subset['perc_increase'] > 5.

# check if relevant column contains the word bull
df_subset['contains_bull'] = df_subset['candlestick_pattern'].str.contains('bull', case=False)

# check if column contains bull and if the future increase is > 5%
df_subset['should_be_looked_at'] = (
    df_subset['contains_bull'] 
    & df_subset['increase_larger_than_5_percent']
)
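The question asks for a 0/1 column and a percentage column, with the last 3 rows handled explicitly. The sketch below is one possible way to package the steps above. The helper name add_forward_return_columns and the output column names are hypothetical. It assumes "increased by 5%" means strictly more than 5% between the current close and the close exactly 3 rows later, and it leaves the flag as NaN where the row is not a "bull" pattern or where no row 3 steps ahead exists. The sample rows are rows 14 to 18 of the question's data, with the time of day dropped.

```python
import pandas as pd

def add_forward_return_columns(df, horizon=3, threshold_pct=5.0):
    """Hypothetical helper: add a percentage-increase column and a 0/1 flag."""
    out = df.sort_values(["SYMBOL", "datetime"]).copy()
    pct_col = f"pct_increase_next_{horizon}"
    flag_col = f"increased_{threshold_pct:g}pct_next_{horizon}"

    # Close `horizon` rows ahead, looked up within each symbol.
    future_close = out.groupby("SYMBOL")["close"].shift(-horizon)

    # Percentage change; NaN for the last `horizon` rows of each symbol.
    out[pct_col] = (future_close / out["close"] - 1) * 100

    is_bull = out["candlestick_pattern"].str.contains("bull", case=False, na=False)
    has_future = out[pct_col].notna()

    # 1.0 / 0.0 for bull rows with a known future close, NaN otherwise.
    # The column is float because NaN cannot be stored in a plain int column.
    flag = (out[pct_col] > threshold_pct).astype(float)
    out[flag_col] = flag.where(is_bull & has_future)
    return out

sample = pd.DataFrame({
    "SYMBOL": ["MSFT"] * 5,
    "datetime": pd.to_datetime(
        ["1987-05-01", "1987-06-01", "1987-07-01", "1987-08-01", "1987-09-01"]
    ),
    "close": [0.400174, 0.354167, 0.326389, 0.412326, 0.460069],
    "candlestick_pattern": [
        "CDLBELTHOLD_Bull", "CDLBELTHOLD_Bear", "NO_PATTERN",
        "CDLMORNINGSTAR_Bull", "CDLCLOSINGMARUBOZU_Bull",
    ],
})

result = add_forward_return_columns(sample)
print(result)
```

In this sample, the first row is a bull pattern whose close rises by about 3.04% three rows later, so its flag is 0. The second row rises by far more than 5% but is a bear pattern, so its flag stays NaN. The last 3 rows have no future close and are NaN in both columns.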
