Compute the average of the 15 records ahead of the current record as a new column

Posted 2024-05-14 14:04:50


I have 1-minute stock data like this:

                      bidopen    bidhigh    bidlow  bidclose bidvolume
currencypair
2007-03-30 16:01:00    1.9687    1.96900    1.9686    1.9686    877.40
2007-03-30 16:02:00    1.9686    1.96905    1.9686    1.9686    897.20
2007-03-30 16:03:00    1.9686    1.96900    1.9686    1.9690    1076.11
2007-03-30 16:04:00    1.9689    1.96910    1.9688    1.9690    849.70
2007-03-30 16:05:00    1.9690    1.96900    1.9688    1.9689    1402.80

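I want to add a column. This column should:

  • Take 15 records starting from this point (including the current time)
  • Get the maximum bidhigh and the minimum bidlow from those 15 records
  • Compute the high-low difference and use that value in the new column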

I tried the following. First I read in the data.

^{pr2}$

Define the function:

^{3}$

Then:

df['newcolumn'] = map( lookaheadmaxmin, df[:15])

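This raises an error. I am fairly sure the "df[:15]" in the map call is the problem, because I don't know how to pass a slice of the current and future records to the function.

Basically, what I want to do is determine how much the price moved within a 15-minute moving window, like this:

16:00-16:15: how much did the price move? Put this in the column of the 16:00 record

16:01-16:16: how much did the price move? Put this in the column of the 16:01 record

16:02-16:17: how much did the price move? Put this in the column of the 16:02 record

16:03-16:18: how much did the price move? Put this in the column of the 16:03 record

16:04-16:19: how much did the price move? Put this in the column of the 16:04 record

16:05-16:20: how much did the price move? Put this in the column of the 16:05 record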


Additional information:

I am using Enthought Canopy version 1.1.0 (64-bit) on a Mac. pandas version: 0.12.0-1 (with numpy 1.7.1)

Sample of the source data:

    currencypair,datetime,bidopen,bidhigh,bidlow,bidclose,askopen,askhigh,asklow,askclose,bidvolume,askvolume
    GBPUSD,2007-03-30 16:01:00,1.96870,1.96900,1.96860,1.96860,1.96850,1.96880,1.96845,1.96850,877.40,1386.70
    GBPUSD,2007-03-30 16:02:00,1.96860,1.96905,1.96860,1.96860,1.96850,1.96890,1.96840,1.96840,897.20,1272.30
    GBPUSD,2007-03-30 16:03:00,1.96860,1.96900,1.96860,1.96900,1.96850,1.96890,1.96840,1.96880,1076.11,1333.30
    GBPUSD,2007-03-30 16:04:00,1.96890,1.96910,1.96880,1.96900,1.96880,1.96890,1.96865,1.96880,849.70,765.10
    GBPUSD,2007-03-30 16:05:00,1.96900,1.96900,1.96880,1.96890,1.96875,1.96890,1.96860,1.96870,1402.80,1240.90
    GBPUSD,2007-03-30 16:06:00,1.96890,1.96890,1.96840,1.96860,1.96870,1.96870,1.96820,1.96850,769.50,1727.30
    GBPUSD,2007-03-30 16:07:00,1.96860,1.96880,1.96820,1.96830,1.96850,1.96870,1.96810,1.96820,842.00,1865.60
    GBPUSD,2007-03-30 16:08:00,1.96830,1.96930,1.96830,1.96910,1.96820,1.96920,1.96820,1.96890,1096.60,1197.70
    GBPUSD,2007-03-30 16:09:00,1.96910,1.96920,1.96880,1.96890,1.96895,1.96910,1.96865,1.96880,368.60,432.10

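As a side note, there is something odd in how the records are displayed (I am using an IPython notebook). Even though I left out the 'currencypair' column, it is oddly displayed as a column header. (I include this because I don't know whether it is related to the other things that aren't working.)

Importing the data (using the read_csv call above; note that no 'currencypair' column is named):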

^{pr2}$

Then running

    df[:5]

shows the following (note that 'currencypair' is displayed as a column header, but in the df.info() output below it appears only as 'Index'):

                           bidopen    bidhigh    bidlow    bidclose    bidvolume
    currencypair                    
    2007-03-30 16:01:00     1.9687     1.96900     1.9686     1.9686     877.40
    2007-03-30 16:02:00     1.9686     1.96905     1.9686     1.9686     897.20
    2007-03-30 16:03:00     1.9686     1.96900     1.9686     1.9690     1076.11
    2007-03-30 16:04:00     1.9689     1.96910     1.9688     1.9690     849.70
    2007-03-30 16:05:00     1.9690     1.96900     1.9688     1.9689     1402.80

df.info() shows:

    <class 'pandas.core.frame.DataFrame'>
    Index: 2362159 entries, 2007-03-30 16:01:00 to 2013-09-02 18:59:00
    Data columns (total 5 columns):
    bidopen      2362159  non-null values
    bidhigh      2362159  non-null values
    bidlow       2362159  non-null values
    bidclose     2362159  non-null values
    bidvolume    2362159  non-null values
    dtypes: float64(5)

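Importing the data another way

Import and then drop the currencypair column (note that 'currencypair' is added to usecols and the column is dropped afterwards):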

    usecols = ['currencypair','datetime','bidopen','bidhigh','bidlow','bidclose','bidvolume']
    df=pd.read_csv(path,parse_dates=('datetime'),index_col=1, usecols = usecols )
    df=df.drop('currencypair',1)

This shows:

                           bidopen    bidhigh    bidlow    bidclose    bidvolume
    datetime                    
    2007-03-30 16:01:00     1.9687     1.96900     1.9686     1.9686     877.40
    2007-03-30 16:02:00     1.9686     1.96905     1.9686     1.9686     897.20
    2007-03-30 16:03:00     1.9686     1.96900     1.9686     1.9690     1076.11
    2007-03-30 16:04:00     1.9689     1.96910     1.9688     1.9690     849.70
    2007-03-30 16:05:00     1.9690     1.96900     1.9688     1.9689     1402.80

and df.info() shows (note that the index now appears as 'DatetimeIndex'):

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 2362159 entries, 2007-03-30 16:01:00 to 2013-09-02 18:59:00
    Data columns (total 5 columns):
    bidopen      2362159  non-null values
    bidhigh      2362159  non-null values
    bidlow       2362159  non-null values
    bidclose     2362159  non-null values
    bidvolume    2362159  non-null values
    dtypes: float64(5)
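
For reference, a minimal sketch of the second import in current pandas, assuming the file has the layout shown in the sample source data. Here `csv_text` and `io.StringIO` are stand-ins for the real file path, and `parse_dates` is given as a list of column names. Because 'currencypair' is simply left out of `usecols`, there is nothing to drop afterwards.

```python
import io

import pandas as pd

# A few rows in the same layout as the sample source data (stand-in for the file).
csv_text = """currencypair,datetime,bidopen,bidhigh,bidlow,bidclose,askopen,askhigh,asklow,askclose,bidvolume,askvolume
GBPUSD,2007-03-30 16:01:00,1.96870,1.96900,1.96860,1.96860,1.96850,1.96880,1.96845,1.96850,877.40,1386.70
GBPUSD,2007-03-30 16:02:00,1.96860,1.96905,1.96860,1.96860,1.96850,1.96890,1.96840,1.96840,897.20,1272.30
GBPUSD,2007-03-30 16:03:00,1.96860,1.96900,1.96860,1.96900,1.96850,1.96890,1.96840,1.96880,1076.11,1333.30
"""

# Only the columns of interest; 'currencypair' is not read at all.
usecols = ["datetime", "bidopen", "bidhigh", "bidlow", "bidclose", "bidvolume"]

df = pd.read_csv(
    io.StringIO(csv_text),     # replace with the real path
    usecols=usecols,
    parse_dates=["datetime"],  # parse this column as timestamps
    index_col="datetime",      # and use it as the index
)

print(type(df.index).__name__)
print(df.columns.tolist())
```

With the index parsed as timestamps, time-based operations such as shifting by a frequency or `asfreq` can be applied to the frame.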

1 answer
Answer 1 · posted 2024-05-14 14:04:50

This is straightforward when you only need to specify a few columns, e.g. the max of a and the min of b.

In [65]: df = DataFrame(randn(100,4),columns=list('abcd'),
        index=date_range('20130101 16:00',periods=100,freq='T'))

In [66]: df.head(20)
Out[66]: 
                            a         b         c         d
2013-01-01 16:00:00  0.404056  0.115774 -0.202356  0.998315
2013-01-01 16:01:00 -0.231966  0.262609  1.192302 -0.702163
2013-01-01 16:02:00 -0.467005  0.744724 -0.871782 -0.308637
2013-01-01 16:03:00 -0.175704  0.036244  1.404604 -0.106320
2013-01-01 16:04:00  0.046306 -0.098140  0.535573 -0.306300
2013-01-01 16:05:00 -0.115620 -1.069991  0.790965 -0.504283
2013-01-01 16:06:00  1.496555  0.373582  1.028092 -0.816990
2013-01-01 16:07:00  0.432081  0.182106  0.115107  1.239192
2013-01-01 16:08:00 -0.245789 -2.030840  0.118330 -1.922616
2013-01-01 16:09:00 -0.358188 -0.121750  1.768505 -2.096908
2013-01-01 16:10:00 -1.634722 -0.808355 -0.773417  0.095078
2013-01-01 16:11:00 -0.396295  0.168568 -0.901945 -0.073811
2013-01-01 16:12:00 -1.364391  2.052481 -0.175291  0.927363
2013-01-01 16:13:00 -0.523331  0.042475  0.361593 -2.239468
2013-01-01 16:14:00  1.573967 -0.709043  0.551812  0.452311
2013-01-01 16:15:00  0.180578  0.846856 -2.304107 -1.283507
2013-01-01 16:16:00  0.065386  0.356015 -0.174369  1.167562
2013-01-01 16:17:00 -1.747416  1.279114  0.559075  0.200927
2013-01-01 16:18:00 -2.041764 -0.085398  2.032789  0.195671
2013-01-01 16:19:00 -0.639329  0.268832  0.394621 -0.271260

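The rolling functions compute over a window that ends at the current row and looks backward, so we time-shift the result (this only changes the index) so that the values align with the start point of the window rather than its end point.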

^{pr2}$

The high-low difference is then simply

df['max_a'] - df['min_b']
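
A minimal sketch of this rolling-then-shift idea in current pandas (not necessarily the exact code used in this answer). It assumes regularly spaced 1-minute rows with no gaps, so shifting by rows is the same as shifting by time. The column name `hl_range` and the random data are only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01 16:00", periods=100, freq="min")
df = pd.DataFrame(rng.standard_normal((100, 2)), index=idx, columns=["a", "b"])

window = 15  # current row plus the next 14 rows

# rolling() produces a trailing window: the value at row i covers rows i-14..i.
# Shifting the result up by window-1 rows moves it to the window's first row,
# so row i ends up holding the max/min over rows i..i+14.
df["max_a"] = df["a"].rolling(window).max().shift(-(window - 1))
df["min_b"] = df["b"].rolling(window).min().shift(-(window - 1))

# High-low difference over the forward-looking window.
df["hl_range"] = df["max_a"] - df["min_b"]

print(df.head())
```

The last `window - 1` rows are NaN because a full 15-row window does not exist for them. For the question's data, `a` and `b` would be `bidhigh` and `bidlow`.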

It looks like your series has gaps; use asfreq to put it on a regular frequency:

In [16]: df = DataFrame(randn(10,2),columns=list('ab'),index=date_range('20130101 9:00',freq='T',periods=10))

In [17]: df
Out[17]: 
                            a         b
2013-01-01 09:00:00  0.516518 -1.497564
2013-01-01 09:01:00  1.747399  1.100530
2013-01-01 09:02:00 -0.223476 -0.682712
2013-01-01 09:03:00  0.343172 -0.341965
2013-01-01 09:04:00 -1.380057 -1.565732
2013-01-01 09:05:00 -2.156675  1.043532
2013-01-01 09:06:00 -1.237155 -0.219086
2013-01-01 09:07:00  1.626510 -0.596204
2013-01-01 09:08:00 -0.767588  0.496110
2013-01-01 09:09:00 -0.014556  0.012049

In [18]: df.index
Out[18]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:00:00, ..., 2013-01-01 09:09:00]
Length: 10, Freq: T, Timezone: None

In [19]: df.append(Series(name=[Timestamp('20130101 09:15')]))
Out[19]: 
                            a         b
2013-01-01 09:00:00  0.516518 -1.497564
2013-01-01 09:01:00  1.747399  1.100530
2013-01-01 09:02:00 -0.223476 -0.682712
2013-01-01 09:03:00  0.343172 -0.341965
2013-01-01 09:04:00 -1.380057 -1.565732
2013-01-01 09:05:00 -2.156675  1.043532
2013-01-01 09:06:00 -1.237155 -0.219086
2013-01-01 09:07:00  1.626510 -0.596204
2013-01-01 09:08:00 -0.767588  0.496110
2013-01-01 09:09:00 -0.014556  0.012049
2013-01-01 09:15:00       NaN       NaN

In [20]: df.append(Series(name=[Timestamp('20130101 09:15')])).index
Out[20]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:00:00, ..., 2013-01-01 09:15:00]
Length: 11, Freq: None, Timezone: None

In [21]: df.append(Series(name=[Timestamp('20130101 09:15')])).asfreq('T')
Out[21]: 
                            a         b
2013-01-01 09:00:00  0.516518 -1.497564
2013-01-01 09:01:00  1.747399  1.100530
2013-01-01 09:02:00 -0.223476 -0.682712
2013-01-01 09:03:00  0.343172 -0.341965
2013-01-01 09:04:00 -1.380057 -1.565732
2013-01-01 09:05:00 -2.156675  1.043532
2013-01-01 09:06:00 -1.237155 -0.219086
2013-01-01 09:07:00  1.626510 -0.596204
2013-01-01 09:08:00 -0.767588  0.496110
2013-01-01 09:09:00 -0.014556  0.012049
2013-01-01 09:10:00       NaN       NaN
2013-01-01 09:11:00       NaN       NaN
2013-01-01 09:12:00       NaN       NaN
2013-01-01 09:13:00       NaN       NaN
2013-01-01 09:14:00       NaN       NaN
2013-01-01 09:15:00       NaN       NaN

In [22]: df.append(Series(name=[Timestamp('20130101 09:15')])).asfreq('T').index
Out[22]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 09:00:00, ..., 2013-01-01 09:15:00]
Length: 16, Freq: T, Timezone: None
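
The transcript above is from an old pandas release. A rough equivalent for current pandas is sketched below, assuming a version where `DataFrame.append` is no longer available and where the minute alias is spelled `'min'`. The gap is created by dropping rows, which is just one convenient way to get an irregular index for the demonstration.

```python
import numpy as np
import pandas as pd

# Build 16 regular one-minute rows, then drop 09:10-09:14 to create a gap.
idx = pd.date_range("2013-01-01 09:00", periods=16, freq="min")
full = pd.DataFrame(np.random.randn(16, 2), index=idx, columns=["a", "b"])
gappy = full.drop(full.index[10:15])

# asfreq reindexes onto a regular one-minute grid; the missing minutes
# come back as rows filled with NaN.
regular = gappy.asfreq("min")

print(len(gappy), len(regular))
print(regular.tail(7))
```

Once the index is regular again, a row-based rolling window of 15 corresponds to 15 minutes, with NaN marking the minutes that had no data.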
