Reductions on a pandas column

6 votes

3 answers

6498 views

Asked 2025-04-17 13:47

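I want to convert a column (many columns, actually) of returns data into a column of closing prices. In Clojure I would use reductions, which is like reduce but returns a sequence of all the intermediate values.

For example: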

$ c

0.12
-.13
0.23
0.17
0.29
-0.11

# something like this
$ c.reductions(init=1, lambda accumulator, ret: accumulator * (1 + ret)) 

1.12
0.97
1.20
1.40
1.81
1.61

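Note: the actual closing price doesn't matter, which is why I use 1 as the initial value. I just need a "mock" closing price.

My data is actually a DataFrame of time series with named columns. I guess I'm looking for a function like applymap, but I'd rather not do something hacky with it and reference the DataFrame from inside the function (which I suppose would be one way to solve this?).

Also, what should I do if I want to keep the returns data and also have the closing "prices"? Should I return a tuple instead, so that the time series holds values of type (returns, closing_price)?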


3 Answers

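For readability, I prefer the following solution:

import pandas as pd

returns = pd.Series([0.12, -0.13, 0.23, 0.17, 0.29, -0.11])

initial_value = 100
# Compound the returns: each step multiplies the running value by (1 + return).
cum_growth = initial_value * (1 + returns).cumprod()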

>>> cum_growth
0    112.000000
1     97.440000
2    119.851200
3    140.225904
4    180.891416
5    160.993360
dtype: float64

If you want to include the initial value in the series:

>>> pd.concat([pd.Series(initial_value), cum_growth]).reset_index(drop=True)
0    100.000000
1    112.000000
2     97.440000
3    119.851200
4    140.225904
5    180.891416
6    160.993360
dtype: float64
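For comparison with the reductions call in the question, here is a minimal sketch using itertools.accumulate, which is probably the closest standard-library analogue. It assumes Python 3.8 or later (the initial keyword was added there); the names steps, with_initial and expected are only illustrative.

```python
from itertools import accumulate

import pandas as pd

returns = pd.Series([0.12, -0.13, 0.23, 0.17, 0.29, -0.11])
initial_value = 100

# accumulate yields the initial value followed by every intermediate result,
# much like Clojure's reductions called with an init argument.
steps = list(
    accumulate(returns, lambda acc, ret: acc * (1 + ret), initial=initial_value)
)
with_initial = pd.Series(steps)

# Vectorized version from the answer above, with the initial value prepended.
cum_growth = initial_value * (1 + returns).cumprod()
expected = pd.concat([pd.Series([initial_value]), cum_growth]).reset_index(drop=True)
```

Both series should agree up to floating-point rounding. The accumulate version runs a Python-level loop, so cumprod is likely the better choice for large data.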

Answered 2025-04-17 by Python大师


It doesn't look like it's a well-publicized feature yet, but you can use expanding_apply to compute this.

In [1]: s
Out[1]:
0    0.12
1   -0.13
2    0.23
3    0.17
4    0.29
5   -0.11

In [2]: from functools import reduce  # reduce is not a builtin on Python 3

In [3]: pd.expanding_apply(s, lambda s: reduce(lambda x, y: x * (1 + y), s, 1))
Out[3]:
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

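I'm not 100% sure, but I believe expanding_apply applies the function to the series from the first index up to the current index. I use reduce (a builtin on Python 2, functools.reduce on Python 3), which works just like Clojure's reduce; applying it over each expanding window yields the intermediate values that reductions would return.

The docstring for expanding_apply: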

Generic expanding function application

Parameters
----------
arg : Series, DataFrame
func : function
    Must produce a single value from an ndarray input
min_periods : int
    Minimum number of observations in window required to have a value
freq : None or string alias / date offset object, default=None
    Frequency to conform to before computing statistic
center : boolean, default False
    Whether the label should correspond with center of window

Returns
-------
y : type of input argument
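The top-level pd.expanding_apply function was deprecated in pandas 0.18 and removed in later releases. On a current pandas version, the same idea can be sketched with the Series.expanding() method. The helper name compound below is hypothetical, and raw=True is chosen here so that the function receives a NumPy array rather than a Series.

```python
from functools import reduce

import pandas as pd

s = pd.Series([0.12, -0.13, 0.23, 0.17, 0.29, -0.11])


def compound(window):
    # window is a NumPy array holding every value from the start of the
    # series up to the current row (because raw=True is passed below).
    return reduce(lambda acc, ret: acc * (1 + ret), window, 1.0)


closing = s.expanding().apply(compound, raw=True)
```

Note that each window is folded again from the first element, so the work grows quadratically with the length of the series, whereas (1 + s).cumprod() makes a single pass.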

Answered 2025-04-17 by Python大师


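It's worth noting that in pandas it is often faster, as well as easier to understand, to write things out more verbosely rather than as a reduce.

In your specific example, I would just use add and then cumprod: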

In [2]: c.add(1).cumprod()
Out[2]: 
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

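Or perhaps init * c.add(1).cumprod().

Note: in some cases, for example when memory is limited, you may have to rewrite this in a more low-level or clever way, but it's usually worth trying the simplest method first (and testing it, e.g. with %timeit or memory profiling).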
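The same add/cumprod approach should carry over to a whole DataFrame, since both methods operate column by column. The sketch below uses made-up column names (AAA, BBB) and dates, and shows one possible way to keep the returns next to the mock closing prices: as separate suffixed columns rather than tuples.

```python
import pandas as pd

# Hypothetical data: two return columns on a daily DatetimeIndex.
returns = pd.DataFrame(
    {"AAA": [0.12, -0.13, 0.23], "BBB": [0.05, 0.02, -0.04]},
    index=pd.date_range("2025-01-01", periods=3, freq="D"),
)

init = 1
# add and cumprod are applied to every column independently.
closes = init * returns.add(1).cumprod()

# Keep both: suffix the column names and join on the shared index.
combined = returns.add_suffix("_ret").join(closes.add_suffix("_close"))
```

Separate columns keep every value a plain float, so the usual vectorized operations keep working; a column of tuples would be stored with object dtype.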

Answered 2025-04-17 by Python大师
