Python Pandas: groupby apply function that looks at previous rows

Asked 2025-04-18 12:49

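I have a dataset and want to add a column holding the result of a complex calculation. The calculation has to be done within each group, and each row's value depends on the rows above it. Here is the simple code I have so far and the output I want:

Edit 1: I updated the code below. Maybe I misunderstand how apply works; I assumed it would execute twice (once per group), and that my function would then loop over every row within each of those executions. I still don't understand why it executes three times... I expected "executed" to be printed five times. What do you think?

Edit 2: I had the indentation of the function's return statement wrong. It is fixed now. Thanks for all the help!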

import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'], 'cost': [1, 4, 2, 8, 9]})
df['class'] = np.nan

def customFunction(test_df):
    # Called by apply() with the sub-DataFrame of one group
    print(np.shape(test_df))
    iteration = 1
    for currRow in test_df.iterrows():
        # currRow is an (index, row) tuple, hence currRow[1]
        print('executed')
        # Note: each assignment below sets 'class' for every row of the group,
        # not only for the current row
        if iteration == 1:
            test_df['class'] = 'first'
        else:
            if currRow[1]['cost'] > priorCost:
                test_df['class'] = 'greater'
            elif currRow[1]['cost'] < priorCost:
                test_df['class'] = 'less'
            else:
                test_df['class'] = 'equal'

        iteration += 1
        priorCost = currRow[1]['cost']

    return test_df

grouped_df = df.groupby(['type']).apply(customFunction)

Output:

(2, 2)
executed
(2, 2)
executed
(3, 2)
executed
   cost type  class
0     1  foo  first
1     4  foo  first
2     2  foo  first
3     8  bar  first
4     9  bar  first
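
For comparison, here is a minimal sketch of what the loop seems to be aiming for: one label per row, based on the previous row in the same group. The helper name label_rows and the use of pd.concat to stitch the groups back together are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})

def label_rows(costs):
    """Label each value of one group relative to the value before it."""
    labels = []
    prior_cost = None
    for cost in costs:
        if prior_cost is None:
            labels.append('first')
        elif cost > prior_cost:
            labels.append('greater')
        elif cost < prior_cost:
            labels.append('less')
        else:
            labels.append('equal')
        prior_cost = cost
    # Reuse the group's index so the labels line up with the original rows
    return pd.Series(labels, index=costs.index)

# Build one labelled Series per group, then let index alignment place each
# label on its original row
parts = [label_rows(group['cost']) for _, group in df.groupby('type')]
df['class'] = pd.concat(parts)
print(df)
```

The key difference from the loop above is that each row gets its own entry in labels, rather than the whole column being overwritten on every iteration.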


1 Answer

Let me share how far I've gotten. I need to take a short break for now, but:

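# Assumes the table from the question has been copied to the clipboard;
# read_clipboard() already returns a DataFrame, so no extra wrapping is needed
df = pd.read_clipboard()
df.set_index('type', inplace=True)
# Within each group, take the difference between consecutive 'cost' values
test = df.groupby(level=0).apply(lambda x: x.cost.diff())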

This gives me the following (diff() computes each row's difference from the previous row, so the first row of every group is NaN):

Out[160]: 
type
bar     type
bar    NaN
bar      1
Name: cost, dtype: ...
foo     type
foo    NaN
foo      3
foo     -2
Name: co...
dtype: object

So this contains all the information you need. Now I'm having trouble merging it back into the original data frame: df['differences'] = test ends in a mess.

Update

I'm almost there:

>>> df['differences'] = test[1].append(test[0])
>>> df.loc[df['differences'] > 0, 'inWords'] = 'greater'   
>>> df.loc[df['differences'] < 0, 'inWords'] = 'lesser' 
>>> df.loc[df['differences'].isnull(), 'inWords'] = 'first' 
>>> df
Out[184]: 
      cost  differences  inWords
type                            
foo      1          NaN    first
foo      4            3  greater
foo      2           -2   lesser
bar      8          NaN    first
bar      9            1  greater

So all that's missing is a general expression to replace test[1].append(test[0]). Maybe someone else can help?
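
One possible general expression, offered as a sketch and not as the answer's own method: calling diff() directly on the grouped 'cost' column returns a Series that keeps the original row index, so it can be assigned back without stitching the per-group pieces together by hand. The sketch assumes 'type' stays an ordinary column instead of being the index, and it uses np.select for the labels; both choices are assumptions made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})

# Per-group difference from the previous row; the result carries df's own
# index, so plain assignment puts every value on the right row
df['differences'] = df.groupby('type')['cost'].diff()

# Translate the numeric differences into words; rows matching none of the
# conditions (difference of exactly 0) fall through to the default
conditions = [
    df['differences'].isnull(),
    df['differences'] > 0,
    df['differences'] < 0,
]
choices = ['first', 'greater', 'lesser']
df['inWords'] = np.select(conditions, choices, default='equal')
print(df)
```

Because the grouped diff() is index-aligned, this works for any number of groups and does not depend on the order in which the groups come back.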

Update 2

In response to your comment: whenever you define a function for apply(),

def compareSomethingWithinAGroup(group):
    someMagicHappens()
    return someValues

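you can use all the standard pandas functions, and inside the function you have access to the whole group. So you can build any complex, row-dependent operation you like. The only thing to watch: someValues needs to be a Series or a single-column DataFrame with as many entries as group has rows. As long as you return such a someValues, you can always do df['resultOfSomethingComplicated'] = df.groupby(level=0).apply(compareSomethingWithinAGroup) and use all the rows in your result.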
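
A small sketch of that pattern, under a few assumptions: the function name change_vs_prior_max and its calculation are hypothetical stand-ins for "something complicated", 'type' is kept as a column, and group_keys=False is passed so that the result keeps the original row index and can be assigned back directly.

```python
import pandas as pd

df = pd.DataFrame({'type': ['foo', 'foo', 'foo', 'bar', 'bar'],
                   'cost': [1, 4, 2, 8, 9]})

def change_vs_prior_max(costs):
    """Hypothetical row-dependent calculation on one group's 'cost' Series:
    each value minus the largest value seen on any earlier row of the group."""
    prior_max = costs.cummax().shift()
    # Same length and same index as the input group
    return costs - prior_max

# group_keys=False keeps the original row index on the result instead of
# prefixing it with the group label, so assignment aligns row by row
result = df.groupby('type', group_keys=False)['cost'].apply(change_vs_prior_max)
df['resultOfSomethingComplicated'] = result
print(df)
```

The function sees the whole group at once, so anything that depends on earlier rows (shift, cumulative operations, explicit loops) can be used inside it, as long as what it returns has one entry per row of the group.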

Answered 2025-04-18 by Python大师
