Custom groupby based on column count

group = df.groupby('Type')['Sales'].mean()
for name, group in tab_sales_per_machines:
    nmachines = group['Machine'].nunique()
    if nmachines < 5:
        ... do stuff using df...
    else:
        group['Sales'].mean()

2 Answers

User

Answer 1 · edited 2024-05-14 11:04:35

You can try using apply, which gives you a bit more flexibility than agg:

def your_func(group):
    nmachines = group.Machine.nunique()
    if nmachines < 5:
        # ... do stuff using df ...
        return stuff
    # default: return the mean of Sales
    return group.Sales.mean()

df.groupby('Type').apply(your_func)
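To make the apply pattern above concrete, here is a minimal runnable sketch. The toy data, the function name `sales_or_fallback`, the `min_machines` parameter, and the fallback itself (the mean of Sales over the whole frame) are all assumptions for illustration; the "do stuff using df" step in the answer is not specified.

```python
import pandas as pd

# Toy data (hypothetical): type "A" has 2 machines, type "B" has 5.
df = pd.DataFrame({
    "Type":    ["A", "A", "B", "B", "B", "B", "B"],
    "Machine": ["m1", "m2", "m3", "m4", "m5", "m6", "m7"],
    "Sales":   [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
})

def sales_or_fallback(group, full_df, min_machines=5):
    """Return the group's mean Sales, or a fallback when it has few machines."""
    nmachines = group["Machine"].nunique()
    if nmachines < min_machines:
        # Placeholder fallback (an assumption): mean over the whole frame.
        return full_df["Sales"].mean()
    return group["Sales"].mean()

# Extra keyword arguments to apply() are forwarded to the function,
# so the full frame can be passed in explicitly instead of read as a global.
result = (
    df.groupby("Type")[["Machine", "Sales"]]
      .apply(sales_or_fallback, full_df=df)
)
print(result)
```

Passing `full_df` as a keyword argument keeps the function free of globals, which makes it easier to test on its own.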

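User

Answer 2 · edited 2024-05-14 11:04:35

I managed to solve it by looping over the groups, and I'm posting my solution here. It works, but it doesn't seem like the most elegant way. If anyone has a better idea, I'd be glad to hear it. Note: the real function is a bit more complex than this; I tried to reduce it to the essential parts needed to understand it.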

import numpy as np

def getSalesPerMachine(df):

    groups  = df[['Type','Sales','Product Line','Machine']].groupby('Type', as_index=False)

    # Build the output table
    tab = groups.agg({'Machine':'nunique', 'Sales':'sum', 'Product Line' : 'max'})
    tab['Annual sales'] = np.nan  ## <  Create the column where I'll put the result.

    for name, group in groups:

        ## If stats is low use the full product line (rescaled)
        nmachines = group.Machine.nunique()

        if nmachines < 5 :

            # Retrieve the product line
            pl = group['Product Line'].max()

            ## Get all machines of that product line
            mypl = df.loc[df['Product Line'] == pl]

            ## Set sales to the product-line total, rescaled by this type's share of the line's machines
            sales = mypl.Sales.sum() * nmachines /  mypl.Machine.nunique()

        else :
            # If high stats just return the sum plain and simple
            sales = group.Sales.sum() 

        # Save result (this was where I was stuck before)
        tab['Annual sales'] = \
            np.where(tab['Type']==name, sales, tab['Annual sales'])

    return tab
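One possible loop-free variant of the function above, offered as a sketch rather than the poster's method: compute the per-product-line totals once, map them onto the per-type table, and pick between the two results with `np.where`. The function name `sales_per_machine_vectorized`, the `min_machines` parameter, and the toy data are assumptions for illustration; the column names follow the code above.

```python
import numpy as np
import pandas as pd

def sales_per_machine_vectorized(df, min_machines=5):
    # Per-type summary, same aggregation as the loop version.
    tab = (
        df.groupby("Type", as_index=False)
          .agg({"Machine": "nunique", "Sales": "sum", "Product Line": "max"})
    )
    # Per-product-line totals, computed once for all types.
    pl = df.groupby("Product Line").agg(
        pl_sales=("Sales", "sum"),
        pl_machines=("Machine", "nunique"),
    )
    # Look up each type's product line in the per-line totals.
    pl_sales = tab["Product Line"].map(pl["pl_sales"])
    pl_machines = tab["Product Line"].map(pl["pl_machines"])
    # Product-line total, rescaled by this type's share of the line's machines.
    rescaled = pl_sales * tab["Machine"] / pl_machines
    # Few machines: use the rescaled product-line total; otherwise the plain sum.
    tab["Annual sales"] = np.where(
        tab["Machine"] < min_machines, rescaled, tab["Sales"]
    )
    return tab

# Toy data (hypothetical): one product line "X" with two types.
df = pd.DataFrame({
    "Type":         ["A", "A", "B", "B", "B", "B", "B"],
    "Machine":      ["m1", "m2", "m3", "m4", "m5", "m6", "m7"],
    "Product Line": ["X"] * 7,
    "Sales":        [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
})
tab = sales_per_machine_vectorized(df)
print(tab)
```

As in the loop version, the `Machine` column of the returned table holds the count of unique machines per type, not machine names.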
