根据当前df中的列创建新df

2条回答

网友

1楼 · 编辑于 2024-05-20 01:07:15

我们可以用^{}应用这些公式：

每个组返回一个Volume/Cost数据帧
或者返回一系列Cost元组和^{}元组

数据帧选项

首先将数字字符串转换为实际数字（或者，如果正在使用^{}加载数据，请使用thousands参数）：

df_1['Volume'] = df_1['Volume'].str.replace(',', '').astype(int)
df_1['Order Cost'] = df_1['Order Cost'].str.replace(r'[$,]', '', regex=True).astype(int)

给定Group/Month/ID组，将其Volume和Cost作为数据帧返回：

def formulae_df(g):
    # set index to Cost Type for simpler indexing
    g = g.set_index('Cost Type')

    # initialize empty result df
    df = pd.DataFrame(columns=['Volume', 'Cost'], index=['Freight', 'FOB', 'Price']).rename_axis('Cost Type')

    # fill result df with forumlae
    df['Volume'] = g.loc['FOB', 'Volume']
    df.loc['Freight', 'Cost'] = abs(g.loc['Customer Backhaul', 'Order Cost']) + g.loc['Vendor Freight - Delivered', 'Order Cost']
    df.loc['FOB', 'Cost'] = g.loc['FOB', 'Order Cost']
    df.loc['Price', 'Cost'] = g.loc['Price', 'Order Cost'] - g.loc['Customer Backhaul', 'Order Cost']
    return df

然后用^{}应用formulae_df：

df_2 = df_1.groupby(['Group', 'Month', 'ID']).apply(formulae_df).reset_index()

#   Group     Month     ID  Cost Type  Volume      Cost
# 0     A  1/1/2021  SKU_1    Freight   75357    116570
# 1     A  1/1/2021  SKU_1        FOB   75357  12407112
# 2     A  1/1/2021  SKU_1      Price   75357  12458212
# 3     B  1/1/2021  SKU_1    Freight  931866   1378414
# 4     B  1/1/2021  SKU_1        FOB  931866  50059515
# 5     B  1/1/2021  SKU_1      Price  931866  62490987

带有^{}

由于每个组都有一个Volume和多个Cost，因此我们可以将Cost生成为列表/元组和^{}它们：

第一步仍然是将数字字符串转换为实际数字：

df_1['Volume'] = df_1['Volume'].str.replace(',', '').astype(int)
df_1['Order Cost'] = df_1['Order Cost'].str.replace(r'[$,]', '', regex=True).astype(int)

给定一个Group/Month/ID组，计算其Volume（值）和Cost（元组）：

def formulae_series(g):
    # set index for easy loc access
    g = g.set_index('Cost Type')

    # compute formulae
    volume = g.loc['FOB', 'Volume']
    costs = {
        'Freight': abs(g.loc['Customer Backhaul', 'Order Cost']) + g.loc['Vendor Freight - Delivered', 'Order Cost'],
        'FOB': g.loc['FOB', 'Order Cost'],
        'Price': g.loc['Price', 'Order Cost'] - g.loc['Customer Backhaul', 'Order Cost'],
    }

    # return volume as a value and costs as tuples
    return pd.Series({'Cost Type': costs.keys(), 'Volume': volume, 'Cost': costs.values()})

当我们用^{}应用formulae_series时，请注意Cost*列如何包含元组：

df_2 = df_1.groupby(['Group', 'Month', 'ID']).apply(formulae_series)

#                                   Cost Type  Volume                           Cost
# Group Month    ID
# A     1/1/2021 SKU_1  (Freight, FOB, Price)   75357   (116570, 12407112, 12458212)
# B     1/1/2021 SKU_1  (Freight, FOB, Price)  931866  (1378414, 50059515, 62490987)

现在^{}将这些元组分成行：

df_2 = df_2.explode(['Cost Type', 'Cost']).reset_index()

#   Group     Month     ID  Cost Type  Volume      Cost
# 0     A  1/1/2021  SKU_1    Freight   75357    116570
# 1     A  1/1/2021  SKU_1        FOB   75357  12407112
# 2     A  1/1/2021  SKU_1      Price   75357  12458212
# 3     B  1/1/2021  SKU_1    Freight  931866   1378414
# 4     B  1/1/2021  SKU_1        FOB  931866  50059515
# 5     B  1/1/2021  SKU_1      Price  931866  62490987

完整代码

以下是重新组合的步骤（包括可选转换回逗号/美元）：

## load df_1
df_1 = pd.DataFrame([['A','1/1/2021','SKU_1','Customer Backhaul','34,848','$-51,100'],['A','1/1/2021','SKU_1','FOB','75,357','$12,407,112'],['A','1/1/2021','SKU_1','Price','75,357','$12,407,112'],['A','1/1/2021','SKU_1','Vendor Freight - Delivered','40,511','$65,470'],['B','1/1/2021','SKU_1','Customer Backhaul','197,904','$-157,487'],['B','1/1/2021','SKU_1','FOB','931,866','$50,059,515'],['B','1/1/2021','SKU_1','Price','931,866','$62,333,500'],['B','1/1/2021','SKU_1','Vendor Freight - Delivered','740,355','$1,220,927']],columns=['Group','Month','ID','Cost Type','Volume','Order Cost'])

## convert to numerics
df_1['Volume'] = df_1['Volume'].str.replace(',', '').astype(int)
df_1['Order Cost'] = df_1['Order Cost'].str.replace(r'[$,]', '', regex=True).astype(int)

## dataframe option
df_2 = df_1.groupby(['Group', 'Month', 'ID']).apply(formulae_df).reset_index()

## or apply formulae and explode costs
# df_2 = (df_1.groupby(['Group', 'Month', 'ID'])
#             .apply(formulae_series)
#             .explode(['Cost Type', 'Cost'])
#             .reset_index())

## optional: revert to comma/dollar strings
df_2['Volume'] = df_2['Volume'].map('{:,}'.format)
df_2['Cost'] = df_2['Cost'].map('${:,}'.format)

输出：

  Group     Month     ID  Cost Type   Volume         Cost
0     A  1/1/2021  SKU_1    Freight   75,357     $116,570
1     A  1/1/2021  SKU_1        FOB   75,357  $12,407,112
2     A  1/1/2021  SKU_1      Price   75,357  $12,458,212
3     B  1/1/2021  SKU_1    Freight  931,866   $1,378,414
4     B  1/1/2021  SKU_1        FOB  931,866  $50,059,515
5     B  1/1/2021  SKU_1      Price  931,866  $62,490,987

网友

2楼 · 编辑于 2024-05-20 01:07:15

我发现当它是一个计算时，以及当您只想创建一个包含计算的文本时，有点难以理解，数字来自何处等等

在我理解了这个问题后，我在下面提供了一个例子（如果我误解了，请随意评论）：

您可以在每一行上循环，然后对值进行计算（并对公式使用replace函数）

replace_dict = {"Freight":"ABS(Backhaul)+Vendor Delivered", "Price":"Price - Backhaul",...,}
df["Definition or Formulae"] = df["Cost Type"].replace(replace_dict) #Replace "Freight with "ABS(Backhaul)+Vendor" etc in a new column called "Definition and Formulae


#Do some calculations with the values in each row
def get_value(row):
   cost_type = row["Cost Type"]
   if cost_type == "Freight":
        return row["Backhaul"].abs()+ row["Vendor Delivered"]
   if cost_type== "Price":
        return row["Price"] - row["Backhaul"]
   .
   .
   return row["Value"]  # If nothing special to do, just return the value
df["Value"] = df.apply(get_value, axis=1)

数据帧选项

带有^{}

完整代码

相关问题更多 >

编程相关推荐

热门问题

热门文章