如何根据dataframe中的列值进行分类,然后对总购买量求和?

2024-05-23 21:59:49 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这样一个数据帧:

df = {'product_type_name': 
        ['Calendar', 'Lanyard', 'Name Card', 'Paper Lunch Box', 'Plastic Cup', 'Poster', 'Sticker', 'T-Shirt', 'Tote Bag'], 
       'order_count': 
        [4, 44, 14, 8, 6, 39, 28, 28, 17]}

df = pd.DataFrame(df)
print(df)

输出:

output dataframe

我想将每个product_type_name分成如下四类:

  • 包装(纸质午餐盒、塑料杯)
  • 营销材料(海报、贴纸)
  • 办公用品(名片、日历、挂绳)
  • 商品(手提包、T恤)

之后,我想根据以下规则总结每个类别的总订单:

  • 高(订单>;=10)
  • 中等(订单6-9)
  • 低(阶数<;=5)

预期输出如下所示:

^{tb1}$

我的解决方案是首先根据上面的顺序规则制作一个包含3类的列:高、中、低。然后,制作汇总表。问题是我不知道如何做汇总表。你知道怎么帮我解决这个问题吗

编辑

我制作了python实时代码:https://paiza.io/projects/uhBOkwo5OZkOx4eg6bdSCw


Tags: 数据name订单boxdf规则typeproduct
2条回答

步骤

  1. 创建mapping dict反向展开并将product_type_name映射到其类别

  2. 使用pd.cut创建high/medium/low标签

  3. 使用pivot_tableaggfunc=''.join重组df

d = {'Packaging ': ['Paper Lunch Box', 'Plastic Cup'],
     'Marketing Materials': ['Poster', 'Sticker'],
     'Office Supplies': ['Name Card', 'Calendar', 'Lanyard'],
     'Merchandise': ['Tote Bag', 'T-Shirt']}


df['category'] = df['product_type_name'].map(
    {i: k for k, v in d.items() for i in v})

df['rules'] = pd.cut(df.order_count, bins=[0, 5, 9, np.inf],
                     labels=['Low', 'Medium', 'High'])

df = df.pivot_table(index='category', columns='rules',
                    values='product_type_name', aggfunc=', '.join)

输出:

rules                     Low                        Medium  \
category                                                      
Marketing Materials       NaN                           NaN   
Merchandise               NaN                           NaN   
Office Supplies      Calendar                           NaN   
Packaging                 NaN  Paper Lunch Box, Plastic Cup   

rules                              High  
category                                 
Marketing Materials     Poster, Sticker  
Merchandise           T-Shirt, Tote Bag  
Office Supplies      Lanyard, Name Card  
Packaging                           NaN  

下面是执行此操作的代码:

#Creating the dataframe
df=pd.DataFrame()
df['product_type_name']=['Calendar','Lanyard','Name Card',
                         'Paper Lunch Box','Plastic Cup','Poster'
                         ,'Sticker','T-Shirt','Tote Bag']
df['order_count']=[4,44,14,8,6,39,28,28,17]

#add the categories for each set
df.loc[df.product_type_name.isin(['Paper Lunch Box','Plastic Cup'])
       ,'category']=['Packaging']
df.loc[df.product_type_name.isin(['Sticker','Poster'])
       ,'category']=['Marketing Materials']
df.loc[df.product_type_name.isin(['Name Card','Calendar','Lanyard'])
       ,'category']=['Office Supplies']
df.loc[df.product_type_name.isin(['Tote Bag','T-Shirt'])
       ,'category']=['Merchandise']
#add high, medium, low 
df.loc[df.order_count<=5,'order_volume']=['low']
df.loc[(df.order_count>5)&(df.order_count<10),'order_volume']=['medium']
df.loc[df.order_count>=10,'order_volume']=['high']
#use pivot table to split the order_volume column and join the names
pd.pivot_table(df, values=['product_type_name'],
               index=['category'],
               columns=['order_volume'],
               aggfunc=lambda x: ','.join(str(v) for v in x))

enter image description here

相关问题 更多 >