How do I categorize rows by a column's values in a dataframe and then sum the total order count?

import pandas as pd

df = {'product_type_name': ['Calendar', 'Lanyard', 'Name Card', 'Paper Lunch Box',
                            'Plastic Cup', 'Poster', 'Sticker', 'T-Shirt', 'Tote Bag'],
      'order_count': [4, 44, 14, 8, 6, 39, 28, 28, 17]}
df = pd.DataFrame(df)
print(df)

2 Answers

User

#1 · Edited 2024-05-23 21:59:49

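Steps

Build a mapping dict (category to products), invert it, and map each product_type_name to its category
Use pd.cut to create the Low/Medium/High labels
Use pivot_table with aggfunc=', '.join to reshape df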

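import numpy as np
import pandas as pd

# category -> list of products
d = {'Packaging': ['Paper Lunch Box', 'Plastic Cup'],
     'Marketing Materials': ['Poster', 'Sticker'],
     'Office Supplies': ['Name Card', 'Calendar', 'Lanyard'],
     'Merchandise': ['Tote Bag', 'T-Shirt']}

# invert to product -> category and map each row to its category
df['category'] = df['product_type_name'].map(
    {i: k for k, v in d.items() for i in v})

# bins are right-closed: (0, 5] Low, (5, 9] Medium, (9, inf] High
df['rules'] = pd.cut(df.order_count, bins=[0, 5, 9, np.inf],
                     labels=['Low', 'Medium', 'High'])

# one row per category, one column per label, product names joined
df = df.pivot_table(index='category', columns='rules',
                    values='product_type_name', aggfunc=', '.join)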

Output:

rules                     Low                        Medium  \
category                                                      
Marketing Materials       NaN                           NaN   
Merchandise               NaN                           NaN   
Office Supplies      Calendar                           NaN   
Packaging                 NaN  Paper Lunch Box, Plastic Cup   

rules                              High  
category                                 
Marketing Materials     Poster, Sticker  
Merchandise           T-Shirt, Tote Bag  
Office Supplies      Lanyard, Name Card  
Packaging                           NaN
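The question title also asks for the total order count, which the pivot above does not compute. A minimal sketch of one way to get it, assuming the same sample data and category mapping as in this answer (the names product_to_category and totals are illustrative, not from the original):

```python
import pandas as pd

df = pd.DataFrame({
    'product_type_name': ['Calendar', 'Lanyard', 'Name Card', 'Paper Lunch Box',
                          'Plastic Cup', 'Poster', 'Sticker', 'T-Shirt', 'Tote Bag'],
    'order_count': [4, 44, 14, 8, 6, 39, 28, 28, 17],
})

# category -> products, as in the answer above
d = {'Packaging': ['Paper Lunch Box', 'Plastic Cup'],
     'Marketing Materials': ['Poster', 'Sticker'],
     'Office Supplies': ['Name Card', 'Calendar', 'Lanyard'],
     'Merchandise': ['Tote Bag', 'T-Shirt']}

# invert to product -> category (hypothetical helper name)
product_to_category = {product: category
                       for category, products in d.items()
                       for product in products}
df['category'] = df['product_type_name'].map(product_to_category)

# sum order_count within each category
totals = df.groupby('category')['order_count'].sum()
print(totals)
```

With this data the totals come out as Marketing Materials 67, Merchandise 45, Office Supplies 62 and Packaging 14. A product missing from d would map to NaN and be dropped by groupby, so it may be worth checking df['category'].isna() first.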

User

#2 · Edited 2024-05-23 21:59:49

Here is code that does this:

import pandas as pd

# create the dataframe
df = pd.DataFrame()
df['product_type_name'] = ['Calendar', 'Lanyard', 'Name Card',
                           'Paper Lunch Box', 'Plastic Cup', 'Poster',
                           'Sticker', 'T-Shirt', 'Tote Bag']
df['order_count'] = [4, 44, 14, 8, 6, 39, 28, 28, 17]

# add the category for each set of products
df.loc[df.product_type_name.isin(['Paper Lunch Box', 'Plastic Cup']),
       'category'] = 'Packaging'
df.loc[df.product_type_name.isin(['Sticker', 'Poster']),
       'category'] = 'Marketing Materials'
df.loc[df.product_type_name.isin(['Name Card', 'Calendar', 'Lanyard']),
       'category'] = 'Office Supplies'
df.loc[df.product_type_name.isin(['Tote Bag', 'T-Shirt']),
       'category'] = 'Merchandise'

# add high, medium, low
df.loc[df.order_count <= 5, 'order_volume'] = 'low'
df.loc[(df.order_count > 5) & (df.order_count < 10), 'order_volume'] = 'medium'
df.loc[df.order_count >= 10, 'order_volume'] = 'high'

# use a pivot table to split the order_volume column and join the names
pd.pivot_table(df, values=['product_type_name'],
               index=['category'],
               columns=['order_volume'],
               aggfunc=lambda x: ','.join(str(v) for v in x))
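For integer counts, the three threshold assignments above should give the same labels as pd.cut with right-closed bins. A small sketch to check this on a few boundary values (the sample values and the names counts, binned and manual are made up for illustration):

```python
import numpy as np
import pandas as pd

# boundary values around the thresholds 5 and 10
counts = pd.Series([4, 5, 6, 9, 10, 44])

# right-closed bins: (0, 5] low, (5, 9] medium, (9, inf] high
binned = pd.cut(counts, bins=[0, 5, 9, np.inf],
                labels=['low', 'medium', 'high'])

# the same rules written as explicit comparisons
manual = np.select([counts <= 5, counts < 10], ['low', 'medium'],
                   default='high')

print(binned.tolist())
print(manual.tolist() == binned.tolist())
```

The two approaches differ at the edges: pd.cut with these bins leaves a count of 0 unlabeled (NaN), while order_count<=5 labels it low, and a non-integer value such as 9.5 would be medium under the comparisons but high under these bins.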
