基于其他数据帧列值筛选数据帧

2024-06-02 06:10:26 发布

您现在位置:Python中文网/ 问答频道 /正文

df1:

Id   Country  Product
1    india    cotton
2    germany  shoes
3    algeria  bags

df2:

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255
5    germany  shoes    25    635
6    germany  shoes    65    458
7    germany  shoes    96    455
8    germany  shoes    69    255
9    algeria  bags     25    635
10   algeria  bags     89    788
11   algeria  bags     96    455
12   algeria  bags     78    165

我需要根据df1中的国家和产品列过滤df2,并创建新的数据框架。 例如,在df1中,有3个独特的国家、类别,所以df的数量将是3

输出:

df_India_Cotton :

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255

df_germany_Product:

id   Country  Product  Qty   Sales
1    germany  shoes    25    635
2    germany  shoes    65    458
3    germany  shoes    96    455
4    germany  shoes    69    255

df_algeria_Product:

id  Country  Product  Qty   Sales
1   algeria  bags     25    635
2   algeria  bags     89    788
3   algeria  bags     96    455
4   algeria  bags     78    165

我还可以用pandas中的基本子集过滤掉这些数据帧

df[(df.Country=='India') & (df.Products=='cotton')]

这将解决这个问题,在我的df1中可能有如此多的国家、产品的独特组合


Tags: iddf国家productcountrybagsqtydf1
2条回答

尝试创建两个groupby。使用第一个选项从第二个选项中选择:

import pandas as pd

selector_df = pd.DataFrame(data=
                           {
                               'Country':'india germany algeria'.split(),
                               'Product':'cotton shoes bags'.split()
                           })

details_df = pd.DataFrame(data=
                         {
                            'Country':'india india india india germany germany germany germany algeria algeria algeria algeria'.split(),
                            'Product':'cotton cotton cotton cotton shoes shoes shoes shoes bags bags bags bags'.split(),
                            'qty':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
                         })

selectorgroups = selector_df.groupby(by=['Country', 'Product'])
datagroups = details_df.groupby(by=['Country', 'Product'])
for tag, group in selectorgroups:
    print(tag)
    try:
        print(datagroups.get_group(tag))
    except KeyError:
        print('tag does not exist in datagroup')

您可以创建一个字典并在其中保存所有数据帧。 检查以下代码:

d={}
for i in range(len(df1)):
    name=df1.Country.iloc[i]+'_'+df1.Product.iloc[i]
    d[name]=df2[(df2.Country==df1.Country.iloc[i]) & (df2.Product==df1.Product.iloc[i])]

您可以通过其值调用每个数据帧,如下所示:

d['India_cotton']将提供:

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255

相关问题 更多 >