基于条件创建具有新列名的数据透视表

user_id AddressLine1 AddressLine2 City address_type 0 u68472 PO Box 354 None LOREDO Mailing 1 u68472 154 Cedar Park Apt. 39 LOREDO Residential

2条回答

网友

1楼 · 编辑于 2024-05-16 00:21:27

您可以基于地址类型创建两个数据帧，然后在用户id上连接它们，并为重叠列设置rsuffix

mailing = df.loc[df.address_type == 'Mailing', :].set_index('user_id')

residential = df.loc[df.address_type == 'Residential', :].set_index('user_id')

new_df = mailing.join(residential, rsuffix='_RES')

输出：

        AddressLine1 AddressLine2    City address_type AddressLine1_RES AddressLine2_RES City_RES address_type_RES
user_id                                                                                                           
u68472    PO Box 354         None  LOREDO      Mailing   154 Cedar Park          Apt. 39   LOREDO      Residential

网友

2楼 · 编辑于 2024-05-16 00:21:27

或者，您可以使用多级索引：

Python 3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.2   An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['u68472','PO Box 354',None,'LOREDO','Mailing'],^M
   ...:                   ['u68472','154 Cedar Park','Apt. 39','LOREDO','Residential']], ^M
   ...:                  columns = ['user_id','AddressLine1','AddressLine2','City','address_type'])

In [3]: df
Out[3]:
  user_id    AddressLine1 AddressLine2    City address_type
0  u68472      PO Box 354         None  LOREDO      Mailing
1  u68472  154 Cedar Park      Apt. 39  LOREDO  Residential

In [4]: new_df = df.set_index(['user_id', 'address_type']).unstack('address_type')

In [5]: new_df
Out[5]:
             AddressLine1                 AddressLine2                City
address_type      Mailing     Residential      Mailing Residential Mailing Residential
user_id
u68472         PO Box 354  154 Cedar Park         None     Apt. 39  LOREDO      LOREDO

In [6]: new_df.columns = new_df.columns.get_level_values(0).str.cat(new_df.columns.get_level_values('address_type').values, sep='_')

In [7]: new_df
Out[7]:
        AddressLine1_Mailing AddressLine1_Residential AddressLine2_Mailing AddressLine2_Residential City_Mailing City_Residential
user_id
u68472            PO Box 354           154 Cedar Park                 None                  Apt. 39       LOREDO           LOREDO

相关问题更多 >

编程相关推荐

热门问题

热门文章

基于条件创建具有新列名的数据透视表

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >