Convert one column into multiple columns using Python

Posted 2024-06-16 09:42:50

I have a dataset "df" with 3 columns.

>> Original Data

    Student Id    Name  Marks
0       id_1    John    112
1       id_2    Rafs    181
2       id_2    Rafs    182
3       id_2    Rafs    183
4       id_3    Juan    222
5       id_3    Juan    312
6       id_3  Roller     21

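I want to keep the "Student Id" and "Name" columns as they are but spread "Marks" across multiple columns, so that each unique ("Student Id", "Name") pair has a single row containing all of its marks. The columns should not be created by hand; they should be generated dynamically from the data.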

>> Expected Output

    Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>

Sample data to reproduce the input:

import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183], 
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns = ['Student Id', 'Name', 'Marks'])

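I tried the following but did not get the desired output. The result shows the index as tuples in parentheses (), and the marks are missing.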

df3 = df.pivot_table(index=['Student Id','Name'], columns='Marks', aggfunc = 'max')

>>Output
Empty DataFrame
Columns: []
Index: [(id_1, John), (id_2, Rafs), (id_3, Juan), (id_3, Roller)]

3 answers

Here is an easy-to-follow answer that does not require creating any extra columns:

# Group by Student Id and Name, collecting each group's marks into a list
new_df = df.groupby(['Student Id','Name'])['Marks'].apply(list).reset_index()

#Now, in the marks columns, we have a list (as seen below).
#So, we convert the list into different columns, using pd.Series

#   Student Id    Name            Marks
# 0       id_1    John            [112]
# 1       id_2    Rafs  [181, 182, 183]
# 2       id_3    Juan       [222, 312]
# 3       id_3  Roller             [21]
temp_df = new_df['Marks'].apply(pd.Series)

# The remaining steps are cosmetic.

#Converting the column names from 0,1,2 to Marks1, Marks2, Marks3
temp_df.columns = list(map(lambda x: 'Marks'+str(x+1), temp_df.columns))

# Assigning this new temporary df to the original df
new_df[temp_df.columns] = temp_df

#Dropping the Marks column
final_df = new_df.drop('Marks',axis=1)
print(final_df)

Output:

  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN

Another way:

temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)

out = (pd.DataFrame(temp.tolist(), index=temp.index)
           .rename(columns=lambda x: f"Marks{x+1}")
           .reset_index())

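temp is a Series holding, for each (Student Id, Name) pair, the list of its Marks. We then build a DataFrame from those lists, rename the columns to the desired format, and reset the index so that Student Id and Name move back into columns.

This gives: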

  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112     NaN     NaN
1       id_2    Rafs     181   182.0   183.0
2       id_3    Juan     222   312.0     NaN
3       id_3  Roller      21     NaN     NaN
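The list-based approach can be wrapped in a small reusable function. The sketch below is one possible way to do that; the function name spread_to_columns and its parameters are hypothetical, and the final cast assumes every value is a whole number so that the nullable Int64 dtype applies.

```python
import pandas as pd


def spread_to_columns(df, keys, value, prefix):
    """Hypothetical helper: one row per key combination, one column per value."""
    # Collect each group's values into a list (a Series indexed by the keys).
    lists = df.groupby(keys)[value].agg(list)
    # Each list becomes a row; shorter lists are padded with missing values.
    wide = pd.DataFrame(lists.tolist(), index=lists.index)
    # Rename the positional columns 0, 1, 2, ... to prefix1, prefix2, ...
    wide.columns = [f"{prefix}{i + 1}" for i in wide.columns]
    # Assumption: values are whole numbers, so Int64 can hold them plus <NA>.
    return wide.astype("Int64").reset_index()


data = [
    ["id_1", "John", 112],
    ["id_2", "Rafs", 181],
    ["id_2", "Rafs", 182],
    ["id_2", "Rafs", 183],
    ["id_3", "Juan", 222],
    ["id_3", "Juan", 312],
    ["id_3", "Roller", 21],
]
df = pd.DataFrame(data, columns=["Student Id", "Name", "Marks"])

out = spread_to_columns(df, ["Student Id", "Name"], "Marks", "Marks")
print(out)
```

The number of Marks columns follows the longest list, so nothing has to be declared in advance.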

Use GroupBy.cumcount to build a counter column; its values become the new column labels in df3:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN
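To see why the counter helps, it may be useful to look at it on its own. In this small sketch (the variable name counter is only illustrative), cumcount numbers the rows inside each (Student Id, Name) group starting from 0, and add(1) shifts that to start from 1, so each mark receives the column position it should land in.

```python
import pandas as pd

data = [
    ["id_1", "John", 112],
    ["id_2", "Rafs", 181],
    ["id_2", "Rafs", 182],
    ["id_2", "Rafs", 183],
    ["id_3", "Juan", 222],
    ["id_3", "Juan", 312],
    ["id_3", "Roller", 21],
]
df = pd.DataFrame(data, columns=["Student Id", "Name", "Marks"])

# Running position of each row within its (Student Id, Name) group, 1-based.
counter = df.groupby(["Student Id", "Name"]).cumcount().add(1)
print(counter.tolist())  # [1, 1, 2, 3, 1, 2, 1]
```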

If you need integers alongside missing values, cast to the nullable Int64 dtype:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .astype('Int64')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>
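The snippets above add the helper column g to df itself. If the original frame should stay unchanged, a variant along the following lines might work: it builds the counter with assign on a temporary copy and reshapes with set_index plus unstack instead of pivot_table. This is a sketch rather than the answer's own method; the names keys and wide are illustrative, and the Int64 cast again assumes whole-number marks.

```python
import pandas as pd

data = [
    ["id_1", "John", 112],
    ["id_2", "Rafs", 181],
    ["id_2", "Rafs", 182],
    ["id_2", "Rafs", 183],
    ["id_3", "Juan", 222],
    ["id_3", "Juan", 312],
    ["id_3", "Roller", 21],
]
df = pd.DataFrame(data, columns=["Student Id", "Name", "Marks"])

keys = ["Student Id", "Name"]

wide = (
    # assign returns a new frame, so df does not gain a "g" column.
    df.assign(g=df.groupby(keys).cumcount().add(1))
      # Each (Student Id, Name, g) combination is unique, so no aggregation is needed.
      .set_index(keys + ["g"])["Marks"]
      # Move the counter level into the columns: 1, 2, 3, ...
      .unstack("g")
      .add_prefix("Marks")
      # Assumption: marks are whole numbers, so nullable Int64 keeps them as integers.
      .astype("Int64")
      .rename_axis(None, axis=1)
      .reset_index()
)
print(wide)
```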
