Using Python to turn one column spread over multiple rows into multiple columns

Original data:
  Student Id    Name  Marks
0       id_1    John    112
1       id_2    Rafs    181
2       id_2    Rafs    182
3       id_2    Rafs    183
4       id_3    Juan    222
5       id_3    Juan    312
6       id_3  Roller     21

Expected output:
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>

import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183],
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns=['Student Id', 'Name', 'Marks'])

3 answers

User
Answer 1 · edited 2024-06-16 09:42:50

Here is a very easy-to-understand answer that does not need an extra helper column:
# Group by Student Id and Name, collecting each student's marks into a list
new_df = df.groupby(['Student Id', 'Name'])['Marks'].apply(list).reset_index()

# The Marks column now holds one list per student:
#   Student Id    Name            Marks
# 0       id_1    John            [112]
# 1       id_2    Rafs  [181, 182, 183]
# 2       id_3    Juan       [222, 312]
# 3       id_3  Roller             [21]

# Expand each list into separate columns with pd.Series
temp_df = new_df['Marks'].apply(pd.Series)

# The rest is cosmetic: rename columns 0, 1, 2 to Marks1, Marks2, Marks3
temp_df.columns = list(map(lambda x: 'Marks' + str(x + 1), temp_df.columns))

# Attach the new columns to new_df
new_df[temp_df.columns] = temp_df

# Drop the original list column
final_df = new_df.drop('Marks', axis=1)
print(final_df)
Output:
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN
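Why the marks come out as floats: when the lists have different lengths, each row is padded with NaN up to the longest list, and a column containing NaN cannot stay a plain integer column. The minimal sketch below reproduces just the expansion step on the grouped lists; `lists`, `wide`, and `wide_alt` are illustrative names, not part of the answer. It also shows `pd.DataFrame(series.tolist())`, which is often suggested as a faster alternative to `apply(pd.Series)` because it does not build one Series object per row.

```python
import pandas as pd

# Per-student mark lists, as produced by groupby(...)['Marks'].apply(list)
lists = pd.Series([[112], [181, 182, 183], [222, 312], [21]])

# One row per list; rows shorter than the longest list are padded with NaN
wide = lists.apply(pd.Series)
print(wide)
print(wide.dtypes)  # any column containing NaN padding is float64

# Same table via the DataFrame constructor on a plain list of lists
wide_alt = pd.DataFrame(lists.tolist())
print(wide_alt)
```

Both give a 4 x 3 table with NaN in the same positions; the constructor route may keep the fully populated first column as integers, which is what the second answer below relies on.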

User
Answer 2 · edited 2024-06-16 09:42:50

Another way:
temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)
out = (pd.DataFrame(temp.tolist(), index=temp.index)
         .rename(columns=lambda x: f"Marks{x+1}")
         .reset_index())
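`temp` is a Series holding, for each (Student Id, Name) pair, the aggregated list of Marks. We then build a DataFrame from those lists, rename the columns to the desired format, and reset the index so that Student Id and Name become ordinary columns again.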
This gives:
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112     NaN     NaN
1       id_2    Rafs     181   182.0   183.0
2       id_3    Juan     222   312.0     NaN
3       id_3  Roller      21     NaN     NaN
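To see the intermediate `temp` object and why this output mixes `112` with `182.0`, here is a minimal sketch that rebuilds the question's sample data, prints `temp`, and inspects the dtypes of `out`. Columns that needed padding hold NaN and therefore become float64; Marks1 has a value for every student, so it does not need to be float (in the answer's output it prints as an integer).

```python
import pandas as pd

data = [["id_1", "John", 112], ["id_2", "Rafs", 181], ["id_2", "Rafs", 182],
        ["id_2", "Rafs", 183], ["id_3", "Juan", 222], ["id_3", "Juan", 312],
        ["id_3", "Roller", 21]]
df = pd.DataFrame(data, columns=["Student Id", "Name", "Marks"])

# temp is a Series (not a DataFrame): one list of marks per (Student Id, Name)
temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)
print(type(temp).__name__)
print(temp)

# The constructor pads shorter lists with NaN; padded columns become float64
out = (pd.DataFrame(temp.tolist(), index=temp.index)
         .rename(columns=lambda x: f"Marks{x+1}")
         .reset_index())
print(out.dtypes)
```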

User
Answer 3 · edited 2024-06-16 09:42:50

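Use `GroupBy.cumcount` to create a counter column `g` within each (Student Id, Name) group, then pass it as the new columns to `DataFrame.pivot_table`: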

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN
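To see what the counter column `g` contains before pivoting, here is a minimal sketch on the question's sample data: `cumcount()` numbers the rows inside each (Student Id, Name) group starting at 0, in order of appearance, and `.add(1)` shifts that to 1, 2, 3, which `add_prefix('Marks')` later turns into Marks1, Marks2, Marks3.

```python
import pandas as pd

data = [["id_1", "John", 112], ["id_2", "Rafs", 181], ["id_2", "Rafs", 182],
        ["id_2", "Rafs", 183], ["id_3", "Juan", 222], ["id_3", "Juan", 312],
        ["id_3", "Roller", 21]]
df = pd.DataFrame(data, columns=["Student Id", "Name", "Marks"])

# Position of each row within its (Student Id, Name) group, starting at 1
df["g"] = df.groupby(["Student Id", "Name"]).cumcount().add(1)
print(df)

# Every (Student Id, Name, g) combination is unique, so each pivot cell
# receives exactly one mark and aggfunc='max' simply returns that mark
print(df.duplicated(["Student Id", "Name", "g"]).any())
```

Because the counter makes every cell unique, the choice of `aggfunc` does not change the result here; it would only matter if the same (Student Id, Name, g) combination appeared twice.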

If you need integers together with missing values, cast to the nullable `Int64` dtype:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .astype('Int64')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>
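The `astype('Int64')` step works because pandas' nullable integer dtype (capital `I`) can represent missing values as `<NA>`, whereas plain NumPy `int64` has no missing-value marker. A minimal sketch of that cast on a single padded column, assuming the float values are whole numbers (as marks are here); `marks2` and `as_int` are illustrative names.

```python
import numpy as np
import pandas as pd

# A padded column as it comes out of pivot_table: float64 because of NaN
marks2 = pd.Series([np.nan, 182.0, 312.0, np.nan], name="Marks2")

# Nullable integer dtype: whole numbers become integers, NaN becomes <NA>
as_int = marks2.astype("Int64")
print(as_int)

# Plain int64 cannot hold missing values, so this cast raises ValueError
try:
    marks2.astype("int64")
except ValueError as exc:
    print("int64 cast failed:", exc)
```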
