How to merge rows in a DataFrame into a single row using SQL or Python

Posted 2024-05-23 15:22:22


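I want to aggregate the rows of certain columns based on their relationship to other columns, and create new columns that hold the aggregated data in JSON format.

Here is an example.

Original table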

Child Name     Child Age    Father Name    Father Age
     Peter             5        Richard            40
     James            15           Doug            45
       Liz             2           Doug            45
      Paul             6        Richard            40
    Shirly            11        Charles            33
       Eva             9          Chris            29

The transformed table would be

^{pr2}$

or

Father Name    Father Age     Children Name       Children Age
    Richard            40     {"Peter", "Paul"}      {"5","6"}
       Doug            45     {"James", "Liz"}      {"15","2"}
    Charles            33     {"Shirly"}                {"11"}
      Chris            29     {"Eva"}                    {"9"}

My code is

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

print(df)

# Join each father's children names into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)

The output is

  Father Name   Children Name
0     Charles          Shirly
1       Chris             Eva
2        Doug      James, Liz
3     Richard     Peter, Paul

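I can't figure out how to add the "Father Age" and "Children Age" columns. What is the most efficient way to do this transformation on the DataFrame? I would like to avoid Python loops because they take a long time to run.

Thanks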


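1 Answer
User
#1 · Posted 2024-05-23 15:22:22

A quick, dirty, and inefficient hack, but it avoids the for loop. Hopefully there is a better solution; I assume the multiple df copies and multiple merges could be simplified.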

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

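# Collect each father's children names and ages into lists
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()

# Merge the three pieces on their shared "Father Name" column
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)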

Output:

^{pr2}$
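
The separate groupby calls and merges above could probably be collapsed into a single groupby over both father columns. The following is only a sketch, not the answerer's method: it assumes each father name always pairs with a single age, relies on pandas named aggregation (available from pandas 0.25), and the `result` variable and the "... JSON" column names are made up for illustration. The JSON step uses `json.dumps`, which produces arrays in square brackets rather than the curly braces shown in the question.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Group on both father columns so "Father Age" is carried along as a key,
# then collect each child column into a list in one pass.
# sort=False keeps the fathers in order of first appearance.
# Assumption: a given father name always has the same age.
result = (
    df.groupby(["Father Name", "Father Age"], sort=False)
      .agg(**{
          "Children Name": ("Child Name", list),
          "Children Age": ("Child Age", list),
      })
      .reset_index()
)

# Optional: serialise the lists to JSON array strings (hypothetical column names).
result["Children Name JSON"] = result["Children Name"].apply(json.dumps)
result["Children Age JSON"] = result["Children Age"].apply(json.dumps)

print(result)
```

If the name/age assumption does not hold, grouping on both columns would split such a father into several rows, which may or may not be what is wanted.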
