Conditionally merging consecutive rows of a DataFrame

Posted 2024-05-29 05:23:36


I have an input DataFrame that contains the following:

NAME    TEXT
Tim     Tim Wagner is a teacher.
Tim     He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello.
Frank   He performed at the Carnegie Hall last year.
Frank   It was fantastic listening to him.

If consecutive rows have the same value in the NAME column, I want to concatenate their TEXT values.

Desired output DataFrame:

NAME    TEXT
Tim     Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.

Is using pandas shift the best way to do this? Thanks for your help.


2 answers

I built a new DataFrame by walking through the rows one at a time:


import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
                   ['Tim', 'He is from Cleveland, Ohio.'],
                   ['Frank', 'Frank is a musician'],
                   ['Tim', 'He likes to travel with his family'],
                   ['Frank', 'He is a performing artist who plays the cello.'],
                   ['Frank', 'He performed at the Carnegie Hall last year'],
                   ['Frank', 'It was fantastic listening to him']],
                  columns=['NAME', 'TEXT'])

rows = []    # merged [NAME, TEXT] pairs
name = None  # NAME of the run currently being built
text = ""    # TEXT accumulated for that run
for _, row in df.iterrows():
    if row['NAME'] == name:
        # Same NAME as the previous row: extend the current run
        text += ' ' + row['TEXT']
    else:
        # NAME changed: store the finished run (if any) and start a new one
        if name is not None:
            rows.append([name, text])
        name = row['NAME']
        text = row['TEXT']

# The loop never stores the final run, so add it here
if name is not None:
    rows.append([name, text])

print(pd.DataFrame(rows, columns=['NAME', 'TEXT']))
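
As a variation on the same row-by-row idea, the standard library's itertools.groupby already groups consecutive equal keys, so the bookkeeping can be left to it. This is only a sketch: the function name merge_runs and its parameters are made up for illustration and are not part of the answer above.

```python
from itertools import groupby

import pandas as pd


def merge_runs(df, key="NAME", text="TEXT", sep=" "):
    """Join `text` over runs of consecutive rows that share the same `key`.

    merge_runs is a hypothetical helper name used only for this sketch.
    """
    pairs = zip(df[key], df[text])
    # itertools.groupby starts a new group every time the key changes,
    # so non-adjacent rows with the same NAME stay in separate groups.
    rows = [
        (name, sep.join(t for _, t in run))
        for name, run in groupby(pairs, key=lambda pair: pair[0])
    ]
    return pd.DataFrame(rows, columns=[key, text])


df = pd.DataFrame(
    [["Tim", "Tim Wagner is a teacher."],
     ["Tim", "He is from Cleveland, Ohio."],
     ["Frank", "Frank is a musician"],
     ["Tim", "He likes to travel with his family"]],
    columns=["NAME", "TEXT"],
)
print(merge_runs(df))
```

Unlike the explicit loop, this needs no special handling for the last run, because groupby yields it like any other group.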

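Try:

# Label each run of consecutive equal NAME values with its own group number
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
# Join TEXT within each run, then drop the helper column
df.groupby(['NAME', grp], sort=False)['TEXT']\
  .agg(' '.join).reset_index().drop('group', axis=1)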

Output:

    NAME                                               TEXT
0    Tim  Tim Wagner is a teacher. He is from Cleveland,...
1  Frank                                Frank is a musician
2    Tim                  He likes to travel with his family
3  Frank  He is a performing artist who plays the cello....
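
To see why the shift and cumsum combination works, it can help to look at the intermediate run labels. The sketch below wraps the same idea in a function; the name merge_consecutive, its parameters, and the helper label "run" are assumptions made for illustration, and the sample frame is the one from the question.

```python
import pandas as pd


def merge_consecutive(df, key="NAME", text="TEXT", sep=" "):
    """Join `text` over runs of consecutive rows that share the same `key`.

    merge_consecutive is a hypothetical helper name used only for this sketch.
    """
    # True wherever the key differs from the row above.
    # The first row is compared with NaN, so it always starts a run.
    starts_run = df[key] != df[key].shift()
    # Running count of run starts: all rows of one run share the same id.
    run_id = starts_run.cumsum().rename("run")
    merged = (
        df.groupby(run_id, sort=False)
          .agg({key: "first", text: sep.join})
          .reset_index(drop=True)
    )
    return merged, run_id


df = pd.DataFrame(
    [["Tim", "Tim Wagner is a teacher."],
     ["Tim", "He is from Cleveland, Ohio."],
     ["Frank", "Frank is a musician."],
     ["Tim", "He likes to travel with his family"],
     ["Frank", "He is a performing artist who plays the cello."],
     ["Frank", "He performed at the Carnegie Hall last year."],
     ["Frank", "It was fantastic listening to him."]],
    columns=["NAME", "TEXT"],
)

merged, run_id = merge_consecutive(df)
print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 4]
print(merged)
```

Grouping on the run id alone is enough here, since every row in a run has the same NAME by construction; taking the first NAME of each run just carries the label through to the result.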
