Conditionally merging consecutive rows of a DataFrame

Input:

NAME   TEXT
Tim    Tim Wagner is a teacher.
Tim    He is from Cleveland, Ohio.
Frank  Frank is a musician.
Tim    He likes to travel with his family
Frank  He is a performing artist who plays the cello.
Frank  He performed at the Carnegie Hall last year.
Frank  It was fantastic listening to him.

Desired output:

NAME   TEXT
Tim    Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank  Frank is a musician.
Tim    He likes to travel with his family
Frank  He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.

2 Answers

User

Answer 1 · edited 2024-05-29 05:23:36

I built a new DataFrame row by row:


import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
['Tim', 'He is from Cleveland, Ohio.'],
['Frank', 'Frank is a musician'],
['Tim', 'He likes to travel with his family'],
['Frank', 'He is a performing artist who plays the cello.'],
['Frank', 'He performed at the Carnegie Hall last year'],
['Frank', 'It was fantastic listening to him']], columns=['NAME', 'TEXT'])

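col = None   # NAME of the run currently being built
txt = ""     # TEXT accumulated for that run
arr = []     # finished [NAME, TEXT] rows
for _, row in df.iterrows():
    if col == row['NAME']:
        # Same name as the previous row: extend the current run.
        txt += ' ' + row['TEXT']
    else:
        # Name changed: store the finished run (if any) and start a new one.
        if col is not None:
            arr.append([col, txt])
        col = row['NAME']
        txt = row['TEXT']

# Always store the last run; the loop only stores a run when the next one begins.
if col is not None:
    arr.append([col, txt])

print(pd.DataFrame(arr, columns=['NAME', 'TEXT']))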
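
The same row-by-row idea can probably be written more compactly with `itertools.groupby` from the standard library, which only groups adjacent items with equal keys. This is a minimal sketch rather than part of the answer above: the helper name `merge_consecutive` and its parameters are hypothetical, and the sample frame is a shortened copy of the question's data.

```python
from itertools import groupby

import pandas as pd


def merge_consecutive(df, key="NAME", text="TEXT", sep=" "):
    """Join `text` across runs of adjacent rows that share the same `key`."""
    rows = []
    # itertools.groupby starts a new group whenever the key changes,
    # so rows with the same name that are not adjacent stay separate.
    pairs = zip(df[key], df[text])
    for name, run in groupby(pairs, key=lambda pair: pair[0]):
        rows.append([name, sep.join(t for _, t in run)])
    return pd.DataFrame(rows, columns=[key, text])


# Shortened sample data (hypothetical subset of the question's rows).
df = pd.DataFrame(
    [["Tim", "Tim Wagner is a teacher."],
     ["Tim", "He is from Cleveland, Ohio."],
     ["Frank", "Frank is a musician"],
     ["Tim", "He likes to travel with his family"]],
    columns=["NAME", "TEXT"],
)
merged = merge_consecutive(df)
print(merged)
```

Because the last run is emitted by `groupby` itself, no special handling is needed after the loop.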

User

Answer 2 · edited 2024-05-29 05:23:36

Try:

grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT']\
  .agg(' '.join).reset_index().drop('group', axis=1)

Output:

    NAME                                               TEXT
0    Tim  Tim Wagner is a teacher. He is from Cleveland,...
1  Frank                                Frank is a musician
2    Tim                 He likes to travel with his family
3  Frank  He is a performing artist who plays the cello....
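
To see why the `shift`/`cumsum` trick isolates consecutive runs, it may help to look at the intermediate values. The sketch below is illustrative only: the single-letter texts are made-up placeholders, and the variable names `starts` and `run_id` are assumptions, not names from the answer.

```python
import pandas as pd

# Placeholder data: the letters stand in for the real sentences.
df = pd.DataFrame({
    "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank"],
    "TEXT": ["a", "b", "c", "d", "e", "f"],
})

# True wherever NAME differs from the previous row. The first row is
# compared against NaN, so it is always True.
starts = df["NAME"] != df["NAME"].shift()

# Running count of the True flags: every run of equal names gets its own id.
run_id = starts.cumsum().rename("run")

# Group on the run id only, keeping the first NAME and joining the TEXT.
merged = (
    df.groupby(run_id, sort=False)
      .agg(NAME=("NAME", "first"), TEXT=("TEXT", " ".join))
      .reset_index(drop=True)
)
print(run_id.tolist())
print(merged)
```

Grouping on the run id alone is a design choice in this sketch; the answer above groups on both `NAME` and the run id, which gives the same rows because the name is constant within a run.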
