Extract the unique values from each row of a DataFrame and add them to a new column

match  0        1              2        3    4    5    6    7
1      Morocco  France         Morocco  NaN  NaN  NaN  NaN  NaN
2      Morocco  France         Morocco  NaN  NaN  NaN  NaN  NaN
3      Morocco  France         NaN      NaN  NaN  NaN  NaN  NaN
4      China    United States  NaN      NaN  NaN  NaN  NaN  NaN
5      China    NaN            NaN      NaN  NaN  NaN  NaN  NaN

3 Answers

User

#1 · Edited 2024-05-23 22:55:31

Here is an attempt that combines set and list inside a lambda:

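import pandas as pd
# Collect each row's values as a list, then keep only the non-null ones and de-duplicate with set
df_ex[8] = df_ex[[0, 1, 2, 3, 4, 5, 6, 7]].values.tolist()
df_ex[8] = df_ex[8].apply(lambda x: list(set(y for y in x if pd.notna(y))))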

Output:

0         [Morocco, France]
1         [Morocco, France]
2         [Morocco, France]
3    [United States, China]
4                   [China]
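
Note that set does not keep the original order, which is probably why row 3 comes out as [United States, China]. A minimal sketch of an order-preserving variant using dict.fromkeys follows; the sample frame and the helper name unique_in_order are made up for illustration and are not from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mirroring the question's data (columns 0-7 hold the values)
df_ex = pd.DataFrame(
    [
        ["Morocco", "France", "Morocco"],
        ["Morocco", "France", "Morocco"],
        ["Morocco", "France", np.nan],
        ["China", "United States", np.nan],
        ["China", np.nan, np.nan],
    ]
).reindex(columns=list(range(8)))  # pad with all-NaN columns 3..7


def unique_in_order(row):
    """Return the row's non-null values, de-duplicated in first-seen order."""
    return list(dict.fromkeys(v for v in row if pd.notna(v)))


# Build the new column from plain Python lists, one list per row
df_ex[8] = [unique_in_order(row) for row in df_ex[list(range(8))].values.tolist()]
print(df_ex[8])
```

dict.fromkeys keeps the first occurrence of each key, so the result follows the left-to-right order of the columns.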

User

#2 · Edited 2024-05-23 22:55:31

# Drop the missing values in each row: x.dropna()
# Convert the remaining entries to str: astype(str)
# Convert them to a set to delete duplicate entries (set order is not guaranteed)
# Join the unique entries with "," into a single string
# (Summing the strings first and calling set() on the result would split it into single characters.)

df_new = df.apply(lambda x: ",".join(set(x.dropna().astype(str))), axis=1)
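
Applying this over the whole frame also pulls the `match` column into each row's result. Here is a hedged sketch that drops it first, assuming `match` is an ordinary column as in the question; the sample frame is made up and `sorted` is added only to make the output reproducible:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: a `match` id column plus value columns 0..2
df = pd.DataFrame(
    {
        "match": [1, 2, 3],
        0: ["Morocco", "China", "China"],
        1: ["France", "United States", np.nan],
        2: ["Morocco", np.nan, np.nan],
    }
)

# Keep only the value columns so the match id is not mixed into the result
values = df.drop(columns="match")

# Per row: drop NaN, de-duplicate, and sort so the joined string is reproducible
df_new = values.apply(lambda x: ",".join(sorted(set(x.dropna()))), axis=1)
print(df_new)
```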

User

#3 · Edited 2024-05-23 22:55:31

Use:

cols = df.columns[df.columns.str.isnumeric()]
#or selecting columns
#cols = df.columns[1:]
#cols = df.columns.difference(['match'])
df[int(cols[-1])+1]=df[cols].agg(lambda x: ', '.join(set(x.dropna())),axis=1)
#for string type
#df[f'{int(cols[-1])+1}']=df[cols].stack().groupby(level=0).agg(', '.join)
df = df.reindex(columns = df.columns.difference(cols))
print(df)

                      8  match
0       France, Morocco      1
1       France, Morocco      2
2       France, Morocco      3
3  China, United_States      4
4                 China      5

We can also use:

# stack() gives one long Series indexed by (row, column); group it back by row (level 0)
df[int(cols[-1])+1] = (df[cols].stack()
                               .dropna()
                               .groupby(level=0)
                               .agg(lambda x: ', '.join(set(x))))
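
To try the stack/groupby idea end to end, here is a self-contained sketch under a few assumptions: the sample data is made up, the column labels are strings (which `df.columns.str.isnumeric()` needs), the result goes into a hypothetical column named `unique`, and `sorted` is added to get a stable order:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; column labels are strings so .str.isnumeric() works
df = pd.DataFrame(
    {
        "match": [1, 2, 3],
        "0": ["Morocco", "China", "China"],
        "1": ["France", "United_States", np.nan],
        "2": ["Morocco", np.nan, np.nan],
    }
)

cols = df.columns[df.columns.str.isnumeric()]  # selects '0', '1', '2'

# stack() reshapes to one long Series indexed by (row, column);
# dropna() is explicit so the result does not depend on how stack() treats NaN
stacked = df[cols].stack().dropna()

# Group by the row level and join each row's unique values (sorted for stable output)
df["unique"] = stacked.groupby(level=0).agg(lambda s: ", ".join(sorted(set(s))))
print(df[["match", "unique"]])
```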
