Unique strings from two columns

Posted 2024-04-25 05:13:50

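I want to find the unique combinations of the person1 and person2 columns, even when the same two values appear in reversed order in another row. Below is a sample of the initial DataFrame, in which I want to find the unique people: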

import numpy as np
import pandas as pd
df = pd.DataFrame({"person1":["AL","IN","AN","DL","IN","AL","AL","IN","AN"],
                   "person2":["AL","AN", np.nan,"AL","AN","AL","DL","IN","IN"]})

  person1  person2
0     AL      AL
1     IN      AN
2     AN      NaN
3     DL      AL
4     IN      AN
5     AL      AL
6     AL      DL
7     IN      IN
8     AN      IN

My expected output looks like this:

  person1  person2  person
0     AL      AL     AL
1     IN      AN    IN/AN
2     AN      NaN    AN
3     DL      AL    DL/AL
4     IN      AN    IN/AN
5     AL      AL     AL
6     AL      DL    DL/AL  # DL/AL, not AL/DL, because this pair first appeared as DL/AL
7     IN      IN     IN
8     AN      IN    IN/AN  # IN/AN, not AN/IN, because this pair first appeared as IN/AN

I used this code:

df['person'] = np.where(df.person1 != df.person2,
                        df.person1 + "/" + df.person2, df.person1)

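But in my example above, it returns AL/DL and AN/IN at index 6 and index 8. I have not found a suitable way to get each pair in one consistent order, that is, DL/AL and IN/AN.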

Pandas gurus, please show me the way :)


2 answers

You can use the apply() method:

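# Select only the two name columns so that an existing 'person' column is not joined in
df['person'] = df[['person1', 'person2']].apply(lambda r: r.drop_duplicates().sort_values().str.cat(sep='/'), axis=1)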

print(df)

Output:

  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN
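
To make the one-liner above easier to follow, here is a minimal sketch that applies the same three steps to single rows. The variable names (`row`, `label`, and so on) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# One row where the two names differ (illustrative values from the question).
row = pd.Series({"person1": "IN", "person2": "AN"})
deduped = row.drop_duplicates()    # drops the second value if both are equal
ordered = deduped.sort_values()    # alphabetical order, NaN placed last
label = ordered.str.cat(sep="/")   # joins the values; NaN is skipped by default
print(label)

# A row with a missing second name: the NaN is left out of the join.
row_nan = pd.Series({"person1": "AN", "person2": np.nan})
label_nan = row_nan.drop_duplicates().sort_values().str.cat(sep="/")
print(label_nan)

# A row where both names are equal collapses to a single name.
row_same = pd.Series({"person1": "AL", "person2": "AL"})
label_same = row_same.drop_duplicates().sort_values().str.cat(sep="/")
print(label_same)
```

Because of the `sort_values()` step, the result is always in alphabetical order (AN/IN), not in the order in which the pair first appeared.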

If possible, sort both columns:

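# Sort the two values in each row alphabetically; NaN becomes '' so it sorts first
df1 = pd.DataFrame(np.sort(df[['person1', 'person2']].fillna('')),
                   index=df.index,
                   columns=['person1', 'person2'])
# Join the sorted pair with '/', stripping the leading '/' left by an empty value
df['person'] = np.where(df1.person1 != df1.person2,
                        df1.person1.str.cat(df1.person2, sep="/").str.strip('/'),
                        df1.person1)
print(df)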
  person1 person2 person
0      AL      AL     AL
1      IN      AN  AN/IN
2      AN     NaN     AN
3      DL      AL  AL/DL
4      IN      AN  AN/IN
5      AL      AL     AL
6      AL      DL  AL/DL
7      IN      IN     IN
8      AN      IN  AN/IN
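
Both answers return each pair in alphabetical order (AL/DL, AN/IN), whereas the expected output in the question keeps the order in which a pair first appears (DL/AL, IN/AN). The following is a minimal sketch of one way to get that first-seen order, assuming a plain Python loop is acceptable for the data size. The helper name `first_seen_labels` is hypothetical and does not come from either answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "person1": ["AL", "IN", "AN", "DL", "IN", "AL", "AL", "IN", "AN"],
    "person2": ["AL", "AN", np.nan, "AL", "AN", "AL", "DL", "IN", "IN"],
})

def first_seen_labels(frame, cols=("person1", "person2")):
    """Label each row with its pair, written in the order the pair first appeared."""
    seen = {}    # order-insensitive key -> label as first encountered
    labels = []
    for a, b in frame[list(cols)].itertuples(index=False, name=None):
        # Keep the row's own order, drop a repeated name, and skip missing values.
        names = [x for x in dict.fromkeys((a, b)) if pd.notna(x)]
        key = frozenset(names)    # same key for (AL, DL) and (DL, AL)
        if key not in seen:
            seen[key] = "/".join(names)
        labels.append(seen[key])
    return pd.Series(labels, index=frame.index)

df["person"] = first_seen_labels(df)
print(df)
```

With the sample data, rows 6 and 8 reuse the labels created by rows 3 and 1, so they come out as DL/AL and IN/AN.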
