Map a column when the string contains a value from another column

Posted 2024-05-16 00:24:19

I have two DataFrames. The first is:

df1

col_one col_two
ABBC1  (1, 2, 3)
DFFG2  (3, 5, 1)
JJKS3  (5, 2, 5)

df2

    col_1
operate ABBC1 1 to 2, JJKS3 3 to 5
operate JJKS3, FOM

Expected output for df2:

  col_1                col_2
operate ABBC1, to 2  (1, 2, 3)
operate JJKS3, FOM   (5, 2, 5)

I have tried several approaches, and the closest I got is:

for values, map_col in df1[['col_one', 'col_two']].values:
    for val in df2['col_1']:
        if ("%s" %values) in df2['col_1'] :
            df2['col_2'] = "%s" %(map_col,)

I thought this would work, but every row ends up with exactly the same value.

Any help is welcome. Thanks.


1 answer
User
Answer #1 · posted 2024-05-16 00:24:19
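
As background on why the loop in the question cannot give per-row results, here is a minimal sketch of two pandas behaviours it relies on. The sample strings are copied from the question; the variable names `in_index` and `in_values` are made up for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({'col_1': ['operate ABBC1 1 to 2, JJKS3 3 to 5',
                              'operate JJKS3, FOM']})

# `in` on a Series tests the index labels (0 and 1 here), not the strings it holds.
in_index = 'JJKS3' in df2['col_1']

# A substring test against the values goes through .str.contains, one result per row.
in_values = df2['col_1'].str.contains('JJKS3', regex=False)

# Assigning a scalar to a column writes that scalar into every row.
df2['col_2'] = '(5, 2, 5)'
print(df2)
```

So even with a correct condition, `df2['col_2'] = ...` inside the loop overwrites the whole column on each pass instead of setting a single row.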

You can create a new column with str.extract and merge on it:

pat = '|'.join(r"\b{}\b".format(x) for x in df1['col_one'].unique())
# expand=False returns a Series holding the first matching key per row (NaN if none)
df2['col_one'] = df2['col_1'].str.extract('(' + pat + ')', expand=False)
print (df2)
                 col_1 col_one
0  operate ABBC1, to 2   ABBC1
1   operate JJKS3, FOM   JJKS3

df = df1.merge(df2, on='col_one')
print (df)
  col_one    col_two                col_1
0   ABBC1  (1, 2, 3)  operate ABBC1, to 2
1   JJKS3  (5, 2, 5)   operate JJKS3, FOM
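
If the goal is the layout from the question (df2 keeping its rows, plus a col_2 column), a variant of the same idea is to look the extracted key up with Series.map instead of merging. This is only a sketch under assumptions: the sample frames are rebuilt from the question, `re.escape` is added in case the keys contain regex metacharacters, and `key` and `lookup` are illustrative names.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'col_one': ['ABBC1', 'DFFG2', 'JJKS3'],
    'col_two': [(1, 2, 3), (3, 5, 1), (5, 2, 5)],
})
df2 = pd.DataFrame({'col_1': ['operate ABBC1, to 2', 'operate JJKS3, FOM']})

# One alternation of whole-word keys; re.escape protects special characters.
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in df1['col_one'].unique())

# First key found in each row, as a Series (NaN where nothing matches).
key = df2['col_1'].str.extract('(' + pat + ')', expand=False)

# Key -> col_two lookup; duplicates are dropped so the index is unique.
lookup = df1.drop_duplicates('col_one').set_index('col_one')['col_two']

# map keeps df2's row order and length; unmatched rows get NaN.
df2['col_2'] = key.map(lookup)
print(df2)
```

Unlike the inner merge, this keeps rows of df2 that contain no key at all, with NaN in col_2.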

Edit:

If a row can match more than one value, use str.findall and build a new DataFrame:

pat = '|'.join(r"\b{}\b".format(x) for x in df1['col_one'].unique())
s = df2['col_1'].str.findall('(' + pat + ')')
print (s)
0    [ABBC1, JJKS3]
1           [JJKS3]
Name: col_1, dtype: object

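# requires: import numpy as np; import pandas as pd
lens = s.str.len()                      # number of keys found in each row
a = np.repeat(df2['col_1'], lens)       # repeat each row once per key found
b = np.concatenate(s)                   # flatten the lists of keys
df2 = pd.DataFrame({'col_1':a, 'col_one':b})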
print (df2)
                               col_1 col_one
0  operate ABBC1, to 2  JJKS3 3 to 5   ABBC1
0  operate ABBC1, to 2  JJKS3 3 to 5   JJKS3
1                 operate JJKS3, FOM   JJKS3

df = df1.merge(df2, on='col_one')
print (df)
  col_one    col_two                              col_1
0   ABBC1  (1, 2, 3)  operate ABBC1, to 2  JJKS3 3 to 5
1   JJKS3  (5, 2, 5)  operate ABBC1, to 2  JJKS3 3 to 5
2   JJKS3  (5, 2, 5)                 operate JJKS3, FOM
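
On pandas 0.25 or later, the repeat/concatenate step can also be written with DataFrame.explode. The following is a sketch rather than the answer's own method: the sample data is rebuilt from the question, and `matches`, `exploded`, and `out` are illustrative names.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'col_one': ['ABBC1', 'DFFG2', 'JJKS3'],
    'col_two': [(1, 2, 3), (3, 5, 1), (5, 2, 5)],
})
df2 = pd.DataFrame({'col_1': ['operate ABBC1 1 to 2, JJKS3 3 to 5',
                              'operate JJKS3, FOM']})

pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in df1['col_one'].unique())

# findall returns, for each row, a list of every key it contains.
matches = df2['col_1'].str.findall(pat)

# explode gives each list element its own row, repeating col_1 alongside it.
exploded = df2.assign(col_one=matches).explode('col_one')

# A left merge keeps the order of the exploded rows and attaches col_two.
out = exploded.merge(df1, on='col_one', how='left')
print(out)
```

Rows whose list is empty come out of explode with NaN in col_one, so with how='left' they are kept with a missing col_two rather than dropped.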