Extracting values from a column

Posted 2024-06-16 11:07:01


I have a column that contains several pieces of data separated by hyphens. For example:

column A
TTT-Changing Car-BBBB-KKKK
TTT-KKKK - Changing device-KKKK
Releasing device-RRRR-KKKK-TTTT
RRRR-BBBB-Switching Car-TTTT
Login issue -RRRR-KKKK-TTTT
CCCC-Activation issue-RRRR-KKKK-TTTT

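I have a list of words that I want to look up in column A and write to column B. For example, if column A contains "Changing", "Change", or "A change", column B should be "Change"; if it contains "Activation" or "Registration", column B should be "Activation"; and so on.

I'm looking for something like the Excel formula IF(ISNUMBER(SEARCH(...))) that I can use in Python.

Thanks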


2 answers

If I understand correctly, the following should work.

The DataFrame:

df
                               column A
0            TTT-Changing Car-BBBB-KKKK
1       TTT-KKKK - Changing device-KKKK
2       Releasing device-RRRR-KKKK-TTTT
3          RRRR-BBBB-Switching Car-TTTT
4           Login issue -RRRR-KKKK-TTTT
5  CCCC-Activation issue-RRRR-KKKK-TTTT

Use str.extract to pull out the Activation and Changing strings:

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT       Activation

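Now replace the words in the new column (i.e. column B) as needed. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default:

df['column B'] = df['column B'].str.replace(r'(^.*Changing.*$)', 'Change', regex=True)
df['column B'] = df['column B'].str.replace(r'(^.*Activation.*$)', 'Activation', regex=True)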

df
                               column A    column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

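Another approach:

A better way is to collect all the replacements you want in a dictionary and then apply them to the DataFrame in one call, as follows:

import pandas as pd

df = pd.read_csv("data_file")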
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')

replacements = {
   'column B': {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'}
}

df = df.replace(replacements, regex=True)
print(df)

Result:

                               column A    column B
0            TTT-Changing Car-BBBB-KKKK      Change
1       TTT-KKKK - Changing device-KKKK      Change
2       Releasing device-RRRR-KKKK-TTTT         NaN
3          RRRR-BBBB-Switching Car-TTTT         NaN
4           Login issue -RRRR-KKKK-TTTT         NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation

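In the variant below the column name is not part of the replacements dictionary, so you need to select the column and assign the result back to df['column B'] yourself: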

df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
      r'(^.*Changing.*$)': 'Change',
      r'(^.*Activation.*$)': 'Activation'
}
print(replacements)
df['column B'] = df['column B'].replace(replacements, regex=True)
print(df)
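The difference between the two dictionary shapes can be checked with a small sketch. The two-row DataFrame and the variable names (patterns, nested, flat, whole) are made up for illustration; only the patterns come from the answer above.

```python
import pandas as pd

# Two sample rows taken from the question's data.
df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})
# expand=False returns a Series for a single capture group.
df["column B"] = df["column A"].str.extract(r"(Activation|Changing[^-]*)", expand=False)

patterns = {r"^.*Changing.*$": "Change", r"^.*Activation.*$": "Activation"}

# Nested form: the outer key limits the replacement to 'column B'.
nested = df.replace({"column B": patterns}, regex=True)

# Flat form on a single Series: only that column is touched.
flat = df["column B"].replace(patterns, regex=True)

# Flat form on the whole DataFrame: every column is rewritten,
# including 'column A', which is usually not what you want here.
whole = df.replace(patterns, regex=True)
```

In this sketch nested and flat give the same column B, while whole also overwrites column A, which is why the flat dictionary is applied to df['column B'] only.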

Note:

replace is relatively slow, whereas column-wise operations are fast enough.
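As one possible reading of "column-wise operations", here is a minimal sketch that mirrors the Excel IF(ISNUMBER(SEARCH(...))) idea with boolean masks from str.contains. This is a different technique from the replace calls above, and the rules dictionary and its patterns are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})

# Hypothetical lookup table: output label -> regex of keywords to search for.
rules = {
    "Change": r"chang(?:e|ing)",
    "Activation": r"activation|registration",
}

# Start with an empty column, then fill it rule by rule.
df["column B"] = None
for label, pattern in rules.items():
    # Case-insensitive search, like Excel's SEARCH.
    mask = df["column A"].str.contains(pattern, case=False, regex=True)
    # Only fill rows that no earlier rule has labelled yet.
    df.loc[mask & df["column B"].isna(), "column B"] = label
```

Rows that match no rule stay empty, and when several rules match the same row the first rule in the dictionary wins.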

You can use the extract function:

df['column B'] = df['column A'].str.extract('(Changing[^-]*)')

df
                               column A         column B
0            TTT-Changing Car-BBBB-KKKK     Changing Car
1       TTT-KKKK - Changing device-KKKK  Changing device
2       Releasing device-RRRR-KKKK-TTTT              NaN
3          RRRR-BBBB-Switching Car-TTTT              NaN
4           Login issue -RRRR-KKKK-TTTT              NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT              NaN

Edit

If you want to replace the content, consider using a dictionary:

dct = {'changing': 'Change',
       'change':'Change',
       'activation':'Activation',
       'registration':'Activation'}

pat = f"(?i).*\\b({'|'.join(dct.keys())})\\b.*"

# regex=True is required for a callable replacement in recent pandas versions
df['column A'].str.replace(pat, lambda x: dct.get(x.group(1).lower(), None), regex=True)
0                             Change
1                             Change
2    Releasing device-RRRR-KKKK-TTTT
3       RRRR-BBBB-Switching Car-TTTT
4        Login issue -RRRR-KKKK-TTTT
5                         Activation
Name: column A, dtype: object
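The output above leaves non-matching rows unchanged. If the labels should go into a separate column B with NaN for rows without a keyword, a minimal sketch is to extract the keyword and map it through the same dictionary. The names pat and keyword and the use of re.escape are choices made here, not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})

dct = {'changing': 'Change',
       'change': 'Change',
       'activation': 'Activation',
       'registration': 'Activation'}

# Whole-word alternation of the dictionary keys; re.escape guards against
# keys that contain regex metacharacters.
pat = r"\b(" + "|".join(map(re.escape, dct)) + r")\b"

# Extract the first keyword found (case-insensitive); NaN when none matches.
keyword = df["column A"].str.extract(pat, flags=re.IGNORECASE, expand=False)

# Lower-case the keyword so it matches the dictionary keys, then map to labels.
df["column B"] = keyword.str.lower().map(dct)
```

Column A is left untouched, and adding a new keyword only requires a new dictionary entry.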
