Find the number after the string "QUOTE" in a DataFrame column

Posted 2024-04-19 15:39:42


I have customer service call records in an Excel sheet. My data looks like this:

So#   Comments
1   sjhsh QUOTE 234566
1   sdsds customer call QUote 239876 Call back
2   adsdfh unknown call from customer QUOTE 189067 sdkjsd woieweio 
3   QUOTE 657894 customer called for service

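I am reading this data from Excel and need to get the 6-digit number that follows the text "QUOTE" in each row, then add the extracted numbers as a new column.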

1. The rows might have multiple "QUOTE" mentions.
2. The rows might not have "QUOTE" at all.

Can someone help me do this substring search in Python?

import pandas as pd
import re
file=pd.read_excel("C:/Users/rkatta/Desktop/Book1.xlsx")
file.set_index('Index', inplace=True, drop=True)
comments=file['InternalComments']
quotenum=[]

keyword= 'QUOTE'
for i in comments:
    try:
        befor_keyowrd, keyword, after_keyword = comments[i].partition(keyword)
        num=after_keyword[:6]
        quotenum.append(num)
    except AttributeError:
        befor_keyowrd, keyword, after_keyword =''
        quotenum.append(after_keyword)

2 answers

Replace the column-processing part of your code with the following line:

file['InternalComments'] = file['Comments'].str.findall(r'(?i)quote\s*(\d+)').apply(','.join)

See the regex demo.

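The regex matches:

  • (?i) - case-insensitive mode
  • quote - the literal substring quote
  • \s* - zero or more whitespace characters
  • (\d+) - capture group 1 (this is what findall returns): one or more digits.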
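
To see the pieces of the pattern at work outside pandas, here is a minimal sketch with the standard re module. The sample strings are adapted from the question, and the names QUOTE_RE, samples and results are only illustrative.

```python
import re

# Same pattern as above: case-insensitive "quote", optional whitespace, then digits.
QUOTE_RE = re.compile(r'(?i)quote\s*(\d+)')

samples = [
    'sjhsh QUOTE 234566',
    'sdsds customer call QUote 239876 Call back',
    'QUOTE 657894 called again, QUOTE 189067',
    'No qte',
]

# With one capture group, findall returns only the group contents for every match.
results = [QUOTE_RE.findall(text) for text in samples]
for text, numbers in zip(samples, results):
    print(repr(text), '->', numbers)
```

A row with several "QUOTE" mentions yields several numbers, and a row without the keyword yields an empty list, which covers both cases raised in the question.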

See the Python demo:

import pandas as pd
l = ['sjhsh QUOTE 234566', 'sdsds customer call QUote 239876 Call back', 'adsdfh unknown call from customer QUOTE 189067 sdkjsd woieweio', 'QUOTE 657894 customer called for service', 'QUOTE 657894 customer called for service QUOTE 657894 customer called for service', 'No qte']
file = pd.DataFrame(l, columns=['Comments'])
file['InternalComments'] = file['Comments'].str.findall(r'(?i)quote\s*(\d+)').apply(','.join)
file
                                            Comments InternalComments
0                                 sjhsh QUOTE 234566           234566
1         sdsds customer call QUote 239876 Call back           239876
2  adsdfh unknown call from customer QUOTE 189067...           189067
3           QUOTE 657894 customer called for service           657894
4  QUOTE 657894 customer called for service QUOTE...    657894,657894
5                                             No qte                 
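
The demo above assumes every cell holds a string. Data read from Excel may contain empty cells, and the question asks for exactly six digits, so the following is one possible variant rather than the answer's own code. The column name QuoteNumbers, the helper join_matches and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the None row stands in for an empty Excel cell.
df = pd.DataFrame({'Comments': [
    'sjhsh QUOTE 234566',
    'QUOTE 657894 customer called QUOTE 189067',
    'No qte',
    None,
]})

def join_matches(matches):
    # str.findall gives a list per row but leaves missing values as NaN,
    # so only join when a list actually came back.
    return ','.join(matches) if isinstance(matches, list) else ''

# (\d{6})(?!\d) accepts exactly six digits and rejects longer digit runs.
df['QuoteNumbers'] = (
    df['Comments'].str.findall(r'(?i)quote\s*(\d{6})(?!\d)').apply(join_matches)
)
print(df)
```

Rows with no match and rows with a missing comment both end up as an empty string here; whether that or NaN is preferable depends on how the new column is used later.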

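(?i)(?<=QUOTE )\d+ will capture the numbers you are looking for.

(?i) makes the rest of the pattern case-insensitive, so it matches "QUote" and any other capitalization of the word.

(?<=QUOTE ) is a positive lookbehind: the digits must be immediately preceded by "QUOTE" and a single space.

\d+ matches the number itself (one or more digits).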
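
A short sketch of how this lookbehind pattern behaves, using only the standard re module. The test strings and variable names are made up for illustration. Because the lookbehind requires exactly "QUOTE" plus one space right before the digits, it is stricter than a pattern that allows any amount of whitespace.

```python
import re

# Pattern from this answer: digits that directly follow "QUOTE " in any letter case.
LOOKBEHIND_RE = re.compile(r'(?i)(?<=QUOTE )\d+')

one_space = LOOKBEHIND_RE.findall('customer call QUote 239876 Call back')
two_spaces = LOOKBEHIND_RE.findall('QUOTE  657894')  # two spaces after QUOTE
no_space = LOOKBEHIND_RE.findall('QUOTE657894')      # no space after QUOTE

print(one_space)   # the single-space case matches
print(two_spaces)  # no match: the lookbehind text is not directly before the digits
print(no_space)    # no match: the required space is missing
```

If the comments are typed by hand and spacing varies, a pattern with \s* and a capture group may be the more forgiving choice.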

Demo
