Match partial text based on another column

User

Answer 1 · edited 2024-05-16 09:40:26

You can use:

import re
df['text'] = df.apply(lambda x: re.sub(r'(?<!\d)(?<!\d\.)(?:{}|{})(?!\.?\d)'.format(re.escape(x['num']), '|'.join([re.escape(l) for l in x['num'].split('/')])), '', x['text']), axis=1)

Because df.apply is called with axis=1, the lambda runs once for every row.

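The regular expression is built dynamically from that row's value in the num column and then applied to the same row's text column.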
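A self-contained sketch of the same per-row approach, run on the three sample rows used elsewhere on this page. The helper name strip_row_numbers is hypothetical; the lambda is turned into a named function only for readability, and the pattern itself follows the answer.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'text': ['test one 3.5 and 60 test tow',
             'test one 3/4 test tow',
             'test one 5.0 test tow'],
    'num': ['3.5/60', '3/4', '5.0'],
})

def strip_row_numbers(row: pd.Series) -> str:
    # Alternatives: the whole num value first, then each slash-separated part
    parts = [re.escape(row['num'])] + [re.escape(p) for p in row['num'].split('/')]
    pattern = r'(?<!\d)(?<!\d\.)(?:{})(?!\.?\d)'.format('|'.join(parts))
    # Delete every match in this row's text
    return re.sub(pattern, '', row['text'])

# axis=1 passes each row (as a Series) to the function
df['text'] = df.apply(strip_row_numbers, axis=1)
print(df['text'].tolist())
```

The removed numbers leave double spaces behind (for example 'test one  and  test tow'). If that matters, a follow-up df['text'].str.replace(r' {2,}', ' ', regex=True) would collapse them.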

The expression r'(?<!\d)(?<!\d\.)(?:{}|{})(?!\.?\d)'.format(re.escape(x['num']), '|'.join([re.escape(l) for l in x['num'].split('/')])) produces a regular expression such as

(?<!\d)(?<!\d\.)(?:3/4|3|4)(?!\.?\d)

which matches either the complete value from the num column or any of the individual numbers separated by /.

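(?<!\d)(?<!\d\.) is a pair of lookbehinds: the match fails if the current position is immediately preceded by a digit, or by a digit followed by a dot. Likewise, the lookahead (?!\.?\d) fails the match if it is immediately followed by a digit, or by a dot and then a digit. Together they prevent matching a number that is only part of a longer number.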
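To see what the lookarounds buy, here is a small sketch that builds the pattern for num = '3/4' and applies it to a made-up string containing 13 and 3.4. The function name build_pattern and the sample string are illustrative assumptions, not part of the answer.

```python
import re

def build_pattern(num: str) -> str:
    # Alternatives: the whole num value first, then each part between the slashes
    alternatives = [re.escape(num)] + [re.escape(p) for p in num.split('/')]
    # (?<!\d)(?<!\d\.) : no digit, and no "digit + dot", right before the match
    # (?!\.?\d)        : no digit, and no "dot + digit", right after the match
    return r'(?<!\d)(?<!\d\.)(?:{})(?!\.?\d)'.format('|'.join(alternatives))

pattern = build_pattern('3/4')
print(pattern)
print(re.sub(pattern, '', 'keep 13 and 3.4, drop 4'))
```

The 3 in 13 is protected by (?<!\d), the 3 in 3.4 by (?!\.?\d), and the 4 in 3.4 by (?<!\d\.). Only the standalone 4 at the end is removed.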

User

Answer 2 · edited 2024-05-16 09:40:26

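Build a single pattern from all the numbers: join the num values with |, turn each / into |, and add / itself as an alternative: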

nums = '|'.join(df['num'].tolist()).replace('/', '|') + '|/'
nums
'3.5|60|3|4|5.0|/'

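Then replace every match with an empty string. In pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed:

df['text'].str.replace(nums, '', regex=True)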

0    test one  and  test tow
1         test one  test tow
2         test one  test tow
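One caveat with this pattern: the numbers are inserted unescaped, so the . in 3.5 is a regex wildcard. The pattern is also built from every row's num, so each row is stripped of all rows' numbers, not only its own. The sketch below compares the original pattern with an escaped variant on a made-up string. The escaped variant is an assumption of this sketch, not part of the answer, and it still has no boundaries, so digits inside longer numbers are removed (the lookarounds in the first answer handle that case).

```python
import re

nums_col = ['3.5/60', '3/4', '5.0']  # the num values used on this page

# Answer 2's pattern: '.' is a regex metacharacter here, so '3.5' also matches '345'
unescaped = '|'.join(nums_col).replace('/', '|') + '|/'
# Escaped variant: each part passes through re.escape, so '.' only matches a dot
escaped = '|'.join(re.escape(p) for value in nums_col for p in value.split('/')) + '|/'

sample = 'id 345, price 3.5'
print(unescaped)
print(escaped)
print(re.sub(unescaped, '', sample))  # '345' removed entirely via '3.5'
print(re.sub(escaped, '', sample))    # '3' and '4' removed, '5' left behind
```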

User

Answer 3 · edited 2024-05-16 09:40:26

This works, treating the table as plain text:

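import re

txt='''\
text                                    num

test one 3.5 and 60 test tow            3.5/60
test one 3/4 test tow                     3/4
test one 5.0 test tow                     5.0'''

for line in txt.splitlines():
    # group(1): the text column plus its padding; group(2): the num column
    m = re.search(r'^(.*?[ \t]{2,}(?=\d))([0-9./]+)$', line)
    if m:
        text, num = m.group(1), m.group(2)
        a, _, b = num.partition('/')
        # Escape the values so that '.' matches only a literal dot
        full, ea, eb = re.escape(num), re.escape(a), re.escape(b)
        if re.search(fr'\b{full}\b', text):
            # The whole num value (e.g. 3/4 or 5.0) appears in the text
            s = re.sub(fr'[ ]?\b{full}\b', '', text)
            line = s + ' ' * (len(text) - len(s)) + num  # keep the column aligned
        elif b and re.search(fr'{ea}[^/]+{eb}', text):
            # The two halves appear separately (e.g. 3.5 ... 60)
            s = re.sub(fr'[ ]?\b{ea}\b', '', text)
            s = re.sub(fr'[ ]?\b{eb}\b', '', s)
            line = s + ' ' * (len(text) - len(s)) + num
    print(line)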

Prints:

text                                    num

test one and test tow                   3.5/60
test one test tow                         3/4
test one test tow                         5.0
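The core two-branch logic of this answer can also be pulled out of the fixed-width parsing and applied to a (text, num) pair directly. The sketch below assumes that shape; the function name remove_num is hypothetical, and re.escape is added so that dots in the values are matched literally.

```python
import re

def remove_num(text: str, num: str) -> str:
    """Remove num, or both halves of an 'a/b' num, each with one leading space."""
    a, _, b = num.partition('/')
    full = re.escape(num)
    if re.search(fr'\b{full}\b', text):
        # The whole value (e.g. '3/4' or '5.0') occurs as-is
        return re.sub(fr'[ ]?\b{full}\b', '', text)
    if b and re.search(fr'{re.escape(a)}[^/]+{re.escape(b)}', text):
        # The halves occur separately, a before b (e.g. '3.5 ... 60')
        for part in (a, b):
            text = re.sub(fr'[ ]?\b{re.escape(part)}\b', '', text)
    return text

rows = [('test one 3.5 and 60 test tow', '3.5/60'),
        ('test one 3/4 test tow', '3/4'),
        ('test one 5.0 test tow', '5.0')]
for text, num in rows:
    print(remove_num(text, num))
```

With a DataFrame, the same function could be applied as [remove_num(t, n) for t, n in zip(df['text'], df['num'])]. Because the optional leading space is removed together with each number, this variant avoids the double spaces left by the other answers.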
