Replacing substrings in a list of strings

Posted 2024-06-16 10:07:56


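I am trying to clean up my sentences, and I want to remove the tags in them (each tag is an underscore followed by a word, for example "_NNS"). Basically, I want to remove the string that follows each underscore, along with the underscore itself.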

Text:

['Thanks_NNS sir_VBP',
'Oh_UH thanks_NNS to_TO remember_VB']

Desired output:

['Thanks sir', 'Oh thanks to remember']

Here is the code I tried:

for i in text:
    k= i.split(" ")
    print (k)
    for z in k:
        if "_" in z:
            j=z.replace("_",'')
            print (j)

Current output:

ThanksNNS
sirVBP
OhUH
thanksNNS
toTO
rememberVB
RemindVB

1 answer
User
#1 · Posted 2024-06-16 10:07:56

Using a regular expression:

You can do this with re.sub. Match the unwanted substring in each string and replace it with an empty string:

import re

text = ['Thanks_NNS sir_VBP', 'Oh_UH thanks_NNS to_TO remember_VB']
curated_text = [re.sub(r'_\S*', '', a) for a in text]
print(curated_text)

Output:

['Thanks sir', 'Oh thanks to remember']

The regular expression:

_\S* - an underscore followed by zero or more non-whitespace characters
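
To see what the pattern does and does not touch, here is a small sketch. The helper name strip_tags and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# An underscore followed by zero or more non-whitespace characters.
TAG_PATTERN = re.compile(r'_\S*')

def strip_tags(sentence):
    """Remove every '_TAG' suffix from a whitespace-separated sentence."""
    return TAG_PATTERN.sub('', sentence)

print(strip_tags('Oh_UH thanks_NNS'))  # Oh thanks
print(strip_tags('no tags here'))      # no tags here
print(strip_tags('snake_case_name'))   # snake
```

Note the last case: because \S also matches underscores, everything from the first underscore to the next whitespace is removed, so this approach only suits text where underscores are used solely as tag separators.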

Without a regular expression:

text = ['Thanks_NNS sir_VBP', 'Oh_UH thanks_NNS to_TO remember_VB']
curated_text = []  # Outer container holding the cleaned strings.

for i in text:
    d = []  # Inner container holding the cleaned words of one string.
    for b in i.split():
        c = b.split('_')[0]  # Keep the part before the underscore, discard the tag.
        d.append(c)          # Append the bare word to the inner container.
    # Join the words back into one string and append it to the outer container.
    curated_text.append(' '.join(d))

print(curated_text)

Output:

['Thanks sir', 'Oh thanks to remember']
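
The loop above can also be condensed into a single comprehension. This is only a sketch of an equivalent form, swapping split('_')[0] for str.partition, which returns the text before the first underscore as its first element; the variable names word and sentence are my own.

```python
text = ['Thanks_NNS sir_VBP', 'Oh_UH thanks_NNS to_TO remember_VB']

# partition('_') returns (before, '_', after); index 0 is the bare word.
# A word without an underscore is returned unchanged as the first element.
curated_text = [
    ' '.join(word.partition('_')[0] for word in sentence.split())
    for sentence in text
]

print(curated_text)  # ['Thanks sir', 'Oh thanks to remember']
```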

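The problem with your code:

You are only replacing '_' with an empty string, when what you actually want is to replace '_' and the characters after it with an empty string.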

for i in text:
    k= i.split(" ")
    print (k)
    for z in k:
        if "_" in z:
            j=z.replace("_",'') # <- 'Thanks_NNS' becomes 'ThanksNNS'
            print (j)
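
The difference is easiest to see on a single token. The snippet below is a minimal illustration; the variable names are assumptions chosen for this example.

```python
token = "Thanks_NNS"

# What the loop above does: only the underscore character is removed.
only_underscore_removed = token.replace("_", "")

# What was intended: keep the part before the first underscore.
tag_removed = token.split("_", 1)[0]

print(only_underscore_removed)  # ThanksNNS
print(tag_removed)              # Thanks
```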
