How do I use Python regular expressions to find and replace the nth occurrence of a word in a sentence?

Posted 2024-04-29 03:36:19

Using only Python regular expressions, how can I find and replace the nth occurrence of a word in a sentence? For example:

import re

str = 'cat goose  mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', str)
new_str = re.sub(r'cat', r'Bull', str, 1)
new_str = re.sub(r'cat', r'Bull', str, 2)

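In the sentence above, the word 'cat' appears twice. I want to change the second occurrence of 'cat' to 'Bull' and leave the first 'cat' untouched. The final sentence should be: 'cat goose mouse horse pig Bull cow'. In my code above I tried three times and never got the result I wanted.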
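
For reference, here is a minimal sketch of what the three calls above return. The fourth argument of re.sub is count, the maximum number of replacements, always applied from the left, so none of the attempts can skip the first 'cat'. The variable names are illustrative only.

```python
import re

s = 'cat goose  mouse horse pig cat cow'

# No count: every 'cat' is replaced.
all_replaced = re.sub(r'cat', r'Bull', s)
# count=1: only the FIRST 'cat' is replaced.
first_only = re.sub(r'cat', r'Bull', s, count=1)
# count=2: the first TWO are replaced, which here is all of them again.
first_two = re.sub(r'cat', r'Bull', s, count=2)

print(all_replaced)  # Bull goose  mouse horse pig Bull cow
print(first_only)    # Bull goose  mouse horse pig cat cow
print(first_two)     # Bull goose  mouse horse pig Bull cow
```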


3 Answers

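I use a simple function that lists all occurrences, picks the position of the nth one, and uses it to split the original string into two substrings. It then replaces the first match in the second substring and joins the substrings back into a new string: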

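import re

def replacenth(string, sub, wanted, n):
    # Start offsets of every match of sub; take the nth one (n is 1-based)
    where = [m.start() for m in re.finditer(sub, string)][n-1]
    before = string[:where]
    after = string[where:]
    # str.replace returns a new string, so the result must be assigned
    after = after.replace(sub, wanted, 1)
    newString = before + after
    print(newString)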

For these variables:

string = 'ababababababababab'
sub = 'ab'
wanted = 'CD'
n = 5

Output:

ababababCDabababab

Notes:

The where variable is taken from a list of match positions, from which the nth one is picked. List indices normally start at 0, not 1, so the code indexes with n-1 and n itself means the actual nth occurrence. My example replaces the 5th occurrence. If you indexed with n directly and wanted the 5th occurrence, n would have to be 4. Which convention you use usually depends on the function that generates n.

This should be the simplest way, but it isn't regex-only as you originally wanted.
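
A variant of the same idea, sketched under two assumptions that are not part of the answer above: sub is meant as literal text (so it is passed through re.escape before searching), and an out-of-range n should leave the string unchanged. The name replace_nth_literal is hypothetical.

```python
import re

def replace_nth_literal(string, sub, wanted, n):
    """Hypothetical helper: replace the nth (1-based) literal occurrence of sub."""
    # re.escape makes regex metacharacters in sub match themselves.
    starts = [m.start() for m in re.finditer(re.escape(sub), string)]
    if not 1 <= n <= len(starts):
        return string  # assumption: no nth occurrence means no change
    where = starts[n - 1]  # 1-based n, 0-based list index
    return string[:where] + wanted + string[where + len(sub):]

print(replace_nth_literal('ababababababababab', 'ab', 'CD', 5))  # ababababCDabababab
```

Returning the result instead of printing it makes the function easier to reuse, and slicing around the match avoids a second search with str.replace.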

Use a negative lookahead like the one below.

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

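  • ^ asserts that we are at the start of the string.
  • (?:(?!cat).)* matches any character, zero or more times, as long as it is not the start of the substring cat.
  • cat matches the first cat substring.
  • (?:(?!cat).)* again matches any character, zero or more times, as long as it is not the start of cat.
  • Now wrap all of that in a capturing group, as in ((?:(?!cat).)*cat(?:(?!cat).)*), so that the captured characters can be referenced later.
  • The cat that follows the group matches the second cat substring.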
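
The breakdown above can be sketched as a commented pattern using re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The name SECOND_CAT is illustrative only.

```python
import re

# Same pattern as in the answer, spread over several lines for readability.
SECOND_CAT = re.compile(r"""
    ^                   # start of the string
    (                   # group 1: everything before the second cat
      (?:(?!cat).)*     #   characters that do not begin a cat
      cat               #   the first cat
      (?:(?!cat).)*     #   again, characters that do not begin a cat
    )
    cat                 # the second cat, the one that gets replaced
""", re.VERBOSE)

s = "cat goose  mouse horse pig cat cow"
result = SECOND_CAT.sub(r"\1Bull", s)  # \1 puts group 1 back unchanged
print(result)  # cat goose  mouse horse pig Bull cow
```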

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

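Change the number inside {} to replace the first, second, or nth occurrence of the string cat.

To replace the third occurrence of the string cat, put 2 inside the braces.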

>>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose  mouse horse pig cat foo cat cow")
'cat goose  mouse horse pig cat foo Bull cow'
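
Since the number in the braces is always n-1, the pattern can be built for any n. This is a minimal sketch, not part of the original answer: the name replace_nth_regex is hypothetical, the word is escaped with re.escape on the assumption that it is literal text, and the input is assumed to be a single line because . does not match newlines by default.

```python
import re

def replace_nth_regex(text, word, replacement, n):
    """Hypothetical helper: replace the nth (1-based) occurrence of word via re.sub."""
    w = re.escape(word)
    # Group 1 lazily captures everything up to and including the first n-1 matches.
    pattern = r"^(.*?(?:" + w + r".*?){" + str(n - 1) + r"})" + w
    # A function replacement avoids backslash handling in the replacement text.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text, count=1)

s = "cat goose  mouse horse pig cat cow"
print(replace_nth_regex(s, "cat", "Bull", 2))  # cat goose  mouse horse pig Bull cow
print(replace_nth_regex(s, "cat", "Bull", 1))  # Bull goose  mouse horse pig cat cow
```

If there is no nth occurrence, the pattern does not match and the text comes back unchanged.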

Here's a way to do it without regex:

def replaceNth(s, source, target, n):
    inds = [i for i in range(len(s) - len(source)+1) if s[i:i+len(source)]==source]
    if len(inds) < n:
        return  # or maybe raise an error
    s = list(s)  # can't assign to string slices. So, let's listify
    s[inds[n-1]:inds[n-1]+len(source)] = target  # do n-1 because we start from the first occurrence of the string, not the 0-th
    return ''.join(s)

Usage:

In [278]: s
Out[278]: 'cat goose  mouse horse pig cat cow'

In [279]: replaceNth(s, 'cat', 'Bull', 2)
Out[279]: 'cat goose  mouse horse pig Bull cow'

In [280]: print(replaceNth(s, 'cat', 'Bull', 3))
None
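
The "or maybe raise an error" option mentioned in the code comment could look like the sketch below. It swaps the index list for repeated str.find calls, which is a different technique from the function above, though it also counts overlapping occurrences. The name replace_nth_find and the choice of ValueError are assumptions.

```python
def replace_nth_find(s, source, target, n):
    """Hypothetical variant: replace the nth (1-based) occurrence or raise ValueError."""
    if not source or n < 1:
        raise ValueError("source must be non-empty and n must be >= 1")
    pos = -1
    for _ in range(n):
        # Resume one character past the previous hit (overlaps are counted).
        pos = s.find(source, pos + 1)
        if pos == -1:
            raise ValueError("fewer than %d occurrences of %r" % (n, source))
    return s[:pos] + target + s[pos + len(source):]

s = 'cat goose  mouse horse pig cat cow'
print(replace_nth_find(s, 'cat', 'Bull', 2))  # cat goose  mouse horse pig Bull cow
```

Raising makes a missing occurrence visible to the caller instead of silently producing None.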
