How to get and replace text between specific tags

Asked 2025-04-18 10:55

Suppose you have a string like this:

"<p> >this line starts with an arrow <br /> this line does not </p>"

or one like this:

"<p> >this line starts with an arrow </p> <p> this line does not </p>"

How can I find the lines that start with an arrow and wrap them in a <div>?

so that the result becomes:

"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"

Tags: text-processing, string-manipulation, html-parsing, tag-replacement

3 Answers

You can try this regular expression:

>(\w[^<]*)

Here is an example in Python:

>>> import re
>>> s = '<p> >this line starts with an arrow <br /> this line does not </p>'
>>> m = re.sub(r'>(\w[^<]*)', r'<div> >\1</div> ', s)
>>> m
'<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>'
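One caveat worth illustrating: >(\w[^<]*) matches any ">" followed by a word character, including the closing bracket of an ordinary tag, so input such as <b>bold</b> would come out as <b<div> >bold</div> </b>. The sketch below compares the answer's substitution with a tighter variant. The tighter pattern, the names LOOSE, TIGHT, wrap_loose and wrap_tight, and the sample string are assumptions made for illustration, not part of the answer. The tighter variant also does not insert the extra spaces the answer's replacement adds, and it is still a regex heuristic rather than an HTML parser.

```python
import re

# Pattern from the answer above: any ">" followed by a word character.
LOOSE = re.compile(r">(\w[^<]*)")

# Hypothetical tighter variant (an assumption, not part of the answer):
#   (?<=>)    the match must begin right after a ">" character
#   (\s*)     optional whitespace, captured so it can be written back
#   (>[^<]*)  the arrow itself plus the text up to the next "<"
TIGHT = re.compile(r"(?<=>)(\s*)(>[^<]*)")


def wrap_loose(html: str) -> str:
    """The answer's substitution, unchanged."""
    return LOOSE.sub(r"<div> >\1</div> ", html)


def wrap_tight(html: str) -> str:
    """Wrap text that follows a '>' and whose first non-space character is '>'."""
    return TIGHT.sub(r"\1<div>\2</div>", html)


sample = "<p><b>bold</b> >arrow line</p>"
print(wrap_loose(sample))  # the <b> tag loses its closing bracket
print(wrap_tight(sample))  # <b>bold</b> is left intact
```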

Answered 2025-04-18 by Python大师

You can try this regular expression: >\s+(>.*?)<

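import re

# A tag's closing ">", one or more whitespace characters, then a capture group
# that starts with the arrow ">" and runs lazily up to the next "<"
regex = re.compile(r">\s+(>.*?)<")
test_string = "<p> >this line starts with an arrow <br /> this line does not </p>"
match_array = regex.findall(test_string)
# match_array holds the captured group of each match:
# ['>this line starts with an arrow ']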

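Then replace the matched part with <div> matched_group </div>, wrapping only the captured group: the full match also consumes the preceding tag's closing > and the < of the next tag, and those must be kept. The pattern looks for content that sits between "> >" and "<".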

Here is a demo on debuggex.
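The answer describes the replacement step only in words. A minimal sketch of it, assuming Python's re.sub with a replacement function (wrap_group is a hypothetical name, not from the answer): the callback rebuilds the full match and wraps only group 1, so the tag's ">" and the following "<" survive. As written, "." does not cross newlines, so an arrow line spanning several lines would not match unless re.DOTALL is added.

```python
import re

# The answer's pattern: a ">", whitespace, then group 1 = the arrow ">" and
# everything after it, matched lazily up to the next "<".
PATTERN = re.compile(r">\s+(>.*?)<")


def wrap_group(match: re.Match) -> str:
    """Hypothetical replacement callback: wrap group 1, keep the rest of the match."""
    s = match.string
    before = s[match.start(0):match.start(1)]  # the tag's ">" plus whitespace
    after = s[match.end(1):match.end(0)]       # the "<" that starts the next tag
    return before + "<div>" + match.group(1) + "</div>" + after


text = "<p> >this line starts with an arrow <br /> this line does not </p>"
print(PATTERN.sub(wrap_group, text))
```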

Answered 2025-04-18 by Python大师

Since you are dealing with HTML, it is better to use a tool built for the job: an HTML parser such as BeautifulSoup.

Use find_all() to locate every text node that starts with > and then wrap() each one in a new div tag:

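from bs4 import BeautifulSoup

data = "<p> >this line starts with an arrow <br /> this line does not </p>"

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(data, "html.parser")
# string= (called text= in older bs4 versions) filters text nodes;
# wrap each one whose stripped content starts with ">"
for item in soup.find_all(string=lambda s: s.strip().startswith('>')):
    item.wrap(soup.new_tag('div'))

print(soup.prettify())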

The output is (the serializer writes the literal arrow as the entity &gt;):

<p>
 <div>
  &gt;this line starts with an arrow
 </div>
 <br/>
 this line does not
</p>
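If adding the third-party BeautifulSoup dependency is not an option, the same parse-then-wrap idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not part of the answer above; ArrowWrapper and wrap_arrow_text are hypothetical names. The sketch only re-emits start tags, end tags, self-closing tags and text, so comments, doctypes and processing instructions are dropped, and, like BeautifulSoup, it escapes the arrow as &gt; on output.

```python
from html import escape
from html.parser import HTMLParser


class ArrowWrapper(HTMLParser):
    """Hypothetical helper: re-serializes markup, wrapping text runs that start with '>'."""

    def __init__(self):
        # convert_charrefs=True makes the parser buffer each text run and
        # deliver it in one handle_data() call with entities decoded
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # Rebuild attributes; a value of None means a bare attribute like "disabled"
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # XHTML-style empty tags such as <br />
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape &, < and > so the output stays well-formed
        text = escape(data, quote=False)
        if data.strip().startswith(">"):
            text = f"<div>{text}</div>"
        self.out.append(text)


def wrap_arrow_text(markup: str) -> str:
    parser = ArrowWrapper()
    parser.feed(markup)
    parser.close()  # flush any trailing text the parser is still buffering
    return "".join(parser.out)


print(wrap_arrow_text("<p> >this line starts with an arrow <br /> this line does not </p>"))
```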

Answered 2025-04-18 by Python大师
