Find and replace with Beautiful Soup in Python

2 votes

1 answer

983 views

Asked 2025-04-16 21:24

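I am using Beautiful Soup and want to replace certain strings in an HTML file with linked versions of themselves.

This is the substitution I am running, and it is where the problem shows up: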

modified_contents = re.sub("([^http://*/s]APP[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

Sample input 1:

Input File contains APPdd34

Output File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Sample input 2:

Input File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Output File contains <a href="http://stack.com=<a href="http://stack.com=APPdd34"> APPdd34</a>"> <a href="http://stack.com=APPdd34"> APPdd34</a></a>

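I want output file 2 to be identical to sample input file 2, i.e. text that is already linked should be left untouched.

How can I solve this?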

data processing, web scraping, beautiful soup, content replacement, html parsing

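1 Answer

This may not solve your problem completely, since I don't know what your whole input file looks like, but I hope it points you in a useful direction.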

# BeautifulSoup 3 API under Python 2 (note the print statement at the end)
from BeautifulSoup import BeautifulSoup, Tag

# Case 1: bare text that has no link yet
text = """APPdd34"""
soup = BeautifulSoup(text)
var1 = soup.text

# Case 2: text that is already wrapped in a link
text = """<a href="http://stack.com=APPdd34"> APPdd34</a>"""
soup = BeautifulSoup(text)
var2 = soup.find('a').text

# Build fresh <a> tags from the extracted text
soup = BeautifulSoup("<p>Some new html</p>")
tag1 = Tag(soup, "a", {'href': 'http://stack.com=' + var1})
tag1.insert(0, var1)  # Insert text
tag2 = Tag(soup, "a", {'href': 'http://stack.com=' + var2})
tag2.insert(0, var2)
soup.insert(0, tag1)
soup.insert(3, tag2)
print soup.prettify()

In short, you can use BeautifulSoup to extract the text, and then build the tags from that text.
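For what it's worth, the same idea can be applied to a whole document by walking its text nodes and skipping any that already sit inside a link. The following is only a rough sketch, assuming Beautiful Soup 4 (`bs4`) on Python 3 rather than the BeautifulSoup 3 import used above. The function name `link_app_ids` and its `base` parameter are made up for illustration, and the pattern is a simplified version of the one in the question.

```python
import re
from bs4 import BeautifulSoup

# Simplified from the question: "APP", two lowercase letters, two digits.
APP_ID = re.compile(r"APP[a-z]{2}[0-9]{2}")

def link_app_ids(html, base="http://stack.com="):
    """Wrap each APP id found in a text node in an <a> tag (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Only text nodes are searched, so ids inside href attributes never match.
    # Copy the result into a list because the loop modifies the tree.
    for node in list(soup.find_all(string=APP_ID)):
        if node.find_parent("a") is not None:
            continue  # already inside a link, leave it untouched
        text = str(node)
        pos = 0
        for m in APP_ID.finditer(text):
            if m.start() > pos:
                # Plain text between the previous match and this one
                node.insert_before(text[pos:m.start()])
            link = soup.new_tag("a", href=base + m.group(0))
            link.string = m.group(0)
            node.insert_before(link)
            pos = m.end()
        if pos < len(text):
            node.insert_before(text[pos:])  # trailing plain text
        node.extract()  # drop the original, now fully replaced, text node
    return str(soup)

print(link_app_ids("Input File contains APPdd34"))
print(link_app_ids('<a href="http://stack.com=APPdd34"> APPdd34</a>'))
```

Because the check is done on the parse tree instead of on `str(soup)`, running the function a second time over its own output should not nest links the way the `re.sub` call in the question does.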

Answered 2025-04-16 by Python大师
