The order of seemingly independent replacements changes the result. Why?

from bs4 import BeautifulSoup

REPLACEMENTS = [('u', '{}'),
                ('b', '{}'),
                ('i', '{}')]

def replace_tags(html, replacements=REPLACEMENTS):
    soup = BeautifulSoup(html, 'html.parser')
    for tag, template in replacements:
        for node in soup.find_all(tag):
            replacement = template.format(node.text)
            r = BeautifulSoup(replacement, 'html.parser')
            node.replace_with(r)
    return str(soup)

if __name__ == "__main__":
    my_html = """<html><body>I am strong and I am emphasized and I am underlined.</body></html>"""
    revised = replace_tags(my_html, REPLACEMENTS)
    print(revised)

1 answer

User

#1 · Posted 2024-04-19 23:43:03

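As for why this happens: you are replacing each tag with a string. A string is not a tag; see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring. Because of that, you can no longer use BeautifulSoup navigation on the parts you have replaced. If you replace them with a new tag instead, the following works in every case.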

from bs4 import BeautifulSoup

def replace_tags(html, replacements):
    soup = BeautifulSoup(html, 'html.parser')
    for tag in replacements:
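        for node in soup.find_all(tag):
            # Build the replacement from the same soup so it stays navigable.
            if tag == 'i':
                newtag = soup.new_tag("em")
            elif tag == 'b':
                newtag = soup.new_tag("strong")
            elif tag == 'u':
                newtag = soup.new_tag("span", **{'class':'underline'})
            else:
                continue  # no replacement defined for this tag name
            newtag.string = node.string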
            node.replace_with(newtag)
    return str(soup)

if __name__ == "__main__":
    my_html = """<html><body><p><b>I am strong</b> <b>I am strong too</b> and 
    <i>I am emphasized</i> and <u>I am underlined</u>.</p></body></html>"""
    replacements = ['i','b','u']
    revised = replace_tags(my_html, replacements)
    print(revised)
    replacements = ['b','u','i']
    revised = replace_tags(my_html, replacements)
    print(revised)
    replacements = ['u','i','b']
    revised = replace_tags(my_html, replacements)
    print(revised)
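
To see what the difference between a string and a tag means in practice, here is a minimal sketch. The sample markup is made up for illustration and is not the asker's input. It only shows that a piece of the tree replaced by plain text can no longer be found by tag name, whereas a tag created with `soup.new_tag` from the same soup can.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Illustrative markup, not the asker's original input.
soup = BeautifulSoup("<p><b>bold</b> text</p>", "html.parser")

b = soup.find("b")
is_tag = isinstance(b, Tag)                        # the <b> element is a Tag
is_string = isinstance(b.string, NavigableString)  # its content is a string

# Replace the tag with plain text: the tree now holds only a string there.
b.replace_with("bold")
found_after = soup.find("b")  # no <b> tag is left to search for

# Same input, but the replacement is a tag built from the same soup.
soup2 = BeautifulSoup("<p><b>bold</b> text</p>", "html.parser")
old = soup2.find("b")
new = soup2.new_tag("strong")
new.string = old.get_text()
old.replace_with(new)
found_strong = soup2.find("strong")  # still reachable through navigation

print(is_tag, is_string, found_after, found_strong)
```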

Update: I overlooked this line:

r = BeautifulSoup(replacement, 'html.parser')

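But I don't think you can insert a tag from a different soup and still navigate it, for the same reason. All the documentation I have read creates the new tag from the original soup and uses that.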
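
One way to check the "new tag from the original soup" approach is to run every ordering and compare the results. The sketch below is only an illustration under a few assumptions: `TAG_MAP` and `replace_tags_mapped` are hypothetical names, the mapping (b to strong, i to em, u to an underline span) follows the answer's code, and the sample HTML is a shortened version of the answer's input.

```python
from itertools import permutations
from bs4 import BeautifulSoup

# Hypothetical mapping: old tag name -> (new tag name, attributes).
TAG_MAP = {
    'b': ('strong', {}),
    'i': ('em', {}),
    'u': ('span', {'class': 'underline'}),
}

def replace_tags_mapped(html, order):
    """Replace tags in the given order, creating each new tag from the same soup."""
    soup = BeautifulSoup(html, 'html.parser')
    for old_name in order:
        new_name, attrs = TAG_MAP[old_name]
        for node in soup.find_all(old_name):
            new_tag = soup.new_tag(new_name)
            new_tag.attrs.update(attrs)
            # Assumption: only the text is kept; nested markup is flattened.
            new_tag.string = node.get_text()
            node.replace_with(new_tag)
    return str(soup)

html = ("<p><b>I am strong</b> and <i>I am emphasized</i> "
        "and <u>I am underlined</u>.</p>")

# Collect the distinct outputs over all six orderings of b, i, u.
results = {replace_tags_mapped(html, order) for order in permutations(TAG_MAP)}
print(len(results))
```

If the replacements really are independent, the set should contain a single string regardless of the order in which the tag names are processed.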
