Seemingly independent replacements give different results depending on their order. Why?

Posted 2024-04-19 23:43:03

from bs4 import BeautifulSoup

REPLACEMENTS = [('u', '<span class="underline">{}</span>'),
                ('b', '<strong>{}</strong>'),
                ('i', '<em>{}</em>')]

def replace_tags(html, replacements=REPLACEMENTS):
    soup = BeautifulSoup(html, 'html.parser')
    for tag, template in replacements:
        for node in soup.find_all(tag):
            replacement = template.format(node.text)
            r = BeautifulSoup(replacement, 'html.parser')
            node.replace_with(r)
    return str(soup)

if __name__ == "__main__":
    my_html = """<html><body><p><b>I am strong</b> and 
    <i>I am emphasized</i> and <u>I am underlined</u>.</p></body></html>"""

    revised = replace_tags(my_html, REPLACEMENTS)
    print(revised)

This does not replace the <i> tag. The output is:

<html><body><p><strong>I am strong</strong> and 
<i>I am emphasized</i> and <span class="underline">I am underlined</span>.</p></body></html>

But if I change the order of the tuples in REPLACEMENTS to

REPLACEMENTS = [('b', '<strong>{}</strong>'),
                ('i', '<em>{}</em>'),
                ('u', '<span class="underline">{}</span>')]

then neither the <i> nor the <u> tag is replaced:

<html><body><p><strong>I am strong</strong> and 
<i>I am emphasized</i> and <u>I am underlined</u>.</p></body></html>

Reordering once more:

REPLACEMENTS = [('i', '<em>{}</em>'),
                ('b', '<strong>{}</strong>'),
                ('u', '<span class="underline">{}</span>')]

Now the output is:

<html><body><p><strong>I am strong</strong> and 
<em>I am emphasized</em> and <u>I am underlined</u>.</p></body></html>

This time the <u> replacement is not made.

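I don't understand why the order has this effect on the output. The tags are not nested, and each pass seems to be an independent replacement. I'm stumped. Any ideas?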


1 answer
User
#1 · posted 2024-04-19 23:43:03

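You asked why this happens. The answer is that you are calling replace_with() and passing it a string. A string is not a Tag (see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring), so BeautifulSoup can no longer navigate the replaced parts. If you replace them with new tags instead, the following works for every order: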

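from bs4 import BeautifulSoup

# Name and attributes of the replacement for each old tag.
NEW_TAGS = {
    'i': ('em', {}),
    'b': ('strong', {}),
    'u': ('span', {'class': 'underline'}),
}

def replace_tags(html, replacements):
    soup = BeautifulSoup(html, 'html.parser')
    for tag in replacements:
        name, attrs = NEW_TAGS[tag]
        for node in soup.find_all(tag):
            # Build the new tag from this same soup so it is linked into the tree.
            newtag = soup.new_tag(name, **attrs)
            # get_text() also works when the node has several children;
            # node.string would be None in that case.
            newtag.string = node.get_text()
            node.replace_with(newtag)
    return str(soup)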

if __name__ == "__main__":
    my_html = """<html><body><p><b>I am strong</b> <b>I am strong too</b> and 
    <i>I am emphasized</i> and <u>I am underlined</u>.</p></body></html>"""
    replacements = ['i','b','u']
    revised = replace_tags(my_html, replacements)
    print(revised)
    replacements = ['b','u','i']
    revised = replace_tags(my_html, replacements)
    print(revised)
    replacements = ['u','i','b']
    revised = replace_tags(my_html, replacements)
    print(revised)
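A different technique from the one above, offered as a sketch rather than as the answer's method: Beautiful Soup lets you rename a tag in place by assigning to its .name attribute, so the original node never leaves the tree and any markup nested inside it survives. The names RENAMES and rename_tags below are hypothetical; the mapping assumes the same three replacements as the question. The script runs every processing order and counts the distinct results.

```python
from itertools import permutations

from bs4 import BeautifulSoup

# Hypothetical mapping: old tag name -> (new tag name, attributes to set).
RENAMES = {
    "b": ("strong", {}),
    "i": ("em", {}),
    "u": ("span", {"class": "underline"}),
}


def rename_tags(html, order):
    """Rename tags in place, handling the old tag names in the given order."""
    soup = BeautifulSoup(html, "html.parser")
    for old_name in order:
        new_name, attrs = RENAMES[old_name]
        for node in soup.find_all(old_name):
            # Assigning to .name keeps the node (and its children) where it
            # is, so nothing is removed from or added to the tree.
            node.name = new_name
            for key, value in attrs.items():
                node[key] = value
    return str(soup)


my_html = ("<p><b>I am strong</b> <b>I am strong too</b> and "
           "<i>I am emphasized</i> and <u>I am underlined</u>.</p>")

# Try all six processing orders and collect the results.
outputs = [rename_tags(my_html, order) for order in permutations(RENAMES)]
print(len(set(outputs)))  # 1: every order produces the same markup
print(outputs[0])

# Markup nested inside a renamed tag is preserved.
nested = rename_tags("<p><b>bold <i>and italic</i></b></p>", ("b", "i"))
print(nested)  # <p><strong>bold <em>and italic</em></strong></p>
```

Unlike building a fresh tag from get_text(), renaming keeps inner markup, which matters once the tags you replace can contain other tags.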

Update: I overlooked the line:

r = BeautifulSoup(replacement, 'html.parser')

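But I don't think you can add a tag from another soup and still navigate it, for the same reason. All the documentation I have read creates the new tag from the original soup and uses that.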
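A likely mechanism, offered as an assumption rather than something established above: find_all() walks the document by following each element's next_element link, and older Beautiful Soup releases appear to have inserted a BeautifulSoup object passed to replace_with() as a single node whose next_element did not point into its contents, so the walk stopped at the earliest inserted fragment. That fits all three outputs in the question: each pass only finds tags that come, in document order, before the earliest fragment inserted so far. (Recent releases appear to insert a soup's children one at a time, which avoids the problem.) If that is right, the question's template approach also works when you insert the parsed Tag instead of the BeautifulSoup object wrapping it. A minimal sketch under that assumption; replace_tags_from_templates is a hypothetical name and the templates are the question's own.

```python
from itertools import permutations

from bs4 import BeautifulSoup

# The question's templates, unchanged.
REPLACEMENTS = [('u', '<span class="underline">{}</span>'),
                ('b', '<strong>{}</strong>'),
                ('i', '<em>{}</em>')]


def replace_tags_from_templates(html, replacements=REPLACEMENTS):
    """Hypothetical variant of the question's replace_tags()."""
    soup = BeautifulSoup(html, 'html.parser')
    for tag, template in replacements:
        for node in soup.find_all(tag):
            fragment = BeautifulSoup(template.format(node.text), 'html.parser')
            # Insert the fragment's top-level Tag, not the BeautifulSoup
            # object itself. replace_with() detaches the Tag from `fragment`
            # and links it into `soup` like any other tag.
            node.replace_with(fragment.find(True))
    return str(soup)


my_html = ("<html><body><p><b>I am strong</b> and "
           "<i>I am emphasized</i> and <u>I am underlined</u>.</p></body></html>")

# Run every ordering of the replacements; all results should match.
results = [replace_tags_from_templates(my_html, list(order))
           for order in permutations(REPLACEMENTS)]
for result in results:
    print(result)
```

Like the question's version, this formats node.text into the template, so any markup inside the original tag is flattened to text, and a literal "<" in that text would be parsed as markup.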
