How can I prevent Python BeautifulSoup from replacing hexadecimal escape sequences such as &#xD;?

from bs4 import BeautifulSoup
#import os

source_model_file_name = "TestModel.ldm"
target_model_file_name = "TestModel_out.ldm"

with open(source_model_file_name, 'r', encoding="utf-8", newline="\r\n") as source_model_file:
    source_model = source_model_file.read()

soup_model = BeautifulSoup(source_model, "xml")

with open(target_model_file_name, "w", encoding="utf-8", newline="\r\n") as file:
    file.write(str(soup_model))
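Some background on why the round trip above changes the file. This is sketched with the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, an assumption made only to keep the example dependency-free; lxml, which backs BeautifulSoup's "xml" parser, should follow the same XML rules. A parser decodes character references such as `&#xD;` into the characters themselves, so their original spelling is not kept. If those characters are later written back literally inside an attribute, the next parse normalizes them to spaces.

```python
import xml.etree.ElementTree as ET

# Character references are decoded into the real CR, LF and TAB characters.
escaped = ET.fromstring('<data description="a&#xD;&#xA;&#x9;b"/>')
decoded_value = escaped.get("description")
print(repr(decoded_value))  # 'a\r\n\tb'

# The same characters written literally do not survive a re-parse:
# line-end handling turns CR LF into one LF, then attribute-value
# normalization replaces each whitespace character with a space.
literal = ET.fromstring('<data description="a\r\n\tb"/>')
literal_value = literal.get("description")
print(repr(literal_value))  # 'a  b'
```

So the only way to keep CR, LF and TAB in an attribute value through a save/load cycle is to write them out as character references again, which is what a custom output formatter can do.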

1 answer

User

Answer #1 · Posted 2024-06-10 14:04:11

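One solution is to use a custom formatter that turns carriage returns, line feeds and tabs in attribute values back into hex character references: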

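from bs4 import BeautifulSoup
from bs4.formatter import XMLFormatter


class CustomAttributes(XMLFormatter):
    # A formatter created without entity_substitution does no escaping at all,
    # so the "&#x..;" references below are written out verbatim (but so would
    # a bare "&" or "<" anywhere else in the document).
    def attributes(self, tag):
        # The parser has already decoded &#xD;, &#xA; and &#x9; into real
        # CR, LF and TAB characters; turn them back into character references.
        # Attributes are yielded in document order (the default implementation sorts them).
        for k, v in tag.attrs.items():
            v = v.replace("\r", "&#xD;")
            v = v.replace("\n", "&#xA;")
            v = v.replace("\t", "&#x9;")
            yield k, v


xml_doc = """<test>
    <data description="Some Text &#xD; &#xA; &#x9;">
        some data
    </data>
</test>"""

soup = BeautifulSoup(xml_doc, "xml")

print(soup.prettify(formatter=CustomAttributes()))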

Prints:

<?xml version="1.0" encoding="utf-8"?>
<test>
 <data description="Some Text &#xD; &#xA; &#x9;">
  some data
 </data>
</test>
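One caveat: because the formatter above is instantiated without an `entity_substitution` function, Beautiful Soup performs no escaping at all, so an attribute such as `note="A &amp; B"` would be written back as `A & B`, which is not well-formed XML. A possible variant, sketched here as an assumption rather than a documented recipe, keeps the standard XML escaping (`EntitySubstitution.substitute_xml`, which the built-in "minimal" XML formatter uses) and re-escapes CR, LF and TAB in `Formatter.attribute_value()` instead. The class name `EscapeWhitespaceInAttributes` and the extra `note` attribute are hypothetical, added for illustration.

```python
from bs4 import BeautifulSoup
from bs4.dammit import EntitySubstitution
from bs4.formatter import XMLFormatter


class EscapeWhitespaceInAttributes(XMLFormatter):
    """Hypothetical formatter: normal XML escaping, plus CR/LF/TAB in
    attribute values written as hex character references."""

    # Characters that would be normalized to spaces if written literally.
    WHITESPACE_REFS = {"\r": "&#xD;", "\n": "&#xA;", "\t": "&#x9;"}

    def __init__(self):
        # Keep escaping &, < and > like the "minimal" formatter does.
        super().__init__(entity_substitution=EntitySubstitution.substitute_xml)

    def attribute_value(self, value):
        # Escape &, < and > first so the references added below stay intact.
        value = super().attribute_value(value)
        for char, ref in self.WHITESPACE_REFS.items():
            value = value.replace(char, ref)
        return value


xml_doc = """<test>
    <data description="Some Text &#xD; &#xA; &#x9;" note="A &amp; B">
        some data
    </data>
</test>"""

soup = BeautifulSoup(xml_doc, "xml")
output = soup.prettify(formatter=EscapeWhitespaceInAttributes())
print(output)
```

In the question's script, the same formatter could presumably be used as `file.write(soup_model.decode(formatter=EscapeWhitespaceInAttributes()))` in place of `str(soup_model)`; `decode()` keeps the original layout, whereas `prettify()` re-indents the document.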
