Python etree: parsing XML with HTML entities (preserving the HTML formatting)

Published 2024-05-13 04:23:02


I have the following XML:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
 <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>

          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica  "golosit&#xE0;", con i pi&#xF9; piccoli "fare le conte".</p>
            <p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarit&#xE0; frammenti&#xA0;&#xA0; </p>
            <p><a href="https://www.rts.ch/">RTS</a> "Kiosque &#xE0; Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil&#xA0;rouge</em> al nostro </p>
            <p>
              <a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
            </p>
          </div>

        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
 </entry>

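The "body" field contains HTML that I have to copy into another file as HTML, with its markup intact (so replacements or other tricks are not allowed).

I am using Python and ElementTree.

Is there a way to do this?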

I have already tried using tail instead of text, but then I lose the formatting, which is a big problem.
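For context, ElementTree stores mixed content by splitting it between an element's `text` and its children's `tail` attributes, so reading either one alone returns words without tags. A minimal sketch of that model, using a made-up one-line paragraph rather than the payload above:

```python
import xml.etree.ElementTree as ET

# Made-up one-line paragraph, not taken from the payload above.
p = ET.fromstring("<p>con <strong>Elsa</strong>, storica <em>fil</em> rouge</p>")
strong, em = p  # the two child elements, in document order

# Text before the first child lives on the parent; text after a child
# lives on that child's .tail.
pieces = [p.text, strong.text, strong.tail, em.text, em.tail]

# Joining text and tail values (or using itertext()) gives the words only.
flat = "".join(p.itertext())

# Serializing the element keeps the tags as well.
markup = ET.tostring(p, encoding="unicode")
```

Here `flat` carries no `<strong>` or `<em>`, while `markup` does, which is probably why the text/tail approach appears to lose the formatting.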

Please help.

Thanks

cp


1 answer
User
#1 · Posted 2024-05-13 04:23:02

This is a very ugly solution, but it works! As homework, make it better!

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:metadata="http://xmlns.escenic.com/2010/atom-metadata">
 <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>

          <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I saluti dal Sud partono con <strong>Elsa Albonico</strong>, storica  "golosit&#xE0;", con i pi&#xF9; piccoli "fare le conte".</p>
            <p>I saluti dal Nord la <a href="http://www.proticino.ch/sezioni-in-svizzera/basilea/">Pro Ticino di Basilea</a> con un particolarit&#xE0; frammenti&#xA0;&#xA0; </p>
            <p><a href="https://www.rts.ch/">RTS</a> "Kiosque &#xE0; Musiques" con <strong>Jean-Marc Richard</strong>. <br/>A fare da<em> fil&#xA0;rouge</em> al nostro </p>
            <p>
              <a href="http://internal.publishing.production.rsi.ch/webservice/escenic/content/8762014" id="_360b1131-e6a5-49b6-995e-a624c888617a">Le foto del gioco, Finestra popolare 26.02.2017</a>
            </p>
          </div>

        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
 </entry>'''

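root = ET.fromstring(data)
# Element.getchildren() was removed in Python 3.9; look the <div> up by its
# namespace-qualified tag instead of walking five levels by index.
div = root.find('.//{http://www.w3.org/1999/xhtml}div')

# itertext() yields only the text nodes, so the markup itself is not written.
with open('./result.html', 'w', encoding='utf-8') as html:
    html.writelines(div.itertext())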
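If the tags themselves have to survive, one possible variation is to serialize the `<div>` element instead of collecting its text. The sketch below is not the answer's method. It assumes the XHTML namespace can simply be dropped from the tag names, and it uses a shortened stand-in for `data` with the same nesting; `body_as_html`, `sample`, and `XHTML_NS` are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened stand-in for the `data` string (same nesting, one paragraph).
sample = '''<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload xmlns:vdf="http://www.vizrt.com/types">
      <vdf:field name="body">
        <vdf:value>
          <div xmlns="http://www.w3.org/1999/xhtml"><p>con <strong>Elsa Albonico</strong>, "golosit&#xE0;" <br/>fil&#xA0;rouge</p></div>
        </vdf:value>
      </vdf:field>
    </vdf:payload>
  </content>
</entry>'''


def body_as_html(xml_text, encoding="unicode"):
    """Serialize the payload's XHTML <div> as HTML (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    div = root.find(".//{%s}div" % XHTML_NS)
    if div is None:
        raise ValueError("no XHTML <div> found")
    # Drop the namespace from every tag so the output has plain <div>, <p>, ...
    for el in div.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    div.tail = None  # do not carry over the whitespace that follows </div>
    # method="html" writes <br> rather than <br />.
    return ET.tostring(div, encoding=encoding, method="html")


html_text = body_as_html(sample)
```

With the default `encoding="unicode"` the result is a `str` that can be written to `result.html` opened with `encoding="utf-8"`. Passing `encoding="us-ascii"` instead returns `bytes` in which non-ASCII characters are written as numeric character references, which may be closer to the entity style of the source.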
