How do I write an XML file that contains HTML markup using Python?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
  ...
  <Row>
    <Entry_No>657</Entry_No>
    <Waterfall_Name>Detian Waterfall (德天瀑布 [Détiān Pùbù])</Waterfall_Name>
    <File_directory>./waterfall_writeups/657_Detian_Waterfall/</File_directory>
    <Introduction>introduction-detian-waterfall.html</Introduction>
  </Row>
  ...
</Root>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
  ...
  <Row>
    <Entry_No>657</Entry_No>
    <Waterfall_Name>Detian Waterfall (德天瀑布 [Détiān Pùbù])</Waterfall_Name>
    <File_directory>./waterfall_writeups/657_Detian_Waterfall/</File_directory>
    <Introduction>introduction-detian-waterfall.html</Introduction>
    <Introduction_Body><![CDATA[Stuff parsed in from file './waterfall_writeups/657_Detian_Waterfall/introduction-detian-waterfall.html' as is, which includes html tags like <a href="http://blah.com/blah.html"></a>, <br>, <img src="http://blahimg.jpg">, etc. It should also preserve carriage returns and characters like 德天瀑布 [Détiān Pùbù]...]]>
    </Introduction_Body>
  </Row>
  ...
</Root>

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import os

data_file = 'test3_of_2016-09-19.xml'
tree = ET.ElementTree(file=data_file)
root = tree.getroot()

for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text
    if element.find('Introduction') is not None:
        introduction = element.find('Introduction').text
        intro_tree = directory + introduction
        with open(intro_tree, 'r') as f:  # note this with statement eliminates need for f.close()
            intro_text = f.read()
        intro_body = ET.SubElement(element, 'Introduction_Body')
        # ElementTree treats this as plain text and escapes the markup on output
        intro_body.text = '<![CDATA[' + intro_text + ']]>'

# tree.write('new_' + data_file)  # same result but leaves out the xml header
f = open('new_' + data_file, 'w')
f.write('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' + ET.tostring(root))
f.close()
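To see why the script above does not produce a real CDATA section, it may help to look at how the standard library serializes the text. The following is a minimal sketch for Python 3, using an invented `Row` element and a made-up HTML string rather than the real data file:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one <Row>; the HTML string is invented for illustration.
row = ET.Element('Row')
body = ET.SubElement(row, 'Introduction_Body')
body.text = '<![CDATA[' + '<br>' + ']]>'

# ElementTree treats .text as character data, so '<' and '>' are escaped
# and no literal CDATA section appears in the serialized output.
serialized = ET.tostring(row, encoding='unicode')
print(serialized)
```

The printed string contains `&lt;![CDATA[` instead of `<![CDATA[`, which is the behaviour the question runs into.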

1 Answer

User

#1 · Posted 2024-05-16 07:43:27

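I suggest you switch to lxml. It is well documented and (almost) fully compatible with Python's own xml library, so you will probably need only minimal changes to your code. lxml supports CDATA very conveniently: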

>>> from lxml import etree
>>> elmnt = etree.Element('root')
>>> elmnt.text = etree.CDATA('abcd')
>>> etree.dump(elmnt)
<root><![CDATA[abcd]]></root>
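Applied to the question's data, the same call could wrap the contents of an HTML file. This is only a sketch under assumptions: it requires lxml to be installed, and the temporary file `intro.html` and its contents are invented stand-ins for `introduction-detian-waterfall.html`.

```python
import os
import tempfile
from lxml import etree

# Throwaway HTML file standing in for the real write-up (hypothetical content).
tmp_dir = tempfile.mkdtemp()
html_path = os.path.join(tmp_dir, 'intro.html')
html = '<a href="http://blah.com/blah.html">link</a><br>\n德天瀑布'
with open(html_path, 'w', encoding='utf-8') as f:
    f.write(html)

# Build a <Row> like the one in the question and attach the file contents as CDATA.
row = etree.Element('Row')
etree.SubElement(row, 'Introduction').text = 'intro.html'

with open(html_path, 'r', encoding='utf-8') as f:
    intro_text = f.read()

intro_body = etree.SubElement(row, 'Introduction_Body')
intro_body.text = etree.CDATA(intro_text)  # kept verbatim, not escaped

result = etree.tostring(row, encoding='unicode')
print(result)
```

One caveat: a CDATA section cannot contain the sequence `]]>`, so HTML that includes it would need to be split or handled separately.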

Beyond that, you should definitely use a library not only for parsing XML but also for writing it! lxml will write the XML declaration for you.
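As a hedged illustration of letting lxml emit the declaration, the sketch below serializes a small invented tree with `etree.tostring`. The element values and the output file name `new_example.xml` are assumptions made for the example, and lxml must be installed.

```python
import os
import tempfile
from lxml import etree

# Small invented tree mirroring the structure in the question.
root = etree.Element('Root')
row = etree.SubElement(root, 'Row')
etree.SubElement(row, 'Entry_No').text = '657'
etree.SubElement(row, 'Introduction_Body').text = etree.CDATA('<br>')

# xml_declaration=True asks lxml to write the <?xml ...?> header itself;
# standalone=True adds the standalone attribute to that header.
xml_bytes = etree.tostring(
    root,
    xml_declaration=True,
    encoding='UTF-8',
    standalone=True,
)

# tostring() returns bytes when an encoding is given, so write in binary mode.
out_path = os.path.join(tempfile.mkdtemp(), 'new_example.xml')  # hypothetical name
with open(out_path, 'wb') as f:
    f.write(xml_bytes)

print(xml_bytes.decode('utf-8'))
```

Writing the bytes returned by `tostring` avoids concatenating a hand-typed header with the serialized tree, so the declared encoding always matches the actual one.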
