Extract each XML node into a separate text file

Asked 2025-04-19 14:30

I have an XML file like this:

<root>
    <article>
        <article_taxonomy></article_taxonomy>
        <article_place>Somewhere</article_place>
        <article_number>1</article_number>
        <article_date>2001</article_date>
        <article_body>Blah blah balh</article_body>
    </article>

    <article>
        <article_taxonomy></article_taxonomy>
        <article_place>Somewhere</article_place>
        <article_number>2</article_number>
        <article_date>2001</article_date>
        <article_body>Blah blah balh</article_body>
    </article>

    ...
    ...
    more nodes

</root>

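What I want to do is extract each node (everything from the <article> tag through the matching </article> tag) and write it to its own .txt or .xml file. I also want to keep the tags.

Is there a way to do this without using regular expressions? Any suggestions?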

Tags: data processing, text file generation, tag preservation, XML node extraction

2 Answers

Try something like this:

from xml.dom import minidom
xmlfile = minidom.parse('yourfile.xml')
# for example, all <article_body> elements
article_body = xmlfile.getElementsByTagName('article_body')
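The minidom snippet above only collects the matching elements. To do what the question asks, each <article> node can be serialized with minidom's `toxml()`, which includes the element's own opening and closing tags, and written to a file of its own. A minimal sketch, assuming UTF-8 output; the function name `split_articles_minidom` and the `article_{}.xml` naming scheme are hypothetical:

```python
from xml.dom import minidom


def split_articles_minidom(in_path, out_pattern="article_{}.xml"):
    """Write every <article> element, tags included, to its own file.

    out_pattern is a hypothetical naming scheme; {} is replaced by a
    1-based sequence number. Returns the list of paths written.
    """
    doc = minidom.parse(in_path)
    written = []
    for i, node in enumerate(doc.getElementsByTagName("article"), start=1):
        path = out_pattern.format(i)
        # toxml() serializes the element itself, so <article>...</article>
        # is kept around the child nodes.
        with open(path, "w", encoding="utf-8") as out:
            out.write(node.toxml())
        written.append(path)
    return written
```

Called as split_articles_minidom('yourfile.xml'), it would write article_1.xml, article_2.xml, and so on into the current directory. minidom writes empty elements in the equivalent self-closing form, so <article_taxonomy></article_taxonomy> comes out as <article_taxonomy/>.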

Or:

import xml.etree.ElementTree as ET
xmlfile = ET.parse('yourfile.xml')
root_tag = xmlfile.getroot()
for each_article in root_tag.findall('article'):
    article_taxonomy = each_article.find('article_taxonomy').text
    article_place = each_article.find('article_place').text
    # ... and so on for the remaining fields
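The per-field approach above has one pitfall with the sample data: <article_taxonomy></article_taxonomy> contains no text, so `.find('article_taxonomy').text` is `None` rather than an empty string (`each_article.findtext('article_taxonomy')` returns `''` instead). If plain .txt output is acceptable in place of XML, a minimal sketch could write each article's fields as "tag: text" lines. The "tag: text" layout, the `article_{}.txt` names, and the function name `articles_to_txt` are assumptions for illustration, not something the question specifies:

```python
import xml.etree.ElementTree as ET


def articles_to_txt(in_path, out_pattern="article_{}.txt"):
    """Write each <article>'s child fields as 'tag: text' lines.

    out_pattern is a hypothetical naming scheme; {} becomes 1, 2, 3, ...
    Returns the list of paths written.
    """
    root = ET.parse(in_path).getroot()
    written = []
    for i, article in enumerate(root.findall("article"), start=1):
        lines = []
        for child in article:
            # child.text is None for empty elements such as
            # <article_taxonomy></article_taxonomy>, so fall back to "".
            text = (child.text or "").strip()
            lines.append("{}: {}".format(child.tag, text))
        path = out_pattern.format(i)
        with open(path, "w", encoding="utf-8") as out:
            out.write("\n".join(lines) + "\n")
        written.append(path)
    return written
```

This keeps the tag names but not the XML markup itself; if the files must contain the literal <article> tags, serializing the whole element is the better fit.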

Answered 2025-04-19 by Python大师

Here is one way to do this with ElementTree:

import xml.etree.ElementTree as ElementTree

def main():
    with open('data.xml') as f:
        et = ElementTree.parse(f)
        for article in et.findall('article'):
            xml_string = ElementTree.tostring(article)
            # Now you can write xml_string to a new file
            # Take care to name the files sequentially

if __name__ == '__main__':
    main()
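The snippet above stops where the files would be written. A minimal sketch of that remaining step, assuming sequential names such as article_1.xml (a hypothetical scheme) and UTF-8 output; the function name `split_articles_et` is also hypothetical. One detail worth knowing: `ElementTree.tostring()` also serializes an element's tail, i.e. the whitespace between </article> and the next <article>, so the sketch clears it first:

```python
import xml.etree.ElementTree as ElementTree


def split_articles_et(in_path, out_pattern="article_{}.xml"):
    """Write each <article> element, tags included, to a numbered file.

    out_pattern is a hypothetical naming scheme; {} becomes 1, 2, 3, ...
    Returns the list of paths written.
    """
    tree = ElementTree.parse(in_path)
    written = []
    for i, article in enumerate(tree.findall("article"), start=1):
        # tostring() would otherwise include the trailing whitespace
        # (the element's tail) that follows </article> in the source.
        article.tail = None
        # encoding="unicode" returns a str rather than bytes.
        xml_string = ElementTree.tostring(article, encoding="unicode")
        path = out_pattern.format(i)
        with open(path, "w", encoding="utf-8") as out:
            out.write(xml_string)
        written.append(path)
    return written
```

For the sample file, split_articles_et('data.xml') would write article_1.xml, article_2.xml, and so on. ElementTree writes empty elements in the equivalent self-closing form, so <article_taxonomy></article_taxonomy> comes out as <article_taxonomy />.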

Answered 2025-04-19 by Python大师