Sorting XML in Python etree

12 votes
2 answers
20135 views
Asked 2025-04-18 17:29

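I know this question has been asked before, but I am struggling to apply it to my own example and would really appreciate some help.

What I am trying to achieve looks simple:

I have two files. The first is similar to the one below. The second is almost the same, except that it only has LAYER and TEST NAME, i.e. no MASTER.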

<MASTER>
<LAYER NAME="LAYER B">
    <TEST NAME="Soup1">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>
<LAYER NAME="LAYER A">
    <TEST NAME="Soup2">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>
</MASTER>

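All I want to do is sort these files by name, while respecting the order within each layer.

In the example above, LAYER A should come before LAYER B, and within each layer the tests should be ordered by NAME, so Bread should come before Soup.

In my second example, I do not have the sub-layers.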

<LAYER>
    <TEST NAME="Soup1">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>

Here I want to sort by TEST NAME.

Thanks in advance for your help!

2 Answers

2

If you want to sort recursively, handling comments and sorting on all attributes:

#!/usr/bin/env python
# encoding: utf-8

import logging
from lxml import etree


def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        # no attribute given: key on the tag plus every attribute value,
        # taken in attribute-name order
        return '%s:%s' % (node.tag, ':'.join([node.get(name)
                                              for name in sorted(node.attrib)]))
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag


def sort_children(node, attr=None):
    """Sort children along tag and given attribute.
    If attr is None, sort along all attributes."""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: get_node_key(child, attr))
    # and recurse
    for child in node:
        sort_children(child, attr)


def sort(unsorted_file, sorted_file, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    tree = etree.parse(unsorted_file)
    root = tree.getroot()
    sort_children(root, attr)

    sorted_unicode = etree.tostring(root,
                                    pretty_print=True,
                                    encoding='unicode')
    with open(sorted_file, 'w') as output_fp:
        output_fp.write('%s' % sorted_unicode)
        # log the path that was written, not the whole document
        logging.info('written sorted file %s', sorted_file)

Note: I am using lxml.etree (http://lxml.de/tutorial.html).
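If lxml is not available, the same recursive idea can probably be expressed with the standard library. The following is only a minimal sketch under a few assumptions: it uses xml.etree.ElementTree instead of lxml, the names node_key and sort_recursively and the SAMPLE document are made up for illustration, and it relies on the default ElementTree parser dropping comments, so no comment check is included.

```python
import xml.etree.ElementTree as ET


def node_key(node, attr):
    # Sort on the tag first, then on the chosen attribute ('' when it is missing).
    return (node.tag, node.get(attr, ''))


def sort_recursively(node, attr):
    # Reorder the direct children in place, then descend into each of them.
    node[:] = sorted(node, key=lambda child: node_key(child, attr))
    for child in node:
        sort_recursively(child, attr)


# Hypothetical miniature of the question's first file.
SAMPLE = """
<MASTER>
  <LAYER NAME="LAYER B"><TEST NAME="Soup1"/><TEST NAME="Bread2"/></LAYER>
  <LAYER NAME="LAYER A"><TEST NAME="Soup2"/><TEST NAME="Bread2"/></LAYER>
</MASTER>
"""

root = ET.fromstring(SAMPLE)
sort_recursively(root, 'NAME')

layer_names = [layer.get('NAME') for layer in root]
test_names = [[test.get('NAME') for test in layer] for layer in root]
print(layer_names)  # ['LAYER A', 'LAYER B']
print(test_names)   # [['Bread2', 'Soup2'], ['Bread2', 'Soup1']]
```

One thing to keep in mind with any fully recursive sort keyed on the tag: the children of each TEST (TITLE, SCRIPTFILE, ASSET_FILE, ARGS, TIMEOUT) are reordered by tag name as well, which may or may not be what you want.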

27

Using the ElementTree library, you can do it like this:

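import xml.etree.ElementTree as ET

def sortchildrenby(parent, attr):
    # default to '' so children without the attribute do not yield None keys,
    # which cannot be compared in Python 3
    parent[:] = sorted(parent, key=lambda child: child.get(attr, ''))

tree = ET.parse('input.xml')
root = tree.getroot()

# sort the top-level children, then the children of each of them
sortchildrenby(root, 'NAME')
for child in root:
    sortchildrenby(child, 'NAME')

tree.write('output.xml')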
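For the second file, where the root is a bare LAYER, a single call on the root should be enough, because the TEST elements are its direct children. Here is a minimal sketch of that case, run on an in-memory string rather than a file. The helper sort_children_by is a hypothetical variant that falls back to '' when the attribute is missing, and SAMPLE is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET


def sort_children_by(parent, attr):
    # Children lacking the attribute get '' so None is never compared with str.
    parent[:] = sorted(parent, key=lambda child: child.get(attr, ''))


# Hypothetical miniature of the second file: LAYER is the root element.
SAMPLE = """
<LAYER>
  <TEST NAME="Soup1"><TITLE>Title 2</TITLE></TEST>
  <TEST NAME="Bread2"><TITLE>Title 1</TITLE></TEST>
</LAYER>
"""

root = ET.fromstring(SAMPLE)
sort_children_by(root, 'NAME')

sorted_names = [test.get('NAME') for test in root]
titles = [test.findtext('TITLE') for test in root]
print(sorted_names)  # ['Bread2', 'Soup1']
print(titles)        # ['Title 1', 'Title 2']
```

Each TEST moves as a whole, so its TITLE, ARGS and other children travel with it. The whitespace that follows an element moves with it too, so the written file may look unevenly indented; on Python 3.9 or later, calling ET.indent(tree) before writing is one way to tidy that up.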
