Parsing multiple child elements in Python

1 answer

User

#1 · Posted 2024-04-26 00:12:45

The main approaches to parsing XML data are:

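DOM parsers.
They load the complete XML file into memory and build a DOM (Document Object Model). This lets the programmer use many convenient techniques to navigate the document or retrieve data from it (for example XPath, XSLT transformations, XML-schema-to-class transformation). The drawback is that it can need a lot of memory and can be slow (depending on the parser, the DOM model, the indexes in the DOM, ...).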
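As a rough illustration of DOM-style navigation, here is a minimal sketch using the limited XPath subset of the standard library's `xml.etree.ElementTree`. The sample document and its layout (a `sotransitems` root holding `sotransitem` elements) are assumptions made for this example, not the original poster's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element layout is an assumption for illustration.
SAMPLE_XML = """<sotransitems>
  <sotransitem>
    <recordno>1</recordno>
    <unit>Each</unit>
  </sotransitem>
  <sotransitem>
    <recordno>2</recordno>
    <unit>Box</unit>
  </sotransitem>
</sotransitems>"""

# fromstring() builds the whole tree in memory, which is the DOM-style trade-off.
root = ET.fromstring(SAMPLE_XML)

# Path query: every <recordno> that is a child of a <sotransitem> under the root.
record_numbers = [int(node.text) for node in root.findall("./sotransitem/recordno")]

# Predicate query: only the items whose <unit> child has the text "Box".
box_items = root.findall("./sotransitem[unit='Box']")
box_record_numbers = [int(item.findtext("recordno")) for item in box_items]

print(record_numbers)
print(box_record_numbers)
```

ElementTree only covers a small part of XPath; full XPath and XSLT need a library such as lxml.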

In the examples, I removed some fields from sotransitem and customfields for simplicity.

Example:

Class definition:

class Sotransitem:

    def __init__(self):
        # Give every instance its own attributes
        # (the original __init__ only read recordno and unit without assigning them).
        self.recordno = None
        self.unit = None
        self.customfields = {}

    def __repr__(self):
        return "Item( rec_no: {rec}, fields: {fields} )".format(rec=self.recordno,
                                                                 fields=str(self.customfields))

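Here I will use the standard Python library, but you should also look at other libraries. As far as I know, the most popular ones are lxml and BeautifulSoup.

The actual parser: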

^{pr2}$

It covers most of my needs, but using an XML schema would be simpler; see the "asserting a schema" example in the lxml documentation.
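The parser code for this step did not survive on this page. As a stand-in, here is a minimal sketch of what a DOM-style parse with `xml.etree.ElementTree` could look like. It is not the original author's code: the `parse_items` helper, the sample document, and the nesting of `customfield` inside a `customfields` element are all assumptions.

```python
import xml.etree.ElementTree as ET


class Sotransitem:
    """Compact copy of the class above so this sketch runs on its own."""

    def __init__(self):
        self.recordno = None
        self.unit = None
        self.customfields = {}

    def __repr__(self):
        return "Item( rec_no: {rec}, fields: {fields} )".format(
            rec=self.recordno, fields=self.customfields)


# Hypothetical input; the nesting of <customfields>/<customfield> is assumed.
SAMPLE_XML = """<sotransitems>
  <sotransitem>
    <recordno>1</recordno>
    <unit>Each</unit>
    <customfields>
      <customfield>
        <customfieldname>COLOR</customfieldname>
        <customfieldvalue>red</customfieldvalue>
      </customfield>
    </customfields>
  </sotransitem>
</sotransitems>"""


def parse_items(xml_text):
    """Build the full tree in memory, then convert each <sotransitem>."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("sotransitem"):
        item = Sotransitem()
        item.recordno = int(node.findtext("recordno"))
        item.unit = node.findtext("unit")
        # Collect every custom field of this item as a name -> value pair.
        for field in node.iter("customfield"):
            name = field.findtext("customfieldname")
            item.customfields[name] = field.findtext("customfieldvalue")
        items.append(item)
    return items


all_items = parse_items(SAMPLE_XML)
print(all_items)
```

For a file on disk, `ET.parse('test.xml').getroot()` would take the place of `ET.fromstring(...)`.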

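SAX parsers. They read the XML in small chunks; whenever a tag is found (a start or an end tag), the parser fires an event carrying the tag and, for a closing tag, its data. A SAX parser usually discards almost all information once it has been reported (it does keep a few things, such as the list of all elements that are not closed yet).
Pros: a SAX parser needs a roughly constant amount of RAM, far less than DOM.
Cons: most XML technologies cannot be used.

Example: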

import xml.etree.ElementTree as ET

all_items = []

# Ask for both "start" and "end" events; the first event is the start of the root element.
nodes_parser = ET.iterparse('test.xml', events=("start", "end"))
event, root = next(nodes_parser)

item = None

for event, node in nodes_parser:
    if event == "start":
        if node.tag == "sotransitem":
            item = Sotransitem()

    elif item is not None:
        # "end" event inside a sotransitem: the node's text and children are complete
        tag = node.tag
        if tag == "recordno":
            item.recordno = int(node.text)
        elif tag == "unit":
            item.unit = node.text
        elif tag == "customfield":
            value = node.find('customfieldvalue').text
            name = node.find('customfieldname').text
            item.customfields[name] = value
        elif tag == "sotransitem":
            all_items.append(item)
            item = None
            node.clear()  # otherwise the finished subtree is kept in memory
            root.clear()  # same as above, but for the references held by the root

print(all_items)
# same result as before
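The example above uses ElementTree's `iterparse`, which is event-driven but still builds element objects. For comparison, here is a minimal sketch with the standard library's `xml.sax` module, where only handler callbacks are invoked and no tree is built. The `ItemHandler` class and the sample document are hypothetical, and the sketch assumes that `customfieldname` appears before `customfieldvalue` inside each `customfield`.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collects <sotransitem> records as plain dicts while streaming."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._item = None        # record currently being built
        self._text = []          # character data of the current element
        self._field_name = None  # last <customfieldname> seen

    def startElement(self, name, attrs):
        self._text = []
        if name == "sotransitem":
            self._item = {"recordno": None, "unit": None, "customfields": {}}

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        self._text.append(content)

    def endElement(self, name):
        if self._item is None:
            return
        text = "".join(self._text).strip()
        if name == "recordno":
            self._item["recordno"] = int(text)
        elif name == "unit":
            self._item["unit"] = text
        elif name == "customfieldname":
            self._field_name = text
        elif name == "customfieldvalue":
            # Assumes the name element was seen first (see lead-in).
            self._item["customfields"][self._field_name] = text
        elif name == "sotransitem":
            self.items.append(self._item)
            self._item = None
        self._text = []


# Hypothetical input, passed as bytes.
SAMPLE_XML = b"""<sotransitems>
  <sotransitem>
    <recordno>1</recordno>
    <unit>Each</unit>
    <customfields>
      <customfield>
        <customfieldname>COLOR</customfieldname>
        <customfieldvalue>red</customfieldvalue>
      </customfield>
    </customfields>
  </sotransitem>
</sotransitems>"""

handler = ItemHandler()
xml.sax.parseString(SAMPLE_XML, handler)
print(handler.items)
```

For a file on disk, `xml.sax.parse('test.xml', handler)` streams the input instead of taking it from memory.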

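Which approach to choose depends on how much data is stored in the XML file.

If it is just a simple script (written once, quick and dirty) that retrieves some data from a small file, just use DOM.

If it is a configuration file or small messages exchanged between servers (a few megabytes long), DOM with automatic XML-to-class conversion is probably the best choice.

If your data is too large to hold in the server's memory (for example the OpenStreetMap world.xml), or too many messages are parsed at the same time, you should choose SAX.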
