Python lxml: normalizing all child elements into a dictionary

Posted 2024-06-16 09:37:13

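I am trying to parse a security OVAL XML definition file so that I can automate tests.

What I want to achieve is, for each definition, to convert its test criteria and criterion elements into a dictionary.

The criteria XML structure looks like this: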

 <criteria operator="AND">
        <criteria comment="Affected IOSXE configuration" operator="AND">
          <criterion comment="ASR 1000 series router" test_ref="oval:org.cisecurity:tst:5943" />
          <criteria comment="Affected IOSXE configuration" operator="OR">
            <criteria comment="Zone-based firewall configured" operator="AND">
              <criterion comment="Match TCP or UDP" test_ref="oval:org.cisecurity:tst:6071" />
              <criterion comment="ZBFW inspection enabled" test_ref="oval:org.cisecurity:tst:5850" />
            </criteria>
            <criteria comment="NAT and PPTP ALG are enabled" operator="AND">
              <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
              <criterion comment="NAT enabled" test_ref="oval:org.cisecurity:tst:6146" />
              <criterion comment="PPTP ALG disabled" negate="true" test_ref="oval:org.cisecurity:tst:5668" />
            </criteria>
            <criteria comment="NAT and TCP reassembly are enabled" operator="AND">
              <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
              <criterion comment="NAT enabled" test_ref="oval:org.cisecurity:tst:6146" />
              <criterion comment="Affected processor" test_ref="oval:org.cisecurity:tst:5622" />
            </criteria>
            <criterion comment="EoGRE is enabled" test_ref="oval:org.cisecurity:tst:6003" />
          </criteria>
        </criteria>
        <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
      </criteria>

I can retrieve and map the first-level criteria with the following code:

# Add OVAL ID attrib in normalized Vulnerability dictionary
for idx, vuln in enumerate(vuln_list):
    vuln['oval_id'] = root.xpath("//ns:definition", namespaces=ns)[idx].attrib['id']

    criteria = root.xpath("//ns:definition[@id='" + vuln_list[idx]['oval_id'] + "']/ns:criteria/*", namespaces=ns)

    vuln['criteria'] = [crit.items() for crit in criteria]

This populates my dictionary with the result below, which is obviously missing the nested child elements:

{'cisco_adv_id': 'cisco-sa-20131030-asr1000',
  'cisco_adv_url': 'http://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20131030-asr1000',
  'criteria': [[('comment', 'Affected IOSXE configuration'),
                ('operator', 'AND')],
               [('comment', 'IOSXE version is affected'),
                ('test_ref', 'oval:org.cisecurity:tst:6178')]],
  'cve_id': 'CVE-2013-5547',
  'oval_id': 'oval:org.cisecurity:def:4321',
  'title': 'Cisco IOS XE Software Malformed EoGRE Packet Denial of Service '
           'Vulnerability'},

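I could write nested for loops and use getchildren() to check whether each element has children, but that does not sound like the best solution, because every definition has one or more nested criteria/criterion elements.

Any ideas on how to parse this more efficiently?

Thanks in advance.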


1 answer
User
Reply #1 · posted 2024-06-16 09:37:13

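This is relatively easy if you use recursion.

For the first example, I tried to keep the same organization as yours: each criteria is a list containing its attributes and its children, but everything is stored as dicts rather than tuples.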

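def get_data(el):
    if el.tag == 'criteria':
        # First list item: this element's attributes, copied into a plain dict
        data = {'criteria': [dict(el.attrib)]}
        # Remaining items: one entry per child, built recursively
        for desc in el.iterchildren():
            data['criteria'].append(get_data(desc))
        return data
    else:
        # A criterion is a leaf: only its attributes are kept
        return {'criterion': dict(el.attrib)}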
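
To make the shape of that first structure concrete, here is a minimal sketch run on a trimmed-down fragment of the XML from the question. The `SAMPLE` string and the `result` name are my own, for illustration only. It assumes the elements carry no namespace, and it falls back to the standard-library `xml.etree.ElementTree` when lxml is not installed, since only `.tag`, `.attrib` and child iteration are used.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml; the calls used below exist in both
    import xml.etree.ElementTree as etree

# Hypothetical sample: a shortened version of the structure in the question
SAMPLE = """
<criteria operator="AND">
  <criteria comment="Zone-based firewall configured" operator="AND">
    <criterion comment="Match TCP or UDP" test_ref="oval:org.cisecurity:tst:6071" />
  </criteria>
  <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
</criteria>
"""

def get_data(el):
    if el.tag == 'criteria':
        # Index 0 holds the attributes, the following items hold the children
        data = {'criteria': [dict(el.attrib)]}
        for child in el:
            data['criteria'].append(get_data(child))
        return data
    return {'criterion': dict(el.attrib)}

result = get_data(etree.fromstring(SAMPLE))

# The attribute dict and the children share one list, so position matters
print(result['criteria'][0])   # attributes of the outer criteria
print(result['criteria'][1:])  # its children, in document order
```

The first item of every `'criteria'` list is the attribute dict and everything after it is a child, which is what makes this layout awkward to consume.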

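The problem is that the returned data is not easy to use: each criteria list can contain up to three kinds of dict (attributes, criteria, or criterion), and you have to test each item to find out which is which. In the second example you know in advance what each list contains: if the key is criteria, you know you will get a list of criteria dicts.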

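def get_data(el):
    if el.tag == 'criteria':
        data = {}
        # Attributes become top-level keys of the dict
        data.update(el.attrib)
        # Children are grouped into lists keyed by their tag name
        for desc in el.iterchildren():
            key = desc.tag
            if key not in data:
                data[key] = []
            data[key].append(get_data(desc))
        return data
    else:
        # A criterion is a leaf: return its attributes as a plain dict
        return dict(el.attrib)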
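
One thing to watch for: the code in the question passes a namespace map to xpath(), so in the real file `el.tag` is probably of the form `{namespace-uri}criteria` and a plain comparison with `'criteria'` would never match. Below is a hedged sketch of the second approach that strips the namespace first. The `local_name` helper, the `SAMPLE` string and the `OVAL_NS` value are my own assumptions (the URI is the one normally used for OVAL 5 definitions, so check it against your file). It falls back to the standard library when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch runs without lxml; the calls used below exist in both
    import xml.etree.ElementTree as etree

# Assumed namespace URI; replace it with whatever your `ns` mapping contains
OVAL_NS = "http://oval.mitre.org/XMLSchema/oval-definitions-5"

# Hypothetical sample: a shortened, namespaced version of the question's XML
SAMPLE = f"""
<criteria xmlns="{OVAL_NS}" operator="AND">
  <criteria comment="NAT and PPTP ALG are enabled" operator="AND">
    <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
    <criterion comment="PPTP ALG disabled" negate="true" test_ref="oval:org.cisecurity:tst:5668" />
  </criteria>
  <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
</criteria>
"""

def local_name(el):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return el.tag.rsplit('}', 1)[-1]

def get_data(el):
    # Attributes become top-level keys
    data = dict(el.attrib)
    if local_name(el) == 'criteria':
        for child in el:
            if not isinstance(child.tag, str):
                continue  # lxml also yields comments and processing instructions
            # Children are grouped into lists keyed by their local tag name
            data.setdefault(local_name(child), []).append(get_data(child))
    return data

tree = get_data(etree.fromstring(SAMPLE))

# Each key now tells you what its list contains
for group in tree['criteria']:
    print(group['comment'], [c['test_ref'] for c in group['criterion']])
```

To plug this into the loop from the question, something like `vuln['criteria'] = get_data(root.xpath("//ns:definition[@id='" + vuln['oval_id'] + "']/ns:criteria", namespaces=ns)[0])` should work, assuming every definition has exactly one top-level criteria element.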
