Python:使用lxml + objectify + findall或fromstring获取特定节点值和属性

1 投票
1 回答
1810 浏览
提问于 2025-04-18 11:43

我从NVD网站上提取并剪切了一部分XML源代码,下面是这段代码:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nvd.nist.gov/feeds/cve/1.2 http://nvd.nist.gov/schema/nvdcve.xsd" pub_date="2014-07-01" nvd_xml_version="1.2">
   <entry CVSS_base_score="6.4" CVSS_exploit_subscore="10.0" CVSS_impact_subscore="4.9" CVSS_score="6.4" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:P/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2011-1381" published="2014-06-27" seq="2011-1381" severity="Medium" type="CVE">
      <desc>
        <descript source="cve">Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.</descript>
      </desc>
   </entry>
   <entry CVSS_base_score="3.5" CVSS_exploit_subscore="6.8" CVSS_impact_subscore="2.9" CVSS_score="3.5" CVSS_vector="(AV:N/AC:M/Au:S/C:P/I:N/A:N)" CVSS_version="2.0" modified="2014-06-30" name="CVE-2014-4669" published="2014-06-28" seq="2014-4669" severity="Low" type="CVE">
      <desc>
        <descript source="cve">HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.</descript>
      </desc>
   </entry>
</nvd>

正如这个问题的标题所提到的,以及上面的代码片段,我只是想获取'descript'节点的值和属性。我尝试使用findall方法,但它返回的是一个空列表:

root = etree.fromstring(open("c:/temp/CVE/sample.xml").read()).getroottree().getroot()
root.findall('entry')

这返回了:

[]

当我打印根节点的标签时,返回的是:

'{http://nvd.nist.gov/feeds/cve/1.2}nvd'

我还尝试打印直接父节点及其子节点的标签:

for e in root.iterchildren():
print "Immediate parent : %s" % e.tag
children = e.getchildren()
for c in children : print "\t\tchildren : %s" % c.tag

这是它返回的结果:

Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc
Immediate parent : {http://nvd.nist.gov/feeds/cve/1.2}entry
    children : {http://nvd.nist.gov/feeds/cve/1.2}desc

再次强调,我想要的只是获取'descript'节点的属性和值。任何建议都非常感谢。提前谢谢你们!

1 个回答

2

你需要在xpath表达式中添加命名空间前缀:

tree = etree.fromstring(open("c:/temp/CVE/sample.xml").read()).getroottree().getroot()
for descript in tree.xpath('//ns:entry/ns:desc/ns:descript', namespaces={'ns': 'http://nvd.nist.gov/feeds/cve/1.2'}):
    print descript.text
    print descript.attrib.get('source')

输出结果是:

Unspecified vulnerability in IBM OpenPages GRC Platform 6.1.0.1 before IF4 allows remote attackers to bypass intended access restrictions via unknown vectors.
cve
HP Enterprise Maps 1.00 allows remote authenticated users to read arbitrary files via a WSDL document containing an XML external entity declaration in conjunction with an entity reference within a GetQuote operation, related to an XML External Entity (XXE) issue.
cve

另外,看看这个相关的讨论:

撰写回答