Getting values from an XML file in Python

Posted 2024-03-29 14:53:26


Let's consider a sample XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<feats>
  <feat>
    <name>Blindsight, 5-Ft. Radius</name>
    <type>General</type>
    <multiple>No</multiple>
    <stack>No</stack>
    <prerequisite>Base attack bonus +4, Blind-Fight, Wisdom 19.</prerequisite>
    <benefit><div topic="Benefit" level="8"><p><b>Benefit:</b> Using senses such as acute hearing and sensitivity to vibrations, you detect the location of opponents who are no more than 5 feet away from you. <i>Invisibility</i> and <i>darkness</i> are irrelevant, though it you discern incorporeal beings.</p><p/>
</div>
</benefit>
    <full_text>
      <div topic="Blindsight, 5-Ft. Radius" level="3">Lorem ipsum
</div>
    </full_text>
    <reference>SRD 3.5 DivineAbilitiesandFeats</reference>
  </feat>
</feats>

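I want to get the text inside the <benefit> tag as a string, but without the <div> tag (the <p> tag and the other inner tags should not be removed). So in this case the result would be: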

[expected output missing from the source]

I managed to get the whole <div> element, but when I use its .text attribute to get the string, it gives me None.

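import xml.etree.ElementTree as ET

filename = "data.xml"  # assumed path to the XML file shown above
tree = ET.parse(filename)
root = tree.getroot()
data = {}
for item in root.findall('feat'):
    data["benefit"] = ""

    element = item.find('benefit').find("div")
    print(element.text)  # prints None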

Is there a simple way to get this text, or do I have to write a special function for it?


2 answers

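With lxml, you can first find the <b> element, take its tail, and combine it with the following sibling elements to produce the desired result, for example: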

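from lxml import etree as ET

# Placeholder: paste the XML document here. lxml rejects a str that carries an
# encoding declaration, so pass bytes (e.g. b'''...''') in that case.
raw = '''your XML string here'''

root = ET.fromstring(raw)
# The text that follows </b> is stored as the tail of the <b> element
b = root.xpath("//benefit/div/p/b")[0]
# tostring() includes each sibling's tail; encoding="unicode" returns str, not bytes
result = b.tail + ''.join(ET.tostring(node, encoding="unicode") for node in b.xpath("following-sibling::*"))
print(result)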

Output:

[output missing from the source]
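
To see why the tail is the piece that matters here, and why .text came back as None in the question, the following is a minimal sketch using the standard library. The fragment is a shortened, hypothetical stand-in for the <benefit> markup, not the full sample file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the <benefit> markup in the question.
fragment = (
    '<div topic="Benefit"><p><b>Benefit:</b> Using senses. '
    '<i>Invisibility</i> is irrelevant.</p></div>'
)

div = ET.fromstring(fragment)
p = div.find("p")
b = p.find("b")
i = p.find("i")

print(div.text)       # None: nothing sits between <div ...> and <p>
print(p.text)         # None: <p> opens directly with <b>
print(b.text)         # Benefit:
print(repr(b.tail))   # ' Using senses. '
print(repr(i.tail))   # ' is irrelevant.'
```

In this model, .text only holds the characters before an element's first child, and the characters after a child's closing tag belong to that child's .tail.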

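Alternatively, if you simply want the entire content of <p>, including the tags inside it, you can do it this way (this approach works with either lxml or xml.etree.ElementTree):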

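p = root.find(".//benefit/div/p")
# p.text is None when <p> opens directly with a child tag, so fall back to ''
result = (p.text or '') + ''.join(ET.tostring(node, encoding="unicode") for node in p)
print(result)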

Output:

<b>Benefit:</b> Using senses such as acute hearing and sensitivity to vibrations, you detect the location of opponents who are no more than 5 feet away from you. <i>Invisibility</i> and <i>darkness</i> are irrelevant, though it you discern incorporeal beings.
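
The same idea can be wrapped in a small reusable function. This is only a sketch: inner_xml is a hypothetical helper name, and the sample string is a shortened stand-in for the question's data. It uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return an element's content as a string, without its own start/end tags."""
    # element.text may be None when the element starts with a child tag.
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Shortened, hypothetical version of the <benefit> element from the question.
sample = (
    '<benefit><div topic="Benefit" level="8"><p><b>Benefit:</b> Using senses. '
    '<i>Invisibility</i> is irrelevant.</p></div></benefit>'
)
root = ET.fromstring(sample)
p = root.find("./div/p")

print(inner_xml(p))
# <b>Benefit:</b> Using senses. <i>Invisibility</i> is irrelevant.

print(inner_xml(root.find("div")))
# <p><b>Benefit:</b> Using senses. <i>Invisibility</i> is irrelevant.</p>
```

Calling the helper on the <div> itself returns its content with the <p> wrapper kept, which appears to be what the question asks for.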

That said, I agree with Matt's point about BeautifulSoup.

I added a regular expression to your code snippet, and it worked well:

import xml.etree.ElementTree as ET
import re

tree = ET.parse('data.xml')
root = tree.getroot()
result = []
for item in root.iter('benefit'):
    # encoding="unicode" makes tostring() return str, so the str pattern applies
    # Note: the pattern strips every tag, including <p>, <b> and <i>
    cleaned = re.sub(r'<[^>]*>', '', ET.tostring(item, encoding="unicode"))
    result.append(cleaned)

print(result)
# result: ['Benefit: Using senses such as acute hearing and sensitivity to vibrations, you detect the location of opponents who are no more than 5 feet away from you. Invisibility and darkness are irrelevant, though it you discern incorporeal beings.\n\n\n    ']
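
Since the question asks to keep <p> and the inline tags, a narrower pattern may be closer to the goal. The following is a sketch, not part of the answer above: it serializes the <div> and removes only the <div ...> and </div> tags. The SAMPLE string is a shortened, hypothetical stand-in for the real file, and a regular expression like this assumes the markup contains no nested <div> attributes with ">" inside them.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the question's XML file.
SAMPLE = """<feats><feat><benefit><div topic="Benefit" level="8"><p><b>Benefit:</b> Using senses. <i>Invisibility</i> is irrelevant.</p><p/>
</div>
</benefit></feat></feats>"""

root = ET.fromstring(SAMPLE)
results = []
for benefit in root.iter("benefit"):
    div = benefit.find("div")
    markup = ET.tostring(div, encoding="unicode")
    # Remove only the opening and closing div tags; <p>, <b>, <i> are kept.
    cleaned = re.sub(r"</?div\b[^>]*>", "", markup).strip()
    results.append(cleaned)

print(results[0])
```

For anything beyond simple, known markup, walking the tree (as in the first answer) is usually more robust than pattern matching on serialized XML.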
