About regular expressions and XML

Posted 2024-05-15 15:36:32


I have data in XML format; a sample is shown below. I want to extract the data from the <text> tag. Here is my XML data:

    <text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>

    <h1>Soundtrack</h1>

    <h1>External Links</h1>
    </text>

I only need "The 40-Year-Old Virgin is a 2005 American buddy comedy film about a middle-aged man's journey to finally have sex." Is that possible? Thanks.


3 answers

Use an XML parser to parse XML. With lxml:

import lxml.etree as ET

content='''\
<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>

    <h1>Soundtrack</h1>

    <h1>External Links</h1>
</text>
'''

text=ET.fromstring(content)
print(text.text)

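yields the following, surrounded by blank lines because text.text keeps the original whitespace:

    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.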
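The reason `text.text` returns only the opening sentence is that an element's `.text` holds the text before its first child, while text that follows a child element is stored on that child's `.tail`. Here is a minimal sketch of that split, using the standard library's `xml.etree.ElementTree` (which exposes the same `.text`/`.tail` attributes as lxml) on a shortened copy of the sample. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question.
content = """\
<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>
</text>
"""

root = ET.fromstring(content)

# .text is everything between <text> and the first child element.
leading = root.text.strip()

# Text that follows a child element lives on that child's .tail.
first_h1 = root.find("h1")
after_plot = first_h1.tail.strip()

print(leading)
print(first_h1.text)
print(after_plot)
```

So the paragraph after "The plot" is not lost; it just belongs to the first `<h1>` element's tail rather than to `<text>` itself.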

Here is how to do this with xml.etree.ElementTree:

In [18]: import xml.etree.ElementTree as et

In [19]: t = et.parse('f.xml')

In [20]: print(t.getroot().text.strip())
The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.
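Note that `strip()` only trims the ends, so the line break and indentation inside the sentence remain. If you want the sentence on one line, as the question asks, one option is to collapse all runs of whitespace. This is a small sketch; the helper name `leading_text` is made up for illustration, and it parses from a string rather than from `f.xml` so that it runs on its own.

```python
import xml.etree.ElementTree as ET

def leading_text(xml_string):
    """Return the text before the root's first child, on a single line."""
    root = ET.fromstring(xml_string)
    # root.text is None when the root has no leading text, hence the fallback.
    # split() with no arguments splits on any whitespace run, including newlines.
    return " ".join((root.text or "").split())

sample = """\
<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>
</text>
"""

print(leading_text(sample))
```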

Don't use regular expressions to parse XML/HTML. Use a proper parser in Python, such as BeautifulSoup or lxml.
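To illustrate why, here is a rough comparison between a naive regex and a parser. The pattern and the two input strings are invented for this example, and the standard library's `xml.etree.ElementTree` stands in for the parsers named above so that the snippet has no third-party dependencies.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent documents; the second only adds an attribute to <text>.
plain = "<text>Intro line.<h1>Plot</h1>More.</text>"
with_attr = '<text lang="en">Intro line.<h1>Plot</h1>More.</text>'

# A naive pattern that assumes the opening tag is exactly "<text>".
pattern = re.compile(r"<text>(.*?)<h1>", re.DOTALL)

def leading_text_regex(xml_string):
    match = pattern.search(xml_string)
    return match.group(1) if match else None

def leading_text_parser(xml_string):
    # The parser handles attributes, whitespace, and escaping for us.
    return ET.fromstring(xml_string).text

print(leading_text_regex(plain))       # Intro line.
print(leading_text_regex(with_attr))   # None
print(leading_text_parser(plain))      # Intro line.
print(leading_text_parser(with_attr))  # Intro line.
```

The regex can be patched for this one case, but every such patch ends up re-implementing a piece of an XML parser.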
