Regular expressions and XML

<text> The 40-Year-Old Virgin is a 2005 American buddy comedy film about a middle-aged man's journey to finally have sex. <h1>The plot</h1> Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin. <h1>Cast</h1> <h1>Soundtrack</h1> <h1>External Links</h1> </text>

3 answers

User

#1 · Edited 2024-05-15 15:36:32

Parse XML with an XML parser rather than a regular expression. Using lxml:

import lxml.etree as ET

content='''\
<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>

    <h1>Soundtrack</h1>

    <h1>External Links</h1>
</text>
'''

text=ET.fromstring(content)
print(text.text)

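This prints the text that comes before the first <h1> element (the two-line film description), with its surrounding whitespace intact.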
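As a follow-up to the answer above, here is a minimal sketch of how the remaining text could be reached. It assumes the standard-library `xml.etree.ElementTree` module instead of lxml (both expose `.text` and `.tail` on elements), and the names `intro` and `sections` are made up for illustration. Text that precedes the first child sits in the parent's `.text`; text that follows a child sits in that child's `.tail`.

```python
import xml.etree.ElementTree as ET

content = """\
<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.

    <h1>The plot</h1>
    Andy Stitzer (Steve Carell) is the eponymous 40-year-old virgin.
    <h1>Cast</h1>

    <h1>Soundtrack</h1>

    <h1>External Links</h1>
</text>
"""

root = ET.fromstring(content)

# Text before the first child element is stored in root.text.
# split() + join() collapses newlines and indentation into single spaces.
intro = " ".join(root.text.split())

# Text that follows an <h1> element is stored in that element's .tail.
# (hypothetical helper structure: heading title -> text after the heading)
sections = {
    h1.text: " ".join((h1.tail or "").split())
    for h1 in root.findall("h1")
}

print(intro)
for title, body in sections.items():
    print(repr(title), "->", repr(body))
```

Headings with nothing after them, such as "Cast" in this sample, map to an empty string.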

User

#2 · Edited 2024-05-15 15:36:32

Here is how to do this with the xml.etree.ElementTree module, reading the XML from a file named f.xml:

In [18]: import xml.etree.ElementTree as et

In [19]: t = et.parse('f.xml')

In [20]: print(t.getroot().text.strip())
The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.
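The session above can also be written as a standalone script. This is only a sketch under a few assumptions: it creates its own temporary `f.xml` with a shortened copy of the sample so that it runs anywhere, and it collapses the internal newline and indentation that `strip()` leaves behind (visible in the output above). The variable names are illustrative.

```python
import os
import tempfile
import xml.etree.ElementTree as et

# Shortened copy of the sample document, written to a temporary f.xml
# so the sketch does not depend on a file existing beforehand.
xml_source = """<text>
    The 40-Year-Old Virgin is a 2005 American buddy comedy
    film about a middle-aged man's journey to finally have sex.
    <h1>The plot</h1>
</text>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "f.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml_source)
    # parse() reads the file; getroot() returns the <text> element.
    root = et.parse(path).getroot()

# strip() only trims the ends; split() + join() also collapses the
# newline and indentation in the middle of the description.
summary = " ".join(root.text.split())
print(summary)
```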

User

#3 · Edited 2024-05-15 15:36:32

Don't use regular expressions to parse XML/HTML. Use a proper parser in Python, such as BeautifulSoup or lxml.
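To illustrate the point, here is a small sketch comparing a naive regular expression with a parser. The input is a made-up variation of the sample (a commented-out heading and an `<h1>` with an attribute), and the parser used is the standard-library `xml.etree.ElementTree` rather than BeautifulSoup or lxml, so that the example has no third-party dependencies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: one heading is commented out, one has an attribute.
doc = """<text>
    <!-- <h1>Draft heading</h1> -->
    <h1>The plot</h1>
    <h1 class="wide">Cast</h1>
</text>"""

# The regex knows nothing about comments or attributes: it picks up the
# commented-out heading and misses the heading that has an attribute.
regex_titles = re.findall(r"<h1>(.*?)</h1>", doc)

# The parser skips the comment and matches elements by tag name.
parsed_titles = [h1.text for h1 in ET.fromstring(doc).iter("h1")]

print("regex: ", regex_titles)
print("parser:", parsed_titles)
```

The pattern could be patched for each of these cases, but every patch adds another special case that a parser already handles.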
