Splitting HTML in Python

Posted 2024-04-19 22:10:45

I have some HTML markup, and I want to access a tag with a specific class that sits inside a tag with a specific id. For example:

<tr id="one">
    <span class="x">X</span>
    .
    .
    .
    .
</tr>

How can I get the content of the tag with class "x" inside the tag with id "one"?


2 answers

I'm not used to working with lxml.xpath, so I always tend to use BeautifulSoup. Here is a BeautifulSoup solution:

>>> HTML = """<tr id="one">
...     <span class="x">X</span>
...     <span class="ax">X</span>
...     <span class="xa">X</span>
...     </tr>"""
>>>
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; "from BeautifulSoup import ..." is the old BeautifulSoup 3
>>> soup = BeautifulSoup(HTML, 'html.parser')
>>> tr = soup.find('tr', {'id':'one'})
>>> span = tr.find('span', {'class':'x'})
>>> span
<span class="x">X</span>
>>> span.text
'X'

You need something called "xpath".

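from lxml import html

# my_string holds the HTML markup from the question
tree = html.fromstring(my_string)
# select the text of <span class="x"> children of the element with id="one"
x = tree.xpath('//*[@id="one"]/span[@class="x"]/text()')
print(x[0])  # X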
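
Both answers rely on third-party packages. If neither is installed, the same lookup can be roughly sketched with the standard library's html.parser. This is only an illustration, not a replacement for a real HTML library: the names NestedTextFinder and find_text are made up for this example, it matches the span at any depth inside the outer tag (like the BeautifulSoup answer, unlike the direct-child XPath), and it assumes the matching span does not contain further nested span tags.

```python
from html.parser import HTMLParser


class NestedTextFinder(HTMLParser):
    """Hypothetical helper: collect the text of <inner_tag class=inner_class>
    elements found inside the element whose id is outer_id."""

    def __init__(self, outer_id, inner_tag, inner_class):
        super().__init__()
        self.outer_id = outer_id
        self.inner_tag = inner_tag
        self.inner_class = inner_class
        self.outer_tag = None   # tag name of the element carrying outer_id
        self.outer_depth = 0    # open tags with that name, counted from the outer element
        self.in_inner = False   # True while inside a matching inner element
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.outer_tag is None:
            # Not inside the outer element yet: look for the id.
            if attrs.get("id") == self.outer_id:
                self.outer_tag = tag
                self.outer_depth = 1
            return
        if tag == self.outer_tag:
            self.outer_depth += 1
        # The class attribute may hold several space-separated names.
        classes = (attrs.get("class") or "").split()
        if tag == self.inner_tag and self.inner_class in classes:
            self.in_inner = True
            self.results.append("")

    def handle_endtag(self, tag):
        if self.outer_tag is None:
            return
        if self.in_inner and tag == self.inner_tag:
            # Simplification: assumes no nested inner_tag inside the match.
            self.in_inner = False
        if tag == self.outer_tag:
            self.outer_depth -= 1
            if self.outer_depth == 0:
                self.outer_tag = None  # left the outer element

    def handle_data(self, data):
        if self.in_inner:
            self.results[-1] += data


def find_text(markup, outer_id="one", inner_tag="span", inner_class="x"):
    """Return the text of every matching inner element, in document order."""
    finder = NestedTextFinder(outer_id, inner_tag, inner_class)
    finder.feed(markup)
    finder.close()
    return finder.results


HTML = """<tr id="one">
    <span class="x">X</span>
    <span class="ax">X</span>
    <span class="xa">X</span>
</tr>"""

print(find_text(HTML))  # ['X']
```

Only the span whose class list contains exactly "x" is returned; "ax" and "xa" are skipped because the class attribute is split on whitespace and compared as whole names.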
