Extracting text between link tags in Python with BeautifulSoup

Asked 2025-04-16 19:01 by 数据工程师

I have some HTML like this:

<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>

<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>

I need to extract the text between the <a> tags (the link descriptions) and store it in an array, like this:

a[0] = "My HomePage"

a[1] = "Sections"

I'd like to do this with Python and BeautifulSoup.

Any help is appreciated, thanks!

3 Answers

The following code extracts the text (the link descriptions) from the tags and stores it in a list.

>>> from bs4 import BeautifulSoup
>>> data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
...
... <h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> reqTxt = soup.find_all("h2", {"class":"title"})
>>> a = []
>>> for i in reqTxt:
...     a.append(i.get_text())
...
>>> a
['My HomePage', 'Sections']
>>> a[0]
'My HomePage'
>>> a[1]
'Sections'
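A variant of the same idea, sketched with a CSS selector: `soup.select("h2.title a")` matches the `<a>` inside each `<h2 class="title">`, so the text is read from the link itself rather than from the whole `<h2>`. The `strip=True` argument to `get_text` is an extra precaution not used in the answer above; it trims whitespace around each link text, such as line breaks that can creep into a copied HTML snippet.

```python
from bs4 import BeautifulSoup

# HTML from the question
data = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>
"""

soup = BeautifulSoup(data, "html.parser")

# "h2.title a" matches every <a> nested inside an <h2 class="title">;
# strip=True removes leading/trailing whitespace from each text
a = [link.get_text(strip=True) for link in soup.select("h2.title a")]

print(a)  # ['My HomePage', 'Sections']
```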

Answered 2025-04-16 by Python大师

This code finds all the links (that is, the `<a>` tags) in the object called `soup`, extracts the text inside each one, collects the results in a list, and prints that list.
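A minimal sketch of the steps described above, assuming `soup` is parsed from the question's HTML with Python's built-in `html.parser`; the names `links` and `texts` are illustrative. Note that `find_all("a")` matches every link in the document, so on a full page it would also pick up links outside the `<h2 class="title">` headings.

```python
from bs4 import BeautifulSoup

html = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Find every <a> tag in the document
links = soup.find_all("a")

# Extract the text inside each link and collect it in a list
texts = [link.get_text() for link in links]

print(texts)  # ['My HomePage', 'Sections']
```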

Answered 2025-04-16 by Python大师

You can do it like this:

from bs4 import BeautifulSoup

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

# Parse with Python's built-in parser (BeautifulSoup 4)
soup = BeautifulSoup(html, "html.parser")

# For each <h2 class="title">, take the text of its <a> child
print([elm.a.text for elm in soup.find_all('h2', {'class': 'title'})])
# Output: ['My HomePage', 'Sections']

Answered 2025-04-16 by Python大师
