Can't get the correct link with BeautifulSoup

Asked 2025-04-16 01:43

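I'm trying to parse some HTML and extract the links that match a particular pattern. I'm using the find method with a regular expression, but I never get the right link. Here is my code. Can anyone tell me what I'm doing wrong?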

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

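I expect to get the IMDB title link (http://www.imdb.com/title/tt1196141/), but BS always returns the first link. The first link's href doesn't match my regex at all, so why is it being returned?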
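One way to confirm that the pattern itself is not the problem is to test it against each href directly, outside BeautifulSoup. The sketch below copies the three hrefs from the HTML above and applies the pattern with re.search. BeautifulSoup 4's documentation says it filters attributes against a compiled pattern using search(); that BeautifulSoup 3 behaves the same way is an assumption here.

```python
import re

# hrefs copied from the question's HTML, in document order
hrefs = [
    "http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/",
    "http://www.imdb.com/video/imdb/vi2496267289/",
    "http://www.imdb.com/title/tt1196141/",
]

# The pattern from the question
pattern = re.compile(r".*title/tt.*")

# Keep only the hrefs the pattern matches; search() may match anywhere in the string
matching = [h for h in hrefs if pattern.search(h)]
print(matching)
```

Only the IMDB title URL survives, so the regex alone cannot explain getting the Rotten Tomatoes link back. With search(), the leading and trailing .* are redundant: r"title/tt" selects exactly the same hrefs.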

Thanks.

2 Answers

I can't answer your question directly, but the code you originally posted had an import error. Change

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

After that change, your output (with BeautifulSoup 3.1.0.1) is:

http://www.imdb.com/title/tt1196141/
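For readers without BeautifulSoup 3 at hand, the expected result can be reproduced with the standard library. This sketch swaps in Python 3's html.parser (it is not BeautifulSoup) and a hypothetical FirstMatchingLink class that mimics find('a', href=re.compile(...)): it records the first <a> whose href matches and ignores every other link. It uses the pattern r"title/tt", which selects the same hrefs as the question's r".*title/tt.*" when applied with search().

```python
import re
from html.parser import HTMLParser

HTML = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""


class FirstMatchingLink(HTMLParser):
    """Hypothetical helper: remembers the href of the first <a> whose href matches."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.found = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-<a> tags, and stop looking once a match has been recorded
        if tag != "a" or self.found is not None:
            return
        href = dict(attrs).get("href")
        # Like find('a', href=regex): <a> tags whose href does not match are skipped
        if href is not None and self.pattern.search(href):
            self.found = href


parser = FirstMatchingLink(re.compile(r"title/tt"))
parser.feed(HTML)
parser.close()
print(parser.found)
```

The first two <a> tags are skipped because their hrefs fail the pattern, which is the same result the answer reports.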

Answered 2025-04-16 by Python大师

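The find method returns only the first <a> tag that matches all the filters you pass it (here, the href regex). If you want every matching <a> tag, use the findAll method instead.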
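To contrast the first-match and all-matches behaviour, here is a standard-library sketch (html.parser, not BeautifulSoup) built around a hypothetical LinkCollector class and collect() function. collect(html) plays the role of findAll('a'), collect(html, regex) plays the role of findAll('a', href=regex), and taking the first element of the unfiltered list corresponds to find('a') with no href filter. In BeautifulSoup 4 the method is spelled find_all.

```python
import re
from html.parser import HTMLParser

HTML = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects every <a> href, optionally filtered by a regex."""

    def __init__(self, pattern=None):
        super().__init__()
        self.pattern = pattern
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return
        # No pattern means "all links"; otherwise keep only matching hrefs
        if self.pattern is None or self.pattern.search(href):
            self.hrefs.append(href)


def collect(html, pattern=None):
    """Return the hrefs of all <a> tags in document order."""
    parser = LinkCollector(pattern)
    parser.feed(html)
    parser.close()
    return parser.hrefs


all_links = collect(HTML)                              # like findAll('a')
title_links = collect(HTML, re.compile(r"title/tt"))   # like findAll('a', href=regex)
first_link = all_links[0] if all_links else None       # like find('a') with no filter

print(len(all_links))
print(title_links)
print(first_link)
```

Without the href filter, the first <a> is the Rotten Tomatoes link; with it, only the IMDB title link remains, whether you take the first match or all of them.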

Answered 2025-04-16 by Python大师
