Cannot get the correct link in BeautifulSoup

Posted 2024-04-25 20:59:25

Male | A programmer who enjoys writing Python code.

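I'm trying to parse a bit of HTML and want to extract the link that matches a particular pattern. I'm using the find method with a regular expression, but it doesn't give me the correct link. Here is my snippet. Can anyone tell me what I'm doing wrong?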

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

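I expect to get the IMDB title link (the only href that matches the pattern), but BS always returns the first link. The first link's href doesn't even match my regex, so why is it returned?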

Thanks.


2 Answers

User

Answer 1 · edited 2024-04-25 20:59:25

I can't answer your question directly, but in any case the code you (originally) posted has an import error. Change

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

Then your output (using BeautifulSoup version 3.1.0.1) will be:

http://www.imdb.com/title/tt1196141/
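
As a quick sanity check of the pattern itself, independent of BeautifulSoup, the three href values from the question can be tested directly with the standard re module. This is only a minimal sketch in Python 3; the list name hrefs and the variable matching are made up for illustration.

```python
import re

# The pattern from the question. With search(), the leading and trailing
# ".*" are unnecessary but harmless.
pattern = re.compile(r".*title/tt.*")

# The three href values copied from the HTML in the question.
hrefs = [
    "http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/",
    "http://www.imdb.com/video/imdb/vi2496267289/",
    "http://www.imdb.com/title/tt1196141/",
]

# Keep only the hrefs the pattern matches.
matching = [h for h in hrefs if pattern.search(h)]
print(matching)  # ['http://www.imdb.com/title/tt1196141/']
```

Only the IMDB title link matches, which agrees with the output above. That suggests the regex was not the problem.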

User

Answer 2 · edited 2024-04-25 20:59:25

find only returns the first <a> tag. You want findAll.
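
To illustrate the difference between getting a single tag and getting every matching tag, here is a minimal sketch. It assumes Python 3 and the newer beautifulsoup4 package (imported as bs4), not the BeautifulSoup 3 used in the question. In bs4 the method is spelled find_all, and findAll is the older name. The variable names are made up for illustration. Note that when a filter such as href=... is passed, find returns the first tag that satisfies the filter, not simply the first <a> in the document.

```python
import re
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""

# Use the parser bundled with Python so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first <a> whose href matches the pattern.
first_title = soup.find("a", href=re.compile(r"title/tt"))
print(first_title["href"])  # http://www.imdb.com/title/tt1196141/

# find_all() returns every <a> whose href matches the pattern.
imdb_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"imdb\.com"))]
print(imdb_links)
```

Here find_all with the broader imdb\.com pattern yields both IMDB links (trailer and title), while find with the title/tt pattern yields just the title link.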
