Is a regex needed, or can the output be refined another way?

Posted 2024-06-16 12:11:08

With the following function I can get the text and links I need from a website:

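import requests
from bs4 import BeautifulSoup

def get_url_text(url):
    source = requests.get(url)
    plain_text = source.text
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(plain_text, 'html.parser')
    # find_all is the current spelling of the older findAll alias
    for item_name in soup.find_all('li', {'class': 'ptb2'}):
        print(item_name.string)
        print(item_name.a)

get_url_text('https://www.residentadvisor.net/podcast.aspx')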

This returns:

RA.532 Marquis Hawkes
<a href="/podcast-episode.aspx?id=532"><h1>RA.532 Marquis Hawkes</h1></a>
RA.531 Evan Baggs
<a href="/podcast-episode.aspx?id=531"><h1>RA.531 Evan Baggs</h1></a>
RA.530 MCDE vs Jeremy Underground

If I only want the href link itself, without the tags around it, do I need a regex, or does BeautifulSoup offer another way?

The expected output is:

RA.532 Marquis Hawkes
https://www.residentadvisor.net/podcast-episode.aspx?id=532

and so on for every similar element.

1 answer

Anonymous user

#1 · Posted 2024-06-16 12:11:08

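No regex is needed. You can use print(item_name.a['href']) and, if required, prepend https://www.residentadvisor.net to the result, because the links on the page are relative: they omit the scheme and netloc parts, e.g. /podcast-episode.aspx?id=528.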
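
Rather than concatenating the prefix by hand, the relative href can be resolved against the page URL with the standard library's urljoin. The following is only a minimal sketch: the helper name absolute_link and the constant BASE_URL are made up for illustration, and the href passed in is assumed to be the string obtained from item_name.a['href'].

```python
from urllib.parse import urljoin

# Assumption: the page the links were scraped from (taken from the question).
BASE_URL = "https://www.residentadvisor.net/podcast.aspx"


def absolute_link(href, base=BASE_URL):
    """Resolve a possibly relative href against the page it came from.

    Hypothetical helper: urljoin fills in the scheme and netloc from `base`
    when `href` lacks them, and leaves already-absolute URLs unchanged.
    """
    return urljoin(base, href)


# A relative link of the form shown in the question's output.
print(absolute_link("/podcast-episode.aspx?id=532"))
```

Inside the loop from the question this would be called as print(absolute_link(item_name.a['href'])). One advantage over plain string concatenation is that a link which is already absolute passes through untouched.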
