Cannot scrape links with BeautifulSoup in Python

Posted 2024-06-16 11:04:23


I want to scrape all the links on a web page that use the a tag with class="author track".

A sample of the HTML:

<a class="author track" href="/nileshkikuuchise" data-gaq="author" data-dmc="entry-artist">
                                                                        <img class="avatar" src="https://ctl.s6img.com/cdn/s6-original-art-uploads/society6/uploads/u/nileshkikuuchise/avatar_asset/5323d6c4d92143e8b37f0fa644d7044f_p3.jpg" width="20" height="20" data-dmc="entry-photo">
                                                                    Nileshkikuuchise                                </a>

My code:

discover_page = BeautifulSoup(r.text, 'html.parser')
finding_accounts = discover_page.find_all("a", "[class~=author track]")
print(finding_accounts)

The output is empty (no matches).

How can I collect the href values into a list? I can add the for loop later, but I need to get the basics working first.


1 answer

Answer 1 · posted 2024-06-16 11:04:23

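You appear to be mixing the argument styles that select and find_all expect. find_all takes a tag name plus filters such as class_, whereas select takes a single CSS selector string. A string passed as the second argument to find_all is treated as a class value to match, not as a CSS selector, so "[class~=author track]" matches nothing.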

Both of these approaches work for me:

>>> from bs4 import BeautifulSoup
>>> r = '''
<a class="author track" href="/nileshkikuuchise" data-gaq="author" data-dmc="entry-artist">
                                                                        <img class="avatar" src="https://ctl.s6img.com/cdn/s6-original-art-uploads/society6/uploads/u/nileshkikuuchise/avatar_asset/5323d6c4d92143e8b37f0fa644d7044f_p3.jpg" width="20" height="20" data-dmc="entry-photo">
                                                                    Nileshkikuuchise                                </a>
'''

>>> discover_page = BeautifulSoup(r, 'html.parser')
>>> discover_page.find_all("a", class_="author track")
[<a class="author track" data-dmc="entry-artist" data-gaq="author" href="/nileshkikuuchise">
 <img class="avatar" data-dmc="entry-photo" height="20" src="https://ctl.s6img.com/cdn/s6-original-art-uploads/society6/uploads/u/nileshkikuuchise/avatar_asset/5323d6c4d92143e8b37f0fa644d7044f_p3.jpg" width="20"/>
                                                                     Nileshkikuuchise                                </a>]
>>> discover_page.select('a[class="author track"]')
[<a class="author track" data-dmc="entry-artist" data-gaq="author" href="/nileshkikuuchise">
 <img class="avatar" data-dmc="entry-photo" height="20" src="https://ctl.s6img.com/cdn/s6-original-art-uploads/society6/uploads/u/nileshkikuuchise/avatar_asset/5323d6c4d92143e8b37f0fa644d7044f_p3.jpg" width="20"/>
                                                                     Nileshkikuuchise                                </a>]
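
To get from the matched tags to a list of href values, as the question asks, one possible sketch follows. It assumes a small hypothetical HTML snippet (only the first anchor comes from the question; the other three are invented for illustration) and uses the CSS selector a.author.track[href], which matches both classes in either order and skips anchors that have no href. That selector is a swapped-in alternative, not the exact-string match shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the first anchor is taken from the question;
# the remaining anchors are made up to show what does and does not match.
html = '''
<a class="author track" href="/nileshkikuuchise" data-gaq="author">Nileshkikuuchise</a>
<a class="track author" href="/someone-else">Someone Else</a>
<a class="author" href="/not-tracked">Not Tracked</a>
<a class="author track">No href here</a>
'''

soup = BeautifulSoup(html, "html.parser")

# "a.author.track" matches <a> tags carrying both classes, in any order;
# "[href]" restricts the match to anchors that actually have an href attribute.
links = soup.select("a.author.track[href]")

# Pull the href attribute out of each matched tag.
hrefs = [a["href"] for a in links]
print(hrefs)
```

With the real page, r.text from the question would take the place of the html string. Note that class_="author track" and a[class="author track"] compare against the exact attribute string, so they would miss an anchor written as class="track author".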
