How to get only part of BeautifulSoup4's output and reformat it

Posted 2024-04-25 03:53:11


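I've been trying to build a torrent magnet-link scraper. My code works up to a point: it scrapes the information I need (the magnet links), but it also picks up extra information such as HTML markup (href/title="…" and so on). I'd like the code to output only the magnet link, the torrent size and the torrent name, and it would also be nice if the output were spaced out between each torrent.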

Here is the code:

from bs4 import BeautifulSoup
import requests

URL = 'https://eztv.io/'
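response = requests.get(URL)
# Parse the page and collect every <a class="magnet"> tag
content = BeautifulSoup(response.content, "html.parser")
magnet = content.find_all('a', attrs={"class": "magnet"})
# Printing the list prints each Tag object as its full HTML markup
print(magnet)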

This is what the code's output looks like:

CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fexodus.desync.com%3A6969" rel="nofollow" title="Marketplace S47E09 Locksmith Ripoffs Fake Listings Fake Reviews 720p WEB h264-CAFFEiNE [eztv] (577.03 MB) Magnet Link"></a>, <a class="magnet" href="magnet:?xt=urn:btih:4ceaabd8ea97eed474a645d9d9223fe51f354ac3&amp;dn=Marketplace.S47E06.Are.Noisy.Restaurants.Harmful.to.Your.Health.480p.x264-mSD%5Beztv%5D.mkv%5Beztv%5D&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fexodus.desync.com%3A6969" rel="nofollow" title="Marketplace S47E06 Are Noisy Restaurants Harmful to Your Health 480p x264-mSD [eztv] (164.48 MB) Magnet Link"></a>, <a class="magnet" href="magnet:?xt=urn:btih:0ca16cfac499bea3fa6bfd564484e90c38501aad&amp;dn=Marketplace.S47E08.Blinded.by.Blue.Lights-Banned.from.Seniors.Homes.WEB.h264-CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fexodus.desync.com%3A6969" rel="nofollow" title="Marketplace S47E08 Blinded by Blue Lights-Banned from Seniors Homes WEB h264-CAFFEiNE [eztv] (319.27 MB) Magnet Link"></a>, <a class="magnet" href="magnet:?xt=urn:btih:654c1dbb3be446ed3ebc1aff69d6ba1779577ce8&amp;dn=Marketplace.S47E08.Blinded.by.Blue.Lights-Banned.from.Seniors.Homes.720p.WEB.h264-CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fexodus.desync.com%3A6969" rel="nofollow" title="Marketplace S47E08 Blinded by Blue Lights-Banned from Seniors Homes 720p WEB h264-CAFFEiNE [eztv] (576.63 MB) Magnet Link"></a>, <a class="magnet" href="magnet:?xt=urn:btih:b3c5a40fe8484ebfa0790bc97d5fda77c6ba0a13&amp;dn=Marketplace.S47E07.Food.Fact-Check.WEB.h264-CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fexodus.desync.com%3A6969" rel="nofollow" title="Marketplace S47E07 Food Fact-Check WEB h264-CAFFEiNE [eztv] (319.01 MB) Magnet Link"></a>]

I'm using https://eztv.io/ to scrape the torrents.
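One way to get only the name, size and magnet link is to read the tag attributes instead of printing the tags themselves, then split the title attribute. The sketch below assumes every title has the form "<name> [eztv] (<size>) Magnet Link", which is inferred only from the output shown in the question; SAMPLE_HTML, TITLE_RE and extract_torrents are hypothetical names, and the sample markup is a shortened copy of two tags from that output (trackers trimmed) so the example runs offline.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, shortened sample of the <a class="magnet"> markup from the
# question's output (tracker parameters trimmed).
SAMPLE_HTML = """
<a class="magnet" href="magnet:?xt=urn:btih:b3c5a40fe8484ebfa0790bc97d5fda77c6ba0a13&amp;dn=Marketplace.S47E07.Food.Fact-Check.WEB.h264-CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D" rel="nofollow" title="Marketplace S47E07 Food Fact-Check WEB h264-CAFFEiNE [eztv] (319.01 MB) Magnet Link"></a>
<a class="magnet" href="magnet:?xt=urn:btih:4ceaabd8ea97eed474a645d9d9223fe51f354ac3&amp;dn=Marketplace.S47E06.Are.Noisy.Restaurants.Harmful.to.Your.Health.480p.x264-mSD%5Beztv%5D.mkv%5Beztv%5D" rel="nofollow" title="Marketplace S47E06 Are Noisy Restaurants Harmful to Your Health 480p x264-mSD [eztv] (164.48 MB) Magnet Link"></a>
"""

# Assumed title format: "<name> [eztv] (<size>) Magnet Link"
TITLE_RE = re.compile(r"^(?P<name>.+?) \[eztv\] \((?P<size>[^)]+)\) Magnet Link$")


def extract_torrents(html):
    """Return a list of (name, size, magnet_link) tuples.

    Tags without an href, or whose title does not match TITLE_RE, are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    torrents = []
    for tag in soup.find_all("a", class_="magnet"):
        # .get() returns the attribute value as a plain string (with &amp;
        # decoded to &), not the tag's HTML markup.
        href = tag.get("href")
        match = TITLE_RE.match(tag.get("title", ""))
        if href is None or match is None:
            continue
        torrents.append((match.group("name"), match.group("size"), href))
    return torrents


if __name__ == "__main__":
    for name, size, link in extract_torrents(SAMPLE_HTML):
        print(name)
        print(size)
        print(link)
        print()  # blank line between torrents
```

To run it against the live page, pass response.content from the question's code instead of SAMPLE_HTML; BeautifulSoup accepts bytes. The site's markup may differ from what the question shows, in which case TITLE_RE needs adjusting.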


1 answer
User
#1 · Posted 2024-04-25 03:53:11

Try this on your code and see whether it works:

for m in magnet:
    # m.attrs['href'] is the link as a plain string; keep only the part before "dn="
    print(m.attrs['href'].split('dn=')[0])
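As an alternative to splitting the string by hand, a magnet URI's query string can be decoded with the standard library's urllib.parse, which also percent-decodes the values and collects repeated tr (tracker) parameters into a list. A minimal sketch; parse_magnet is a hypothetical helper name, and the href is a shortened copy of one link from the question.

```python
from urllib.parse import parse_qs, urlsplit


def parse_magnet(href):
    """Decode a magnet URI's query string into a dict of lists.

    parse_qs percent-decodes the values, so %5B/%5D become [ and ].
    """
    parts = urlsplit(href)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI: %r" % href)
    return parse_qs(parts.query)


if __name__ == "__main__":
    href = ("magnet:?xt=urn:btih:b3c5a40fe8484ebfa0790bc97d5fda77c6ba0a13"
            "&dn=Marketplace.S47E07.Food.Fact-Check.WEB.h264-CAFFEiNE%5Beztv%5D.mkv%5Beztv%5D"
            "&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A80")
    fields = parse_magnet(href)
    print(fields["xt"][0])  # the info-hash part
    print(fields["dn"][0])  # decoded display name
    print(fields["tr"])     # list of tracker URLs
```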

In my case, I got this output:

magnet:?xt=urn:btih:95f3f6abf3a87a19d6fec25c3b78a8ccc3890ee4&
magnet:?xt=urn:btih:2379252ac7179fd49110091eeaa962237be38b16&
magnet:?xt=urn:btih:fd4db49ba5cc5700b533c3f06a926fde237e679f&
magnet:?xt=urn:btih:b2738b271d1e547651b7cf182ea541158c6601c7&

And so on. Note: they all end with &, not &amp; as in your output; I'm not sure why.
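The likely explanation: when a Tag is printed (as print(magnet) does in the question), BeautifulSoup re-escapes & in attribute values as &amp;, whereas m.attrs['href'] holds the already-decoded string. The trailing & is simply the separator that precedes dn=, so splitting on "&dn=" instead of "dn=" drops it. A small sketch using html.unescape to stand in for that decoding; ESCAPED and info_hash_part are hypothetical names, and the href is shortened from the question's output.

```python
import html


def info_hash_part(href):
    """Return the part of a magnet URI before dn=, without the '&' separator."""
    return href.split("&dn=")[0]


# The href as it appeared in the question's printed output, where '&' was
# re-escaped as '&amp;' when the tag was serialized back to HTML.
ESCAPED = ("magnet:?xt=urn:btih:4ceaabd8ea97eed474a645d9d9223fe51f354ac3"
           "&amp;dn=Marketplace.S47E06")

if __name__ == "__main__":
    href = html.unescape(ESCAPED)   # what m.attrs['href'] actually holds
    print(href.split("dn=")[0])     # ...4ac3&  (separator kept)
    print(info_hash_part(href))     # ...4ac3   (separator dropped)
```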
