我希望我的代码能够使用Python过滤掉没有种子的链接。

from bs4 import BeautifulSoup import urllib.request import re class AppURLopener(urllib.request.FancyURLopener): version = "Mozilla/5.0" url = input('What site you working on today, sir?\n-> ') opener = AppURLopener() html_page = opener.open(url) soup = BeautifulSoup(html_page, "lxml") pd = str(soup.findAll('td', attrs={'align':re.compile('right')})) for link in soup.findAll('a', attrs={'href': re.compile("^magnet")}): if not('0' is pd[18]): print (link.get('href'),'\n')

1条回答

网友

1楼 · 发布于 2024-05-19 21:37:57

此代码片段将从页面中提取所有磁铁链接，其中包含seeders！=0：

from bs4 import BeautifulSoup
import requests
from pprint import pprint

soup = BeautifulSoup(requests.get('https://pirateproxy.mx/browse/201/1/3').text, 'lxml')
tds = soup.select('#searchResult td.vertTh ~ td')
links = [name.select_one('a[href^=magnet]')['href'] for name, seeders, leechers in zip(tds[0::3], tds[1::3], tds[2::3]) if seeders.text.strip() != '0']

pprint(links, width=120)

印刷品：

['magnet:?xt=urn:btih:aa8a1f7847a49e640638c02ce851effff38d440f&dn=Affairs.of.State.2018.BRRip.x264.AC3-Manning&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',
 'magnet:?xt=urn:btih:819cb9b477462cd61ab6653ebc4a6f4e790589c3&dn=Bad.Samaritan.2018.BRRip.x264.AC3-Manning&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',
 'magnet:?xt=urn:btih:843d01992aa81d52be68190ee6a733ec9eee9b13&dn=The+Darkest+Minds+2018+HDCAM-1XBET&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',
 'magnet:?xt=urn:btih:09a23daa69c42003d905ecf0a1cefdb0474e7d88&dn=Insidious+The+Last+Key+2018+BRRip+x264+AAC-SSN&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',
 'magnet:?xt=urn:btih:98c42d5d620b4db834c5437a75f6da6f2d158207&dn=The+Darkest+Minds+2018+HDCAM-1XBET%5BTGx%5D&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',
 'magnet:?xt=urn:btih:f30ebc409b215f2a5237433d7508c7ebfabb0e16&dn=Journeyman.2017.SWESUB.BRRiP.x264.mp4&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Fzer0day.ch%3A1337&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Fexodus.desync.com%3A6969',

...and so on.

编辑：

soup.select('#searchResult td.vertTh ~ td')将选择带有类vertTh的标记<td>的所有<td>同级，该类位于带有id=searchResult的标记内。每行有三个这样的兄弟姐妹。你知道吗

然后select_one('a[href^=magnet]')将选择href以magnet开头的所有链接。你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章