I am trying to scrape a website that has pagination links, so I did this:
    import scrapy

    class SummymartSpider(scrapy.Spider):
        name = 'dummymart'
        allowed_domains = ['www.dummrmart.com/product']
        start_urls = ['https://www.dummymart.net/product/auto-parts--118?page%s' % page for page in range(1, 20)]
It worked!! For a single URL it works fine, but when I try this:
    import scrapy

    class DummymartSpider(scrapy.Spider):
        name = 'dummymart'
        allowed_domains = ['www.dummymart.com/product']
        start_urls = ['https://www.dummymart.net/product/auto-parts--118?page%s',
                      'https://www.dummymart.net/product/accessories-tools--112?id=1316264860?page%s' % page for page in range(1, 20)]
it doesn't work. How can I apply the same logic to multiple URLs? Thanks.
One way is to use scrapy.Spider's start_requests() method instead of the start_urls attribute. You can see more here.

If you want to keep using the start_urls attribute, you can try something like this (I haven't tested it):

    start_urls = ['https://www.dummymart.net/product/auto-parts--118?page%s' % page for page in range(1, 20)] + \
                 ['https://www.dummymart.net/product/accessories-tools--112?id=1316264860?page%s' % page for page in range(1, 20)]
Also note that in the allowed_domains attribute you only need to specify the domain. See here.
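For example, a corrected value (hypothetical, inferred from the URLs in the question) would be:

```python
# allowed_domains takes bare domain names only: no scheme, no path.
# 'dummymart.net' is inferred from the question's start URLs.
allowed_domains = ['dummymart.net']
```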