Make scrapy.Request deterministic?

class SomeSpider(scrapy.Spider):

    def parse(self, response):

        # get all ads(25) from ads list
        for ad in adList():
            add_url = findAddUrl()
            yield scrapy.Request(add_url, callback=self.parseAd)

        # go to next page
        if some_condition_OK:
            next_page_url = findNextpageUrl()
            yield scrapy.Request(next_page_url)
        else:
            print('Stopped at.')

    def parseAd(self, response):
        field_1 = get_field_1()
        field_n = get_field_n()
        # save field_1 to field_n to sqlite DB

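2 answers

User

Answer 1 · edited 2024-06-08 00:04:21

The only way to make Scrapy deterministic is to yield a single request at a time and keep the remaining requests in a list or queue: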

import scrapy


class SomeSpider(scrapy.Spider):

    # Requests waiting their turn; only one is handed to Scrapy at a time.
    pending_request = []

    def parse(self, response):

        # get all ads(25) from ads list
        for ad in adList():
            add_url = findAddUrl()
            self.pending_request.append(
                scrapy.Request(add_url, callback=self.parseAd))

        # go to next page
        if some_condition_OK:
            next_page_url = findNextpageUrl()
            self.pending_request.append(scrapy.Request(next_page_url))
        else:
            print('Stopped at.')

        # hand over the next queued request, if any
        if self.pending_request:
            yield self.pending_request.pop(0)

    def parseAd(self, response):
        field_1 = get_field_1()
        field_n = get_field_n()

        # this response is done, so release the next queued request
        if self.pending_request:
            yield self.pending_request.pop(0)
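To see why releasing one request at a time fixes the order, here is a small Scrapy-free model of the same idea. It is only a sketch: `crawl_order`, the `pages` argument, and the `page-N` labels are hypothetical names invented for illustration, and no real downloading happens.

```python
from collections import deque


def crawl_order(pages):
    """Return the visit order when only one request is in flight at a time.

    `pages` is a hypothetical list of listing pages, each given as a list
    of ad URLs found on that page.
    """
    pending = deque()  # plays the role of pending_request in the spider
    visited = []

    def parse(page_index):
        # Queue every ad on this listing page, then the next listing page.
        for ad_url in pages[page_index]:
            pending.append(("ad", ad_url))
        if page_index + 1 < len(pages):
            pending.append(("page", page_index + 1))

    visited.append("page-0")
    parse(0)
    while pending:
        # Exactly one queued request is released per handled response.
        kind, payload = pending.popleft()
        if kind == "ad":
            visited.append(payload)
        else:
            visited.append("page-%d" % payload)
            parse(payload)
    return visited


print(crawl_order([["a1", "a2"], ["b1"]]))
# prints ['page-0', 'a1', 'a2', 'page-1', 'b1']
```

Because every response releases at most one queued request, the visit order depends only on the order in which requests were queued, not on how quickly each download finishes.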

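User

Answer 2 · edited 2024-06-08 00:04:21

Add the following setting:

DOWNLOAD_DELAY

Default: 0

DOWNLOAD_DELAY = 0.25  # 250 ms of delay

Scrapy also has a feature called AutoThrottle that sets the download delay automatically. It adjusts the delay based on the load of both the Scrapy server and the website being crawled, which works better than picking an arbitrary fixed delay.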
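As a rough sketch, AutoThrottle is switched on through the project's settings.py. The setting names below are Scrapy's own; the numeric values are illustrative assumptions and should be tuned for the site being crawled.

```python
# settings.py (sketch; the numeric values are illustrative assumptions)

# Turn on the AutoThrottle extension.
AUTOTHROTTLE_ENABLED = True

# Initial download delay, in seconds, before any latency has been measured.
AUTOTHROTTLE_START_DELAY = 5.0

# Upper bound, in seconds, on the delay AutoThrottle may apply.
AUTOTHROTTLE_MAX_DELAY = 60.0

# Average number of requests to send in parallel to each remote site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

With AutoThrottle enabled, DOWNLOAD_DELAY still acts as a lower bound, so the two settings can be combined.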
