I am iterating over a list of ids and scraping two pages for each id. The first scraper works for all ids, but the second scraper works only for one id.
import scrapy

class MySpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/viewData']

    def parse(self, response):
        ids = ['1', '2', '3']
        for id in ids:
            # The following request scrapes for all ids
            yield scrapy.FormRequest.from_response(response,
                                                   ...
                                                   callback=self.parse1)
            # The following request scrapes only for the 1st id
            yield scrapy.Request(url="http://example.com/viewSomeOtherData",
                                 callback=self.intermediateMethod)

    def parse1(self, response):
        # Data scraped here using selectors
        ...

    def intermediateMethod(self, response):
        yield scrapy.FormRequest.from_response(response,
                                               ...
                                               callback=self.parse2)

    def parse2(self, response):
        # Some other data scraped here
        ...
I want to scrape two different pages for each id.
Changing the following line:
to:
^{pr2}$ worked for me.
Scrapy has a duplicate URL filter, which is probably filtering out your requests. As Steve suggested, try adding dont_filter=True after the callback.
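Why this happens can be sketched with a toy duplicate filter. The names SimpleDupeFilter and should_drop below are hypothetical, invented for illustration; Scrapy's real filter is scrapy.dupefilters.RFPDupeFilter, which fingerprints whole requests rather than comparing raw URLs, but the effect on the loop above is the same:

```python
# Toy sketch of a duplicate-request filter (hypothetical names, not Scrapy's API).
class SimpleDupeFilter:
    def __init__(self):
        self.seen = set()

    def should_drop(self, url, dont_filter=False):
        # Requests marked dont_filter=True bypass the filter entirely.
        if dont_filter:
            return False
        if url in self.seen:
            return True
        self.seen.add(url)
        return False

f = SimpleDupeFilter()
ids = ['1', '2', '3']

# The loop yields the SAME second URL once per id:
# only the first request passes the filter, the rest are dropped.
print([f.should_drop("http://example.com/viewSomeOtherData") for _ in ids])
# [False, True, True]

# With dont_filter=True every request goes through.
f = SimpleDupeFilter()
print([f.should_drop("http://example.com/viewSomeOtherData", dont_filter=True) for _ in ids])
# [False, False, False]
```

This is why the second scraper appears to run "only for one id": the first FormRequest differs per id (different form data), so each one has a distinct fingerprint, while the second Request targets an identical URL every iteration and gets deduplicated after the first.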