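I am trying to get the phone number from https://lajumate.ro/ajax/phone-number (example product page).
The page expects a POST request carrying certain cookies and form data. An example curl request looks like this: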
curl 'https://lajumate.ro/ajax/phone-number' -H 'Cookie: XSRF-TOKEN=eyJpdiI6IkVqUEwrZjU2UDJOaFB3NDl6b0xhTFE9PSIsInZhbHVlIjoiemJnTCt3S0UxTjUwVjVTQk0xUzlRSVpNZGVPM0dIVHBcL1JlYTVabmcxelFPNG5ZZ1d4NGYxbmpnRTAxeVRSaWRcLzZTUVhRVzlNcmtyOHJvcWFOdlE3UT09IiwibWFjIjoiYmUzOGNkNDlkMjMyNzY3YTQxNzE0ZWEwNmJhMDExZWUzODdmZmU5MmZmMTEwODk1ZTE3ZjYxNTkxZjYyNzFkOCJ9; ljs= eyJpdiI6ImdYR28xcnZvSXFiNHpSekVyeHJOQVE9PSIsInZhbHVlIjoiSnJtTlBRMmRJY1ZqNUtxWXdPREdlYnptc3pKWGRmZ1ppdjdCc0lcL040NzlDbytTcWNZb1Bwa0kyejlKM3NmNGZ0dDMwcFNhaXZ6WHlWSExFaHlNYnFnPT0iLCJtYWMiOiI4ZWY2MzRiNTY5Mjc3M2FmYjllNDJiODEyYWRmNzUxNjViYWM0OTIyZjQ3OTRjODhiMjM3N2NlNTJjYWJiNTRiIn0;' --data '_token=lT8dwMv5vqGrnh0drb6pW7sreYjguJn5qaCXZIck&ad_id=3834372' --compressed
This works (note that the cookies and the token will expire). So I wrote a spider to reproduce this request. The code is as follows:
req = FormRequest(
    'https://lajumate.ro/ajax/phone-number',
    callback=self.parse_phone,
    formdata={
        'ad_id': re.sub(r".+?(\d+)\.html", r"\1", response.url),
        '_token': response.xpath('//input[@name="_token"]/@value').extract()[0],
    },
    headers={'Cookie': 'ljs=' + ljs + ';XSRF-TOKEN=' + XSRF},
    dont_filter=True,
)
ljs and XSRF are taken from the response cookies.
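For comparison, here is a minimal sketch of the same request built with the `cookies` argument of `FormRequest` instead of a handwritten `Cookie` header. The assumption behind it is that Scrapy's cookie middleware, when enabled, manages the `Cookie` header itself and may replace one set by hand; whether that is what happens here is not confirmed. The helper names `extract_ad_id` and `build_phone_request_kwargs`, the sample URL shape, and the sample values are hypothetical.

```python
import re


def extract_ad_id(url):
    """Return the numeric ad id that directly precedes '.html' in a URL."""
    match = re.search(r"(\d+)\.html", url)
    if match is None:
        raise ValueError("no ad id found in %r" % url)
    return match.group(1)


def build_phone_request_kwargs(page_url, token, ljs, xsrf):
    """Build keyword arguments for scrapy.FormRequest (hypothetical helper)."""
    return {
        "url": "https://lajumate.ro/ajax/phone-number",
        "formdata": {"_token": token, "ad_id": extract_ad_id(page_url)},
        # Passed as a dict so that Scrapy builds the Cookie header itself.
        "cookies": {"ljs": ljs, "XSRF-TOKEN": xsrf},
        "dont_filter": True,
    }


kwargs = build_phone_request_kwargs(
    "https://lajumate.ro/example-ad-3834372.html",  # made-up URL shape
    "token-from-page", "ljs-value", "xsrf-value",   # placeholder values
)
print(kwargs["formdata"]["ad_id"])
```

In the spider this would be used as `FormRequest(callback=self.parse_phone, **kwargs)`.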
I use two debug log calls to inspect the request:
self.logger.debug('Request headers: %s', dict(req.headers))
self.logger.debug('Request body: %s', req.body)
This results in:
2017-01-04 11:44:41 [lajumate-sellers] DEBUG: Request headers: {'Cookie': ['ljs=eyJpdiI6IlBTU05tWlV0NW1DZGJaZk5nemEzTUE9PSIsInZhbHVlIjoiY3JDNFR2clpkMGVaNHVqODZFT2NvTmFRb1BKRmZCS0pCRndwd0xNNXVzV2M1WUNCUm5MWXFnbEU5RGZkQnVRNHFNMFp5S0E4TllkZXVtNk5cL3JSU1FBPT0iLCJtYWMiOiJhOWRhYmJmODg1NzcwOTRhNzQ5ZTlhNDg4OTEzZWNiNDc5NDhlNzZmMmQ3MDliYjM0ODlkZDAwOTYzN2NkNTkzIn0%3D;XSRF-TOKEN=eyJpdiI6ImNHNzZhbVViNWxTUm16bmg5amF0SFE9PSIsInZhbHVlIjoiWGRPMWFjVFBPTFNYWkxrNjI2THJIYU1KeStLcTg4Z3FFRkFqOWJjMDdHNUJKNXFuY2pKVXkxTVpuT1ExNXpSZWZHM1FPMzRjSTY0R3lSVndJME1GMFE9PSIsIm1hYyI6ImQwYThlMGQzYzA3NjA3YmE2ZTAwYjA0NjRiNzRjNTY4NGVlNjEwZjUxMzFiMWE0OGI3Nzk5YWVlNmVkODllNGEifQ%3D%3D'], 'Content-Type': ['application/x-www-form-urlencoded']}
2017-01-04 11:44:41 [lajumate-sellers] DEBUG: Request body: _token=VtCrPpqMwpcO1FRCZ12pnYmXj7Bv14B8o4aRcZyA&ad_id=3576651
This all looks as it should. But when the spider tries to load the page, the request is redirected with a 302 status code.
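One way to see where the 302 points is to ask Scrapy not to follow it and to pass the response to the callback. The sketch below uses the `dont_redirect` and `handle_httpstatus_list` request meta keys; the helper names and the sample `Location` value are hypothetical, not output from this site.

```python
def redirect_debug_meta():
    """Meta keys that ask Scrapy to hand a 302 response to the callback."""
    return {
        "dont_redirect": True,            # do not follow the redirect
        "handle_httpstatus_list": [302],  # let the 302 reach the callback
    }


def describe_redirect(status, headers):
    """Summarise a redirect from its status code and a header mapping."""
    location = headers.get("Location", "<no Location header>")
    return "status %d redirects to %s" % (status, location)


# Sample values only; a real response would supply its own headers.
print(describe_redirect(302, {"Location": "/login"}))
```

Passing `meta=redirect_debug_meta()` to the request and logging `response.status` and `response.headers.get('Location')` in `parse_phone` would show the redirect target (Scrapy returns header values as bytes).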
However, when I copy the debug data and paste it into a curl command (or a similar client), I can get the data.
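To make that comparison less error-prone, the request can be rendered as a curl command programmatically. This is a rough sketch using only the standard library; `to_curl` is a hypothetical helper, and the header and body values are placeholders rather than real tokens.

```python
import shlex


def to_curl(url, headers, body):
    """Render a POST request as a curl command line for comparison."""
    parts = ["curl", shlex.quote(url)]
    for name, value in sorted(headers.items()):
        parts += ["-H", shlex.quote("%s: %s" % (name, value))]
    parts += ["--data", shlex.quote(body), "--compressed"]
    return " ".join(parts)


print(to_curl(
    "https://lajumate.ro/ajax/phone-number",
    {"Cookie": "ljs=<ljs>;XSRF-TOKEN=<xsrf>"},  # placeholders, not real values
    "_token=<token>&ad_id=3576651",
))
```

Note that the headers logged before the request is scheduled are not necessarily the ones sent, since downloader middlewares can still change them. Logging `response.request.headers` inside the callback and feeding those to a helper like this may give a closer picture of what actually went over the wire.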
Any suggestions on how to solve this?