Empty list returned when extracting from a response in Scrapy

1 answer

User

#1 · Posted on 2024-05-23 22:34:52

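First, looking at the page source, it seems you want to scrape the title Iced Teas from the <h1> heading tag. Is that right?

Second, I tried a Scrapy shell session to understand the problem. It appears to come down to the User-Agent request header setting. See the shell sessions below: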

Without the user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas
In [1]: response.css('.tileList-title').extract()                               
Out[1]: []
view(response) #open the given response in your local web browser, for inspection.

With the user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas -s USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

In [1]: response.css('.tileList-title').extract()                               
Out[1]: ['<h1 class="tileList-title" ng-if="$ctrl.listTitle" tabindex="-1">Iced Teas</h1>']
#now as you can see it does not return an empty list.
view(response)

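So, as a practice for the future, you can pass -s SETTING_NAME=value when starting a Scrapy shell session. The available setting names are listed in the Scrapy settings documentation. Also use view(response) to check whether the request returned the content you expect, even when it came back with a 200. In my experience, view(response) shows that the page content (and sometimes even the source) fetched in the Scrapy shell can differ somewhat from what a normal browser shows, so checking with this shortcut is a good habit. The shell shortcuts are listed in the Scrapy shell documentation, and they are also printed at the start of every Scrapy shell session.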
