Crawling a site with dynamic routing using Scrapy

Posted 2024-04-27 00:03:23


How can I scrape all of the tools from a website that uses dynamic routing?

http://growthtools.io/social-media-automation-tools

When I run

scrapy shell 'http://growthtools.io/social-media-automation-tools' 

I get the following result:

^{pr2}$


The response object does not contain the tools elements:

In [3]: response.css('.toolsList')
Out[3]: []
In [5]: 'toolsList' in response.body
Out[5]: False

Can anyone explain how I can parse http://growthtools.io/social-media-automation-tools, and why the object I get back does not contain all of the page content?


1 answer
User
#1 · Posted 2024-04-27 00:03:23

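The page content is built by JavaScript that a browser executes, and Scrapy does not execute JavaScript. That is why the raw response is missing the tools. You can work around this with a Scrapy add-on that provides a middleware for your project. The middleware talks to a JS rendering service, which you can run via Docker.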
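As a rough sketch of what that middleware setup might look like: the `render.html` endpoint on port 8050 used further down matches the HTTP API of the Splash rendering service, so the fragment below assumes the add-on is scrapy-splash and that Splash is reachable at `http://localhost:8050`. Both are assumptions, and the middleware names and priorities are taken from the scrapy-splash README as I remember it, so check them against the version you install.

```python
# settings.py fragment (sketch, not a verified configuration).
# Assumption: scrapy-splash is installed and a Splash container is
# listening on localhost:8050.

# Where the rendering service can be reached.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that avoids storing duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes the Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With settings along these lines, a spider would typically yield the add-on's own request type instead of a plain `scrapy.Request`, so that pages are rendered before your callback sees them.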

As for testing it in the Scrapy shell, you can follow this example to run it from the shell.

This works for me:

$ scrapy shell 'http://localhost:8050/render.html?url=http://growthtools.io/social-media-automation-tools' 
In [1]: response.css('.toolsList')
Out[1]: 
[<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>]
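The shell command above passes the target page as a raw `url=` query parameter. That is fine for this URL, but a target with its own query string would need escaping. Here is a minimal sketch of building the render URL safely with the standard library. The helper name `splash_render_url` is hypothetical, and the optional `wait` parameter assumes the rendering service is Splash, whose `render.html` endpoint accepts it.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=None):
    """Build a render.html URL that asks the rendering service for target_url.

    splash_base and wait are assumptions: they presume a Splash instance
    on localhost:8050, as in the shell command above.
    """
    params = {"url": target_url}
    if wait is not None:
        # Seconds to wait after page load so scripts can finish running.
        params["wait"] = wait
    # urlencode percent-escapes the target URL so its own '?', '&' and '='
    # characters cannot be confused with the outer query string.
    return f"{splash_base}/render.html?{urlencode(params)}"


if __name__ == "__main__":
    print(splash_render_url("http://growthtools.io/social-media-automation-tools"))
```

The printed URL can be passed to `scrapy shell` in place of the hand-written one.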
