How to use selenium.PhantomJS() on a page already downloaded by Scrapy

2 votes

2 answers

841 views

Asked 2025-04-18 15:10

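from selenium import webdriver

def parseList(self, response):
    # Start a PhantomJS-driven browser session.
    dr = webdriver.PhantomJS()
    # This fetches the URL again, even though Scrapy already holds the body.
    dr.get(response.url)
    pageSource = dr.page_source
    print(pageSource)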

Scrapy has already downloaded this page (it is available in response.body), but dr.get(response.url) downloads it a second time.

Is there a way to make Selenium use response.body directly?

web-scraping selenium scrapy phantomjs

2 Answers

Regardless of the type of this argument, the final value stored will be a str (never unicode or None).

I assume you are using the Python bindings for Selenium alongside Scrapy. You can parse the response.body string with lxml or another library. What exactly do you mean by "make Selenium use response.body"?
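As a rough illustration of parsing the already-downloaded body instead of refetching it, here is a minimal sketch. It swaps in the standard library's html.parser for lxml so that it has no dependencies; with lxml, the analogous starting point would be lxml.html.fromstring(response.body). The names LinkCollector, extract_links, and sample_body are made up for this example, and sample_body merely stands in for response.body.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(body, encoding="utf-8"):
    """Parse a body that is already in memory (bytes or str); no network access."""
    if isinstance(body, bytes):
        body = body.decode(encoding, errors="replace")
    parser = LinkCollector()
    parser.feed(body)
    parser.close()
    return parser.links


# Stand-in for response.body; in a spider this would come from Scrapy.
sample_body = b'<html><body><a href="/page/1">one</a><a href="/page/2">two</a></body></html>'
links = extract_links(sample_body)
print(links)
```

This only works on the static HTML that Scrapy received. If the content you need is produced by JavaScript, a parser like this will not see it, which is presumably why a browser was involved in the first place.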

Answered 2025-04-18 by Python大师


In that case, you can save the content of response.body to an HTML file and then load that file in the browser, for example:

url = "file:///your/path/to/downloaded/file.html"
dr.get(url)
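The saving step is not shown above, so here is one possible sketch of it. It assumes a temporary file is acceptable; save_body_as_file_url and sample_body are hypothetical names, and sample_body stands in for response.body. The Selenium call is left as a comment so the snippet runs without a browser installed.

```python
import tempfile
from pathlib import Path


def save_body_as_file_url(body, suffix=".html"):
    """Write the downloaded body to a temp file and return its file:// URL.

    Hypothetical helper for illustration; not part of Scrapy or Selenium.
    """
    if isinstance(body, str):
        body = body.encode("utf-8")
    # delete=False keeps the file on disk so the browser can open it later.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(body)
        path = Path(handle.name)
    # as_uri() requires an absolute path and yields a file:// URL.
    return path.resolve().as_uri()


# Stand-in for response.body from the Scrapy callback.
sample_body = b"<html><body><p>already downloaded</p></body></html>"
url = save_body_as_file_url(sample_body)
print(url)

# In the spider, the driver from the question would then load the local copy:
# dr.get(url)
```

Two caveats with this approach: relative links and script URLs in the page resolve against the local file path rather than the original site, so pages that depend on them may render differently, and the temporary file is not removed automatically, so the caller should delete it once the browser is done.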

Answered 2025-04-18 by Python大师
