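<p>If you don't need to render the page visually and only want the source returned when accessing "<a href="http://loginrequired.com" rel="nofollow noreferrer">http://loginrequired.com</a>", one solution is to use Selenium together with Scrapy.</p>
<p>Basically, you tell Scrapy's redirect middleware to stop following redirects. When the spider requests the page and is redirected, the 302 response is then passed to your callback, where you handle it yourself.</p>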
<p>In settings.py you must set:</p>
<pre><code>REDIRECT_ENABLED = False
</code></pre>
<p>The spider code is:</p>
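<pre><code>import scrapy
from selenium import webdriver


# No crawl rules are defined, so a plain Spider is used; it lets us override parse().
class LoginSpider(scrapy.Spider):
    name = "login"
    allowed_domains = ["loginrequired.com"]
    start_urls = ["http://loginrequired.com"]
    # Pass 302 responses to the callback instead of discarding them.
    handle_httpstatus_list = [302]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            # dont_filter=True: this URL was already requested once.
            return scrapy.Request(url="http://loginrequired.com",
                                  callback=self.after_302, dont_filter=True)

    def after_302(self, response):
        print(response.url)
        # Your code to analyze the page goes here (for example, via self.driver)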
</code></pre>
<p>Idea taken from <a href="https://stackoverflow.com/questions/22795416/how-to-handle-302-redirect-in-scrapy">how to handle 302 redirect in scrapy</a>.</p>
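<p>Once redirects are disabled, the 302 response itself tells you where the site wanted to send you. The sketch below is not part of the original answer; it only illustrates, with the standard library, how the target of a redirect could be worked out from the status code and the <code>Location</code> header. The helper name <code>redirect_target</code> and the <code>/login</code> path are hypothetical.</p>

```python
from urllib.parse import urljoin

# Status codes that normally carry a Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(status, headers, request_url):
    """Return the absolute URL a redirect response points to, or None."""
    if status not in REDIRECT_STATUSES:
        return None
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    location = normalized.get("location")
    if not location:
        return None
    # Location may be relative; resolve it against the URL that was requested.
    return urljoin(request_url, location)


if __name__ == "__main__":
    # Hypothetical response: the site redirects anonymous visitors to /login.
    print(redirect_target(302, {"Location": "/login"}, "http://loginrequired.com"))
```

<p>Inside a Scrapy callback, header values arrive as bytes, so a value read from <code>response.headers</code> would need to be decoded before being passed to a helper like this one.</p>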