I have been searching Stack Overflow for hours but still have not found an answer that fits what I am doing. I want to use Selenium to get past an initial page click, then hand the cookies over to Scrapy and crawl the database from there. So far I keep getting redirected back to the initial login page.
My approach is based on grabbing the cookies and attaching them to the request, as described in this answer: scrapy authentication login with cookies
import time

import scrapy
from scrapy.http import Request
from selenium import webdriver


class HooversTest(scrapy.Spider):
    name = "hooversTest"
    allowed_domains = ["subscriber.hoovers.com"]
    login_page = "http://subscriber.hoovers.com/H/home/index.html"
    start_urls = [
        "http://subscriber.hoovers.com/H/company360/overview.html?companyId=99566395",
        "http://subscriber.hoovers.com/H/company360/overview.html?companyId=10723000000000",
    ]

    def login(self, response):
        return Request(url=self.login_page,
                       cookies=self.get_cookies(), callback=self.after_login)

    def get_cookies(self):
        self.driver = webdriver.Firefox()
        self.driver.get("http://www.mergentonline.com/Hoovers/continue.php?status=sucess")
        elem = self.driver.find_element_by_name("Continue")
        elem.click()
        time.sleep(15)
        cookies = self.driver.get_cookies()
        # reduce(lambda r, d: r.update(d) or r, cookies, {})
        self.driver.close()
        return cookies

    def parse(self, response):
        return Request(url="http://subscriber.hoovers.com/H/company360/overview.html?companyId=99566395",
                       cookies=self.get_cookies(), callback=self.after_login)

    def after_login(self, response):
        print(response.xpath('//title').extract())
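One thing worth checking (this is an assumption about the likely cause, not something stated in the question): Selenium's `get_cookies()` returns a list of dicts that carry extra keys such as `httpOnly` and `expiry`, while Scrapy's `Request(cookies=...)` most simply takes a plain `{name: value}` mapping. A minimal sketch of that conversion, with made-up sample cookies for illustration:

```python
# Sketch: flatten Selenium-style cookie dicts into the {name: value}
# mapping that Scrapy's Request(cookies=...) accepts.

def selenium_cookies_to_scrapy(selenium_cookies):
    """Keep only the name/value pairs; drop Selenium's extra keys
    (domain, path, secure, httpOnly, expiry)."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Sample input shaped like webdriver.get_cookies() output
# (values here are invented for the example):
sample = [
    {"name": "JSESSIONID", "value": "abc123", "domain": ".hoovers.com",
     "path": "/", "secure": False, "httpOnly": True},
    {"name": "auth", "value": "tok", "domain": ".hoovers.com", "path": "/"},
]

print(selenium_cookies_to_scrapy(sample))
# → {'JSESSIONID': 'abc123', 'auth': 'tok'}
```

Scrapy also accepts a list of dicts for `cookies=`, but that format only uses the `name`, `value`, `domain`, and `path` keys, so flattening to a plain dict is the safer hand-off.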