Below is my code. I am unable to scrape IMDB after logging in successfully. The `after_login` callback confirms that the form request did not return the error message, but when I issue a new request and print the content of the post-login page, it shows the main IMDB page for an anonymous visitor rather than the main IMDB page for a logged-in user.

import scrapy


# The class line was not included in the post; the class name here is assumed.
class IMDBSpider(scrapy.Spider):
    """
    Attributes:
        name (str): essential attribute which specifies the name of the spider
        start_urls (list): the urls that are to be scraped
    """
    name = 'IMDB_spider'
    start_urls = ['https://www.imdb.com/ap/signin?clientContext=131-8656718-8097200&openid.pape.max_auth_age=0&openid.'
                  'return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.ne'
                  't%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_us&openid.mode=checkid_setup&siteState=ey'
                  'JvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl91cyIsInJlZGlyZWN0VG8iOiJodHRwczovL3d3dy5pbWRiLmNvbS8_cmVmXz1sb2d'
                  'pbiJ9&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http'
                  '%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&&tag=imdbtag_reg-20']

    def parse(self, response):
        """
        Scrapy's default method that handles all the downloaded response for
        each request made.
        Arguments:
            response (text): contains all data of the page and other helpful
                methods as well
        """
        # Submit the login form found in the sign-in page.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': '*******', 'password': '****'},
            callback=self.after_login
        )

    def after_login(self, response):
        """
        Default callback method that is called to authenticate when logging in
        to website.
        Arguments:
            response (text): contains all data of the page and other helpful
                methods as well
        """
        # response.text is the decoded body; response.body is bytes on Python 3,
        # so comparing it with a str would raise a TypeError.
        if "There was a problem." in response.text:
            print('Login Failed')
            return
        print('Login Success')
        return scrapy.Request(url="http://www.imdb.com",
                              callback=self.parse_imdb_page)

    def parse_imdb_page(self, response):
        # print is a function on Python 3.
        print(response.text)
Please help.
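
One thing that may be worth checking here: `FormRequest.from_response` pre-fills the fields it finds in the page's form and then overlays the keys from `formdata`, so a key that does not match any input name is sent as an extra field instead of filling in the real one. Note also that the absence of "There was a problem." only shows that this particular error text was not returned, not that the session is authenticated. The sketch below uses only the standard library to list the input names in a form and to report which `formdata` keys match none of them. The helper names and the sample HTML are hypothetical, and the field names in the sample are made up rather than taken from IMDB's sign-in page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects the name attribute of every <input> that sits inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            name = dict(attrs).get("name")
            if name:
                self.field_names.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_input_names(html):
    """Return the input names found in the form(s) of an HTML document."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.field_names


def missing_fields(html, formdata):
    """Return the formdata keys that match no input name in the form."""
    names = set(form_input_names(html))
    return sorted(key for key in formdata if key not in names)


# Hypothetical sign-in page; the field names are invented for illustration.
SAMPLE_HTML = """
<form method="post" action="/signin">
  <input type="hidden" name="token" value="abc">
  <input type="email" name="email">
  <input type="password" name="password">
</form>
"""

if __name__ == "__main__":
    print(form_input_names(SAMPLE_HTML))
    print(missing_fields(SAMPLE_HTML, {"username": "x", "password": "y"}))
```

Inside the spider, the same check could be run on the real sign-in page by calling `missing_fields(response.text, formdata)` in `parse` before the form request is built. If `username` shows up as missing, the form expects a different field name for the account identifier.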