Unable to scrape IMDB after logging in to a personal profile with Scrapy?

Posted 2024-04-24 19:09:28


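Below is my code. After logging in successfully, I cannot scrape IMDB. The problem is that after_login confirms the form request worked, but when I issue a new request and print the content of the post-login page, it shows the generic IMDB home page rather than the home page for the logged-in user.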

import scrapy


# Class name is assumed; the original post shows only the class body.
class IMDBSpider(scrapy.Spider):
    """
    Attributes:
        name (str): essential attribute which specifies the name of the spider
        start_urls (list): the urls that are to be scraped

    """
    name = 'IMDB_spider'
    start_urls = ['https://www.imdb.com/ap/signin?clientContext=131-8656718-8097200&openid.pape.max_auth_age=0&openid.'
                  'return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.ne'
                  't%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_us&openid.mode=checkid_setup&siteState=ey'
                  'JvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl91cyIsInJlZGlyZWN0VG8iOiJodHRwczovL3d3dy5pbWRiLmNvbS8_cmVmXz1sb2d'
                  'pbiJ9&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http'
                  '%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&&tag=imdbtag_reg-20']

    def parse(self, response):
        """
        Scrapy's default method that handles all the downloaded response for
        each request made.

        Arguments:
            response (scrapy.http.Response): contains all data of the page
            and other helpful methods as well

        """
        # Fill in the sign-in form found on the page and submit it.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': '*******', 'password': '****'},
            callback=self.after_login
        )

    def after_login(self, response):
        """
        Default callback method that is called to authenticate when logging in
        to website.

        Arguments:
            response (scrapy.http.Response): contains all data of the page
            and other helpful methods as well

        """
        # response.text is the decoded body (str); response.body is bytes,
        # so comparing it with a str literal fails on Python 3.
        if "There was a problem." in response.text:
            print('Login Failed')
            return
        print('Login Success')
        return scrapy.Request(url="http://www.imdb.com",
                              callback=self.parse_imdb_page)

    def parse_imdb_page(self, response):
        # Python 3 print function; prints the decoded page content.
        print(response.text)
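
One thing worth checking here: the test in after_login only confirms that the error text is absent, which does not prove the login went through. If the submitted field names do not match the form's inputs, the site may simply return the sign-in page again without that message. Below is a minimal sketch of a stricter check, not part of the original spider. It assumes the sign-in page contains a password input and the logged-in page does not. The helper names still_on_signin_page and login_looks_successful are hypothetical.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether the parsed HTML contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def still_on_signin_page(html_text):
    """Return True if the page still shows a password field (assumed sign-in form)."""
    finder = _PasswordFieldFinder()
    finder.feed(html_text)
    return finder.found


def login_looks_successful(html_text, failure_marker="There was a problem."):
    """Hypothetical check: no failure text AND no sign-in form on the page."""
    if failure_marker in html_text:
        return False
    return not still_on_signin_page(html_text)
```

Inside after_login this could be called as login_looks_successful(response.text). If it returns False even though no error message appears, the credentials probably never reached the server in the fields the form expects, which would explain why the follow-up request gets the anonymous home page.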

Please help.

