Form requests using Scrapy + Splash

Posted 2024-03-28 08:30:33


I am trying to log in to a website using the following code (slightly modified from this post):

import scrapy
from scrapy_splash import SplashRequest
from scrapy.crawler import CrawlerProcess

class Login_me(scrapy.Spider):
    name = 'espn'
    allowed_domains = ['games.espn.com']
    start_urls = ['http://games.espn.com/ffl/leaguerosters?leagueId=774630']

    def start_requests(self):
        # Lua script executed by Splash's "execute" endpoint
        script = """
        function main(splash)
            local url = splash.args.url

            assert(splash:go(url))
            assert(splash:wait(10))

            local search_input = splash:select('input[type=email]')
            search_input:send_text("user email")

            local search_input = splash:select('input[type=password]')
            search_input:send_text("user password!")

            assert(splash:wait(10))
            local submit_button = splash:select('input[type=submit]')
            submit_button:click()

            assert(splash:wait(10))

            -- return the rendered HTML in a table under the "html" key
            return {html = splash:html()}
        end
        """

        yield SplashRequest(
            'http://games.espn.com/ffl/leaguerosters?leagueId=774630',
            callback=self.after_login,
            endpoint='execute',
            args={'lua_source': script},
        )

    # Callback is a method of the spider, not a function nested in start_requests
    def after_login(self, response):
        table = response.xpath('//table[@id="playertable_0"]')
        for player in table.css('tr[id]'):
            item = {
                'id': player.css('::attr(id)').extract_first(),
            }
            print(item)
            yield item

I get an error:

[error output missing from the source]

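For some reason I still cannot log in. I have gone through many different posts here and tried many variations of "splash:select", but I can't seem to find my problem. When I inspect the page in Chrome, I see this (the password field has similar HTML):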

<input type="email" placeholder="Username or Email Address" autocapitalize="none" autocomplete="on" autocorrect="off"
       spellcheck="false" ng-model="vm.username" ng-pattern="/^[^<&quot;>]*$/" ng-required="true"
       did-disable-validate="" ng-focus="vm.resetUsername()"
       class="ng-pristine ng-invalid ng-invalid-required ng-valid-pattern ng-touched"
       tabindex="0" required="required" aria-required="true" aria-invalid="true">

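I believe the HTML above is generated by JavaScript, so I cannot get it with plain Scrapy. I therefore looked at the page source, and I think the relevant JS code that Splash would be using is the following (but I am not sure):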

function authenticate(params) {
    return makeRequest('POST', '/guest/login', {
        'loginValue': params.loginValue,
        'password': params.password
    }, {
        'Authorization': params.authorization,
        'correlation-id': params.correlationId,
        'conversation-id': params.conversationId,
        'oneid-reporting': buildReportingHeader(params.reporting)
    }, {
        'langPref': getLangPref()
    });
}

Can someone point me in the right direction?


1 answer
User
Answer 1 · Posted 2024-03-28 08:30:33

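The main problem here is that the login form is inside an iframe element. I don't know scrapy-splash, so the POC code below uses Selenium and Beautiful Soup. The mechanism will be similar with Splash, though: you need to switch into the iframe, and then switch back once it disappears.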

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

USER = 'theUser'
PASS = 'thePassword'

driver = webdriver.Firefox()
driver.get('http://games.espn.com/ffl/leaguerosters?leagueId=774630')

# The login form lives inside an iframe, so switch into it first
iframe = driver.find_element(By.CSS_SELECTOR, 'iframe#disneyid-iframe')
driver.switch_to.frame(iframe)
driver.find_element(By.CSS_SELECTOR, "input[type='email']").send_keys(USER)
driver.find_element(By.CSS_SELECTOR, "input[type='password']").send_keys(PASS)
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

# Switch back to the top-level document and parse its source
driver.switch_to.default_content()
soup_level1 = BeautifulSoup(driver.page_source, 'html.parser')
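
The snippet above switches back to the main document immediately after the click, without waiting for the iframe to appear or to go away. The following is a minimal sketch of the "switch in, then back once it disappears" idea with explicit polling. It is written against any object that exposes Selenium's `find_element`, `find_elements`, `switch_to` and `page_source`; the helper names `wait_until` and `login_through_iframe` are made up for illustration, and the assumption that the iframe is removed from the page after a successful login is not confirmed by the post.

```python
import time

CSS = "css selector"  # the string value behind Selenium's By.CSS_SELECTOR


def wait_until(condition, timeout=20.0, poll=0.5):
    """Poll condition() until it returns something truthy, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def login_through_iframe(driver, user, password,
                         iframe_css="iframe#disneyid-iframe", timeout=20.0):
    """Fill the login form inside the iframe, then wait for the iframe to go away."""
    # find_elements returns an empty list (no exception) while nothing matches.
    frames = wait_until(lambda: driver.find_elements(CSS, iframe_css), timeout)
    driver.switch_to.frame(frames[0])

    # The form is rendered by JavaScript, so wait for the first field as well.
    email = wait_until(
        lambda: driver.find_elements(CSS, "input[type='email']"), timeout)[0]
    email.send_keys(user)
    driver.find_element(CSS, "input[type='password']").send_keys(password)
    driver.find_element(CSS, "button[type='submit']").click()

    # Back to the top-level document.
    # Assumption: the login iframe is removed from the page once login succeeds.
    driver.switch_to.default_content()
    wait_until(lambda: not driver.find_elements(CSS, iframe_css), timeout)
    return driver.page_source
```

With a real browser this would be called as `login_through_iframe(webdriver.Firefox(), USER, PASS)`. Selenium also ships its own `WebDriverWait` and `expected_conditions` helpers, which could replace the hand-written `wait_until` loop.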

For this code to work, you need Firefox and geckodriver installed, available on your PATH, and in compatible versions.
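
Once the page source is available after login, the row ids that the question's spider was after (every `tr` with an `id` inside `table#playertable_0`) can be collected from it. The sketch below uses only the standard library's `html.parser` so that it runs without extra packages; the class name `PlayerRowIds`, the function `extract_row_ids` and the sample markup are invented for illustration and stand in for `driver.page_source`.

```python
from html.parser import HTMLParser


class PlayerRowIds(HTMLParser):
    """Collect the id attribute of every <tr id=...> inside one table."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0  # > 0 while inside the target table (counts nested tables)
        self.row_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            if self.depth or attrs.get("id") == self.table_id:
                self.depth += 1
        elif tag == "tr" and self.depth and attrs.get("id"):
            self.row_ids.append(attrs["id"])

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def extract_row_ids(page_source, table_id="playertable_0"):
    """Return the ids of all rows with an id attribute in the given table."""
    parser = PlayerRowIds(table_id)
    parser.feed(page_source)
    return parser.row_ids


# Made-up sample markup standing in for driver.page_source after login.
SAMPLE = """
<table id="other"><tr id="skip_me"><td>x</td></tr></table>
<table id="playertable_0">
  <tr><th>header row without id</th></tr>
  <tr id="plyr1"><td>A</td></tr>
  <tr id="plyr2"><td>B</td></tr>
</table>
"""

print(extract_row_ids(SAMPLE))  # ['plyr1', 'plyr2']
```

With Beautiful Soup already loaded as in the answer, a CSS query such as `soup_level1.select('table#playertable_0 tr[id]')` should give the same rows in one line.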
