Scrapy spider works locally but not on Scrapinghub

I have a small spider that sends two POST requests to a site. On my local machine it works and I get both responses.

But on Scrapy Cloud (scrapinghub.com) I get the same error for every request:

[scrapy.core.scraper] Error downloading <https://baca.ii.uj.edu.pl/p12018/testerka_gwt/problems>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL23_GET_SERVER_HELLO', 'tlsv1 alert internal error')]>]

The site's certificate has expired, and I thought that might be the cause of the problem. However, the Scrapy documentation says that site certificates are not verified by default, so I'm not sure.
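
For what it's worth, Scrapy's TLS behaviour can be tuned through its downloader settings. Below is a minimal sketch of the kind of thing I could try in settings.py (or in the spider's custom_settings); pinning 'TLSv1.2' is only a guess on my part, not a confirmed fix:

# settings.py -- pin a TLS version instead of Scrapy's default negotiation ('TLS')
# NOTE: the choice of 'TLSv1.2' is an assumption about what the server accepts
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.2'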

My spider code:

import scrapy


class resultsTest(scrapy.Spider):
    name = "results"

    custom_settings = {
        'ROBOTSTXT_OBEY': False,
    }

    def start_requests(self):
        firstAss = 4 #first assignment
        lastAss = 5 #last assignment
        url = 'https://baca.ii.uj.edu.pl/p12018/testerka_gwt/problems'
        bodyBeginning = '7|0|5|https://baca.ii.uj.edu.pl/p12018/testerka_gwt/|548F7E6329FFDEC9688CE48426651141|testerka.gwt.client.problems.ProblemsService|getProblemStatistic|I|1|2|3|4|1|5|'
        headers = {
                "Content-Type": "text/x-gwt-rpc; charset=UTF-8",
                "X-GWT-Module-Base": "https://baca.ii.uj.edu.pl/p12018/testerka_gwt/",
                "X-GWT-Permutation": "5A4AE95C27260DF45F17F9BF027335F6",
                }

        # one GWT-RPC POST per assignment number
        for num in range(firstAss, lastAss + 1):
            body = bodyBeginning + str(num) + "|"
            yield scrapy.Request(
                url,
                method="POST",
                headers=headers,
                body=body,
            )

    def parse(self, response):
        # Scrapy expects items (e.g. dicts) or requests from parse(),
        # so wrap the raw response text instead of yielding a bare string.
        yield {'body': response.body_as_unicode()}
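
To check whether the failure depends on the environment rather than on the spider itself, the same POST can also be sent outside Scrapy, for example with the requests library. This is only a diagnostic sketch: the URL, headers and body are copied from the spider above, the body shown is the one built for assignment 4, and verify=False mirrors Scrapy not verifying certificates.

import requests

url = 'https://baca.ii.uj.edu.pl/p12018/testerka_gwt/problems'
headers = {
    "Content-Type": "text/x-gwt-rpc; charset=UTF-8",
    "X-GWT-Module-Base": "https://baca.ii.uj.edu.pl/p12018/testerka_gwt/",
    "X-GWT-Permutation": "5A4AE95C27260DF45F17F9BF027335F6",
}
body = ('7|0|5|https://baca.ii.uj.edu.pl/p12018/testerka_gwt/'
        '|548F7E6329FFDEC9688CE48426651141'
        '|testerka.gwt.client.problems.ProblemsService|getProblemStatistic|I|1|2|3|4|1|5|'
        '4|')

# If this works locally but fails with an SSL error in another environment,
# the problem is in that environment's TLS negotiation, not in the spider.
response = requests.post(url, headers=headers, data=body, verify=False)
print(response.status_code, response.text[:200])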

Tags: https, body, post, headers, ii, scrapy, pl, spider