Scrapy with splash设置在Scrapy shell中工作,否则将失败

2024-06-16 11:44:09 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图从macOS上的link中提取内容,使用scrapyscrapy_splash设置和BeautifulSoup我按照documentation中的说明操作

  • 我在scrapy shell中测试了每个命令,每个命令都工作得非常好,在几页上进行了测试。当我使用相同的命令运行spider时,它无法检测到任何项目

settings.py

BOT_NAME = 'stepstone'
SPIDER_MODULES = ['stepstone.spiders']
NEWSPIDER_MODULE = 'stepstone.spiders'
SPLASH_URL = 'http://0.0.0.0:8050'  # changed from the documentation's http://192.168.59.103:8050 which does not work 
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

星形模块:

from scrapy.spiders import Spider
from scrapy_splash import SplashRequest
from scrapy import Request
from bs4 import BeautifulSoup


class StepSpider(Spider):
    name = 'step'
    allowed_domains = ['www.stepstone.de']
    start_urls = [
        'https://www.stepstone.de/5/ergebnisliste.html?stf=freeText&ns=1&qs='
        '%5B%7B%22id%22%3A%22216805%22%2C%22description%22%3A%22Software-Entw'
        'ickler%2Fin%22%2C%22type%22%3A%22jd%22%7D%2C%7B%22id%22%3A%223000001'
        '15%22%2C%22description%22%3A%22Deutschland%22%2C%22type%22%3A%22geoc'
        'ity%22%7D%5D&companyID=0&cityID=300000115&sourceOfTheSearchField=home'
        'pagemex%3Ageneral&searchOrigin=Homepage_top-search&ke=Software-Entwic'
        'kler%2Fin&ws=Deutschland&ra=30/'
    ]

    @staticmethod
    def extract_item(soup, extraction_path):
        result = soup.find(*extraction_path)
        if result:
            return result.getText()

    def parse(self, response):
        soup = BeautifulSoup(response.body, features='lxml')
        listings = [
            response.urljoin(item)
            for item in response.xpath('//div/div/a/@href').extract()
            if 'stellenangebote' in item
        ]
        yield from [
            Request(
                url,
                callback=self.parse_item,
                cb_kwargs={'soup': soup},
                meta={'splash': {'args': {'html': 1, 'png': 1,}}},
            )
            for url in listings
        ]
        next_page = soup.find('a', {'data-at': 'pagination-next'})
        if next_page:
            yield SplashRequest(next_page.get('href'), self.parse)

    def parse_header(self, response, soup):
        title = response.xpath('//h1/text()').get()
        location = self.extract_item(
            soup, ('li', {'class': 'at-listing__li: st-icons_location'})
        )
        contract_type = self.extract_item(
            soup, ('li', {'class': 'at-listing__list-icons_contract-type'})
        )
        work_type = self.extract_item(
            soup, ('li', {'class': 'at-listing__list-icons_work-type'})
        )
        return {
            'title': title,
            'location': location,
            'contract_type': contract_type,
            'work_type': work_type,
        }

    def parse_body(self, response, soup):
        titles = response.xpath('//h4/text()').extract()
        intro = self.extract_item(
            soup, ('div', {'class': 'at-section-text-introduction-content'})
        )
        description = self.extract_item(
            soup, ('div', {'class': 'at-section-text-description-content'})
        )
        profile = self.extract_item(
            soup, ('div', {'class': 'at-section-text-profile-content'})
        )
        we_offer = self.extract_item(
            soup, ('div', {'class': 'at-section-text-weoffer-content'})
        )
        contact = self.extract_item(
            soup, ('div', {'class': 'at-section-text-contact-content'})
        )
        return {
            title: text
            for title, text in zip(
                titles, [intro, description, profile, we_offer, contact]
            )
        }

    def parse_item(self, response, soup):
        items = self.parse_header(response, soup)
        items.update(self.parse_body(response, soup))
        yield items

完整日志:

2020-08-11 17:57:44 [scrapy.utils.log] INFO: Scrapy 2.2.1 started (bot: stepstone)
2020-08-11 17:57:44 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.3 (default, May 27 2020, 20:54:22) - [Clang 11.0.3 (clang-1103.0.32.59)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 3.0, Platform macOS-10.15.6-x86_64-i386-64bit
2020-08-11 17:57:44 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-08-11 17:57:44 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'stepstone',
 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
 'NEWSPIDER_MODULE': 'stepstone.spiders',
 'SPIDER_MODULES': ['stepstone.spiders']}
2020-08-11 17:57:44 [scrapy.extensions.telnet] INFO: Telnet Password: 71c7bd3bdaf32c63
2020-08-11 17:57:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2020-08-11 17:57:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy_splash.SplashCookiesMiddleware',
 'scrapy_splash.SplashMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-08-11 17:57:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy_splash.SplashDeduplicateArgsMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-08-11 17:57:44 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-08-11 17:57:44 [scrapy.core.engine] INFO: Spider opened
2020-08-11 17:57:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-08-11 17:57:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-08-11 17:57:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.stepstone.de/5/ergebnisliste.html?stf=freeText&ns=1&qs=%5B%7B%22id%22%3A%22216805%22%2C%22description%22%3A%22Software-Entwickler%2Fin%22%2C%22type%22%3A%22jd%22%7D%2C%7B%22id%22%3A%22300000115%22%2C%22description%22%3A%22Deutschland%22%2C%22type%22%3A%22geocity%22%7D%5D&companyID=0&cityID=300000115&sourceOfTheSearchField=homepagemex%3Ageneral&searchOrigin=Homepage_top-search&ke=Software-Entwickler%2Fin&ws=Deutschland&ra=30/> (referer: None)
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: utf-8  confidence = 0.99
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: SHIFT_JIS Japanese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-JP Japanese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: GB2312 Chinese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-KR Korean confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: CP949 Korean confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: Big5 Chinese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-TW Taiwan confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1251 Russian confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: KOI8-R Russian confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: ISO-8859-5 Russian confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: MacCyrillic Russian confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: IBM866 Russian confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: IBM855 Russian confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: ISO-8859-7 Greek confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1253 Greek confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: ISO-8859-5 Bulgairan confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1251 Bulgarian confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: TIS-620 Thai confidence = 0.041278205445058724
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: ISO-8859-9 Turkish confidence = 0.5186494104315963
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1255 Hebrew confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1255 Hebrew confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: windows-1255 Hebrew confidence = 0.0
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: utf-8  confidence = 0.99
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: SHIFT_JIS Japanese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-JP Japanese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: GB2312 Chinese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-KR Korean confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: CP949 Korean confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: Big5 Chinese confidence = 0.01
2020-08-11 17:57:46 [chardet.charsetprober] DEBUG: EUC-TW Taiwan confidence = 0.01
2020-08-11 17:57:47 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://www.stepstone.de/stellenangebote--JAVA-Software-Entwickler-m-w-d-Sueddeutschland-TECCON-Consulting-Engineering-GmbH--6582908-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=1_1_25_dynrl_m_0_0_0_0> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2020-08-11 17:57:47 [py.warnings] WARNING: /usr/local/lib/python3.8/site-packages/scrapy_splash/request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
  url = to_native_str(url)

2020-08-11 17:57:50 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:57:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-fuer-Windowsapplikationen-m-w-d-Stockach-oder-Boeblingen-Baumer-MDS-GmbH--6568164-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=19_19_25_dynrl_m_0_0_0_0>
{'title': 'Software Entwickler für Windowsapplikationen (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Ihre Herausforderung:': None, 'Sie verfügen über:': None, 'Wir bieten:': None, 'Kontakt:': None}
2020-08-11 17:57:51 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:57:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--JAVA-Software-Entwickler-m-w-d-Sueddeutschland-TECCON-Consulting-Engineering-GmbH--6582908-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=1_1_25_dynrl_m_0_0_0_0>
{'title': 'JAVA Software-Entwickler (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Einleitung': None, 'Ihre Aufgaben': None, 'Ihr Profil': None, 'Wir bieten': None, 'Weitere Informationen': None}
2020-08-11 17:57:52 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:57:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-Business-Engineer-fuer-Blockchain-Team-in-Gruendung-w-m-d-Frankfurt-Main-Deutsche-Bahn-AG--6249570-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=16_16_25_dynrl_m_0_0_0_0>
{'title': 'Software-Entwickler / Business Engineer für Blockchain-Team in Gründung (w/m/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Was dich erwartet': None, 'Was wir erwarten': None, 'Wir bieten': None, 'Standort': None}
2020-08-11 17:57:55 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:57:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-w-m-d-Diagnose-und-Visualisierungssysteme-Mannheim-Halle-Stadler-Mannheim-GmbH--6615613-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=13_13_25_dynrl_m_0_0_0_0>
{'title': 'Software-Entwickler (w/m/d) Diagnose und Visualisierungssysteme', 'location': None, 'contract_type': None, 'work_type': None, 'Ihre Aufgaben:': None, 'Ihr Profil:': None, 'Unser Angebot:': None, 'Begeistert?': None}
2020-08-11 17:57:55 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:57:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-m-w-d-Rosenheim-Agenda-Informationssysteme-GmbH-Co-KG--6590641-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=17_17_25_dynrl_m_0_0_0_0>
{'title': 'Software-Entwickler (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Ihre Aufgaben:': None, 'Ihr Profil:': None, 'Das spricht für uns:': None, 'Kontakt:': None, 'Standort': None}
2020-08-11 17:58:08 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:08 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-w-m-d-fuer-Fahrzeugsteuerung-Mannheim-Halle-Stadler-Mannheim-GmbH--6615612-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=11_11_25_dynrl_m_0_0_0_0>
{'title': 'Software-Entwickler (w/m/d) für Fahrzeugsteuerung', 'location': None, 'contract_type': None, 'work_type': None, 'Ihre Aufgaben:': None, 'Ihr Profil:': None, 'Unser Angebot:': None, 'Begeistert?': None}
^C2020-08-11 17:58:09 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force 
2020-08-11 17:58:09 [scrapy.core.engine] INFO: Closing spider (shutdown)
2020-08-11 17:58:13 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:13 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:13 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:13 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-m-w-d-Meissen-Staatliche-Porzellan-Manufaktur-Meissen-GmbH--6462761-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=14_14_25_dynrl_m_0_0_0_0>
{'title': 'Software Entwickler (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Wir gehen neue Wege': None, 'Ihre Aufgaben': None, 'unsere Anforderungen': None, 'unser Angebot': None, 'Kontakt': None}
2020-08-11 17:58:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Agiler-Software-Entwickler-m-w-div-Dresden-Otto-Group-Solution-Provider-OSP-GmbH--4573007-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=8_8_25_dynrl_m_0_0_0_0>
{'title': 'Agiler Software Entwickler (m/w/div)', 'location': None, 'contract_type': None, 'work_type': None, 'Über uns': None, 'Was dich erwartet': None, 'Was du mitbringen solltest': None, 'Diese und weitere Benefits erwarten dich': None, 'Kontakt': None}
2020-08-11 17:58:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-m-w-d-Essen-Lowell-Group--6615697-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=9_9_25_dynrl_m_0_0_0_0>
{'title': 'Software Entwickler (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Ihre Aufgaben': None, 'Ihr Profil': None, 'Wir bieten': None, 'Kontakt': None, 'Standort': None}
2020-08-11 17:58:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Softwareentwickler-m-w-d-Fullstack-Web-Boeblingen-Braunschweig-Deutschlandweit-Ingolstadt-Muenchen-Norddeutschland-Stuttgart-umlaut--6122455-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=15_15_25_dynrl_m_0_0_0_0>
{'title': 'Softwareentwickler (m/w/d) - Fullstack Web', 'location': None, 'contract_type': None, 'work_type': None, 'our öffer': None, 'yöu': None, 'top 5 reasöns': None, 'cöntact': None, 'Mitarbeiterbewertungen': None}
2020-08-11 17:58:13 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://0.0.0.0:8050/render.json> (referer: None)
2020-08-11 17:58:13 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stepstone.de/stellenangebote--Software-Entwickler-E-Commerce-m-w-d-Dresden-Otto-Group-Solution-Provider-OSP-GmbH--6550022-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=10_10_25_dynrl_m_0_0_0_0>
{'title': 'Software Entwickler E-Commerce (m/w/d)', 'location': None, 'contract_type': None, 'work_type': None, 'Was dich erwartet': None, 'Was du mitbringen solltest': None, 'Kontakt': None, 'Standort': None}
^C2020-08-11 17:58:16 [scrapy.crawler] INFO: Received SIGINT twice, forcing unclean shutdown
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.stepstone.de/5/ergebnisliste.html?stf=freeText&ns=1&companyid=0&sourceofthesearchfield=homepagemex%3Ageneral&qs=[{"id"%3A216805%2C"description"%3A"Software-Entwickler\%2Fin"%2C"type"%3A"jd"}%2C{"id"%3A300000115%2C"description"%3A"Deutschland"%2C"type"%3A"geocity"}]&cityid=300000115&ke=Software-Entwickler%2Fin&ws=Deutschland&ra=30&suid=90b7defb-2854-4c23-98bd-b39bc15a6922&of=25&action=paging_next via http://0.0.0.0:8050/render.html> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2020-08-11 17:58:16 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://0.0.0.0:8050/render.json> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

docker日志(由于文本限制,不是完整日志,但基本上相同的事情一直在重复):

2020-08-11 15:57:27+0000 [-] Log opened.
2020-08-11 15:57:27.990815 [-] Xvfb is started: ['Xvfb', ':2061643423', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2020-08-11 15:57:28.135258 [-] Splash version: 3.4.1
2020-08-11 15:57:28.203198 [-] Qt 5.13.1, PyQt 5.13.1, WebKit 602.1, Chromium 73.0.3683.105, sip 4.19.19, Twisted 19.7.0, Lua 5.2
2020-08-11 15:57:28.203826 [-] Python 3.6.9 (default, Nov  7 2019, 10:44:02) [GCC 8.3.0]
2020-08-11 15:57:28.204679 [-] Open files limit: 1048576
2020-08-11 15:57:28.205242 [-] Can't bump open files limit
2020-08-11 15:57:28.229336 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2020-08-11 15:57:28.229855 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2020-08-11 15:57:28.410540 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2020-08-11 15:57:28.411484 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
2020-08-11 15:57:28.412634 [-] Site starting on 8050
2020-08-11 15:57:28.412924 [-] Starting factory <twisted.web.server.Site object at 0x7fbfa77591d0>
2020-08-11 15:57:28.414172 [-] Server listening on http://0.0.0.0:8050
2020-08-11 15:57:49.583386 [events] {"path": "/render.json", "rendertime": 2.339588165283203, "maxrss": 236848, "load": [0.1, 0.05, 0.06], "fds": 102, "active": 7, "qsize": 0, "_id": 140461124347104, "method": "POST", "timestamp": 1597161469, "user-agent": "Scrapy/2.2.1 (+https://scrapy.org)", "args": {"headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en", "Cookie": "cfid=bef36179-b81a-44e3-9bd1-059f5406a911; cftoken=0; USER_HASH_ID=f896d04a-5348-455b-a0ab-ffd0d5f6e674; V5=1; UXUSER=BLACKLIST%3BA%3B%20%3B; STEPSTONEV5LANG=de; ONLINE_CF=14-190; dtCookie=35$77973CDF4397BDD4A3EF1CAFDB05C9FD", "Referer": "https://www.stepstone.de/5/ergebnisliste.html?stf=freeText&ns=1&qs=%5B%7B%22id%22%3A%22216805%22%2C%22description%22%3A%22Software-Entwickler%2Fin%22%2C%22type%22%3A%22jd%22%7D%2C%7B%22id%22%3A%22300000115%22%2C%22description%22%3A%22Deutschland%22%2C%22type%22%3A%22geocity%22%7D%5D&companyID=0&cityID=300000115&sourceOfTheSearchField=homepagemex%3Ageneral&searchOrigin=Homepage_top-search&ke=Software-Entwickler%2Fin&ws=Deutschland&ra=30/", "User-Agent": "Scrapy/2.2.1 (+https://scrapy.org)"}, "html": 1, "png": 1, "url": "https://www.stepstone.de/stellenangebote--Software-Entwickler-fuer-Windowsapplikationen-m-w-d-Stockach-oder-Boeblingen-Baumer-MDS-GmbH--6568164-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=19_19_25_dynrl_m_0_0_0_0", "uid": 140461124347104}, "status_code": 200, "client_ip": "172.17.0.1"}
2020-08-11 15:57:49.584498 [-] "172.17.0.1" - - [11/Aug/2020:15:57:48 +0000] "POST /render.json HTTP/1.1" 200 371319 "-" "Scrapy/2.2.1 (+https://scrapy.org)"
2020-08-11 15:57:49.777352 [events] {"path": "/render.json", "rendertime": 2.6071407794952393, "maxrss": 243100, "load": [0.1, 0.05, 0.06], "fds": 106, "active": 6, "qsize": 0, "_id": 140461124981984, "method": "POST", "timestamp": 1597161469, "user-agent": "Scrapy/2.2.1 (+https://scrapy.org)", "args": {"headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en", "Cookie": "cfid=bef36179-b81a-44e3-9bd1-059f5406a911; cftoken=0; USER_HASH_ID=f896d04a-5348-455b-a0ab-ffd0d5f6e674; V5=1; UXUSER=BLACKLIST%3BA%3B%20%3B; STEPSTONEV5LANG=de; ONLINE_CF=14-190; dtCookie=35$77973CDF4397BDD4A3EF1CAFDB05C9FD", "Referer": "https://www.stepstone.de/5/ergebnisliste.html?stf=freeText&ns=1&qs=%5B%7B%22id%22%3A%22216805%22%2C%22description%22%3A%22Software-Entwickler%2Fin%22%2C%22type%22%3A%22jd%22%7D%2C%7B%22id%22%3A%22300000115%22%2C%22description%22%3A%22Deutschland%22%2C%22type%22%3A%22geocity%22%7D%5D&companyID=0&cityID=300000115&sourceOfTheSearchField=homepagemex%3Ageneral&searchOrigin=Homepage_top-search&ke=Software-Entwickler%2Fin&ws=Deutschland&ra=30/", "User-Agent": "Scrapy/2.2.1 (+https://scrapy.org)"}, "html": 1, "png": 1, "url": "https://www.stepstone.de/stellenangebote--JAVA-Software-Entwickler-m-w-d-Sueddeutschland-TECCON-Consulting-Engineering-GmbH--6582908-inline.html?suid=90b7defb-2854-4c23-98bd-b39bc15a6922&rltr=1_1_25_dynrl_m_0_0_0_0", "uid": 140461124981984}, "status_code": 200, "client_ip": "172.17.0.1"}

Tags: httpscoredebugnonehttphtmltypede