Scrapy/BigQuery fails when closing the spider with the following error: OSError: [Errno 5] Input/output error

2019-06-11 23:18:12 [scrapy.extensions.logstats] INFO: Crawled 480107 pages (at 787 pages/min), scraped 466560 items (at 772 items/min)
2019-06-11 23:18:33 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-11 23:18:33 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/togayyazar/etsy/etsy/pipelines.py", line 20, in close_spider
    self.write_to_bq()
  File "/home/togayyazar/etsy/etsy/pipelines.py", line 30, in write_to_bq
    print("-----BIGQUERY-----")
OSError: [Errno 5] Input/output error
2019-06-11 23:18:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 217195256,
 'downloader/request_count': 480652,
 'downloader/request_method_count/GET': 480652,
 'downloader/response_bytes': 29983627714,
 'downloader/response_count': 480652,
 'downloader/response_status_count/200': 480373,
 'downloader/response_status_count/301': 254,
 'downloader/response_status_count/400': 6,
 'downloader/response_status_count/503': 19,
 'dupefilter/filtered': 358230,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 6, 11, 23, 18, 33, 739888),
 'httperror/response_ignored_count': 6,
 'httperror/response_ignored_status_count/400': 6,
 'item_scraped_count': 466833,
 'log_count/ERROR': 1,
 'log_count/INFO': 663,
 'memusage/max': 456044544,
 'memusage/startup': 61976576,
 'request_depth_max': 88,
 'response_received_count': 480379,
 'retry/count': 19,
 'retry/reason_count/503 Service Unavailable': 19,
 'scheduler/dequeued': 480652,
 'scheduler/dequeued/memory': 480652,
 'scheduler/enqueued': 480652,
 'scheduler/enqueued/memory': 480652,
 'start_time': datetime.datetime(2019, 6, 11, 12, 30, 12, 400853)}
2019-06-11 23:18:33 [scrapy.core.engine] INFO: Spider closed (finished)

1 answer

#1 · posted 2024-05-29 02:29:00

If you look at the traceback, you will see that the exception is raised inside the print() call:

  File "/home/togayyazar/etsy/etsy/pipelines.py", line 30, in write_to_bq
    print("-----BIGQUERY-----")
OSError: [Errno 5] Input/output error

See this thread to understand the problem.
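The linked thread is not reproduced here, so the exact cause is an assumption on my part. One common way to hit this error is that stdout is no longer writable by the time the spider closes, for example because the terminal the crawl was started from has gone away. The sketch below simulates that with a hypothetical `BrokenStream` class (not part of Scrapy or the original pipeline) and compares how `print()` and a `logging.StreamHandler` react to a failing write:

```python
import errno
import logging


class BrokenStream:
    """Hypothetical stand-in for a stdout that can no longer be written to."""

    def write(self, text):
        # Simulate the failure seen in the traceback: errno 5 (EIO).
        raise OSError(errno.EIO, "Input/output error")

    def flush(self):
        pass


broken = BrokenStream()


def try_print():
    """Return the errno raised by print(), or None if it succeeded."""
    try:
        print("-----BIGQUERY-----", file=broken)
    except OSError as exc:
        return exc.errno
    return None


def try_log():
    """Return the errno raised by a logging call, or None if none escaped."""
    log = logging.getLogger("broken_stream_demo")
    log.propagate = False  # keep the demo isolated from the root logger
    handler = logging.StreamHandler(broken)
    log.addHandler(handler)
    try:
        log.warning("-----BIGQUERY-----")
    except OSError as exc:
        return exc.errno
    finally:
        log.removeHandler(handler)
    return None


print_result = try_print()  # the OSError escapes print() and is caught here
log_result = try_log()      # the handler reports the failure itself instead of raising
```

With `print()` the `OSError` propagates to the caller, which in the question is `close_spider`. The logging handler catches the write failure in its own error handling (it may print a "Logging error" notice to stderr), so the code after the log call still runs.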

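I suggest simply removing the print() call or replacing it with the logging module. If you want to log from the spider, it already has a logger attribute; if you would rather have a logger that carries the pipeline's name, create one with logging.getLogger.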
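The code from the original answer did not survive on this page, so the following is only a minimal sketch of the idea, not the answerer's exact snippet. The class name `BigQueryPipeline`, the `items` buffer, and the log message are assumptions made for illustration; `close_spider` and `write_to_bq` are the method names that appear in the traceback, and the actual BigQuery upload is left out.

```python
import logging


class BigQueryPipeline:
    """Hypothetical pipeline; the real class name is not shown in the question."""

    def __init__(self):
        # Logger named after the pipeline class, as suggested above.
        self.logger = logging.getLogger(type(self).__name__)
        # Assumed in-memory buffer of scraped items.
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)
        return item

    def close_spider(self, spider):
        self.write_to_bq()

    def write_to_bq(self):
        # Replaces print("-----BIGQUERY-----") from the traceback.
        self.logger.info("Writing %d items to BigQuery", len(self.items))
        # The actual BigQuery upload would go here (omitted in this sketch).
```

Because the logger is a regular `logging` logger, its messages go through the logging configuration that Scrapy sets up and appear in the crawl log next to Scrapy's own entries. Inside the pipeline methods, `spider.logger.info(...)` is the alternative if you prefer the messages to carry the spider's name instead.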
