Scrapy - logging to both a file and stdout, with spider names

Asked 2025-04-17 08:29

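I decided to use Python's logging module because the messages Twisted prints to stderr are too long. I want to write meaningful INFO-level messages, such as those generated by StatsCollector, to a separate log file while still keeping the on-screen messages.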

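import logging
from twisted.python import log

# Send INFO and above to buyerlog.txt, overwriting the file on each run
logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
# Forward Twisted log events to the standard logging module
observer = log.PythonLoggingObserver()
observer.start()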

This works and I get my messages, but the downside is that I can't tell which spider generated them! Here is my log file, where "twisted" is what %(name)s displays:

INFO:twisted:Log opened.
INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
INFO:twisted:scrapy.webservice.WebService starting on 6080
INFO:twisted:Spider opened
INFO:twisted:Spider opened
INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Dumping spider stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
 'downloader/request_bytes': 9973,

Compare that with the messages Twisted prints to stderr:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

I've tried %(name)s, %(module)s and so on, but I can't seem to get the spider name to show up. Does anyone know how to solve this?

Edit: the problem with using LOG_FILE and LOG_LEVEL in the settings is that lower-level messages are then not shown on stderr.

Tags: error-handling, stdout, logging, logfile, twisted, scrapy, spider, message-level

8 Answers

For those who landed here before reading the current version of the documentation:

import logging
from scrapy.utils.log import configure_logging

# Keep Scrapy from installing its own root log handler
configure_logging(install_root_handler=False)
# Append DEBUG and above to log.txt
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG
)
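
Note that basicConfig with a filename writes only to the file. To also keep messages on screen, and to see which logger produced each message, one option is to attach two handlers to the root logger. The following is a minimal sketch using only the standard library, not something taken from the Scrapy documentation: `setup_dual_logging` is a hypothetical helper name, and the format string and per-handler levels are assumptions chosen to resemble the question's stderr output. It relies on messages sent through a spider's `self.logger` carrying the spider name as the logger name, so `%(name)s` shows it.

```python
import logging
import sys


def setup_dual_logging(logfile, file_level=logging.INFO, console_level=logging.DEBUG):
    """Attach a file handler and a stderr handler to the root logger.

    Hypothetical helper: each handler has its own threshold, so the file
    can stay at INFO while the console still shows DEBUG messages.
    """
    fmt = logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")

    # File handler: overwrites the log file on each run
    file_handler = logging.FileHandler(logfile, mode="w", encoding="utf-8")
    file_handler.setLevel(file_level)
    file_handler.setFormatter(fmt)

    # Console handler: keeps messages visible on stderr
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(console_level)
    console_handler.setFormatter(fmt)

    root = logging.getLogger()
    # The root logger must let through the lowest level either handler wants
    root.setLevel(min(file_level, console_level))
    root.addHandler(file_handler)
    root.addHandler(console_handler)
    return file_handler, console_handler
```

In a Scrapy script this would presumably be called right after configure_logging(install_root_handler=False), for example setup_dual_logging('buyerlog.txt'). Scrapy's own components log under names such as scrapy.core.engine, so those lines show the component rather than the spider.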

Answered 2025-04-17 by Python大师

Redirecting the output is very simple with the following command: scrapy some-scrapy's-args 2>&1 | tee -a logname

This way, everything Scrapy outputs, both normal messages and errors, is appended to a file named logname and also shown on the screen.
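
Where tee is not available (or when the crawl is launched from a Python script), the same idea can be approximated with subprocess. This is only a sketch under assumptions: `run_and_tee` is a hypothetical helper name, and it merges stderr into stdout just as 2>&1 does, so the two streams can no longer be told apart afterwards.

```python
import subprocess
import sys


def run_and_tee(cmd, logname):
    """Run cmd, echo its combined stdout/stderr to the screen,
    and append the same lines to logname (like `2>&1 | tee -a logname`)."""
    with open(logname, "a", encoding="utf-8") as logfile:
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, as 2>&1 does
            text=True,
        ) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # show on screen
                logfile.write(line)     # and append to the log file
    # Leaving the Popen context waits for the process, so returncode is set
    return proc.returncode
```

A call might look like run_and_tee(['scrapy', 'crawl', 'iproperty'], 'logname'), with the spider name taken from the question's log output.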

Answered 2025-04-17 by Python大师

You want to use ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()

I'm glad you asked this question; I've been wanting to do this myself.

Answered 2025-04-17 by Python大师
