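A tool for parsing Scrapy log files periodically and incrementally, designed for ScrapydWeb.
Detailed description of the logparser Python project
logparser: A tool for parsing Scrapy log files periodically and incrementally, designed for ScrapydWeb.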
Installation
- Use pip:
pip install logparser
Note that you may need to run python -m pip install --upgrade pip first in order to get the latest version of logparser. Alternatively, download the tar.gz file from https://pypi.org/project/logparser/#files and install it via pip install logparser-x.x.x.tar.gz
- Use git:
pip install --upgrade git+https://github.com/my8100/logparser.git
Or:
git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install
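After either installation route, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch that relies only on the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical and not part of logparser.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Report what is installed for the package named in this README.
version = installed_version("logparser")
if version is None:
    print("logparser is not installed in this environment")
else:
    print("logparser", version, "is installed")
```

If None comes back right after installing, the pip that ran is probably bound to a different interpreter than the one running this check.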
Usage
To use in Python
In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {'downloader/exception_count': 3,
   ...: 'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
   ...: 'downloader/request_bytes': 1336,
   ...: 'downloader/request_count': 7,
   ...: 'downloader/request_method_count/GET': 7,
   ...: 'downloader/response_bytes': 1669,
   ...: 'downloader/response_count': 4,
   ...: 'downloader/response_status_count/200': 2,
   ...: 'downloader/response_status_count/302': 1,
   ...: 'downloader/response_status_count/404': 1,
   ...: 'dupefilter/filtered': 1,
   ...: 'finish_reason': 'finished',
   ...: 'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...: 'httperror/response_ignored_count': 1,
   ...: 'httperror/response_ignored_status_count/404': 1,
   ...: 'item_scraped_count': 2,
   ...: 'log_count/CRITICAL': 5,
   ...: 'log_count/DEBUG': 14,
   ...: 'log_count/ERROR': 5,
   ...: 'log_count/INFO': 75,
   ...: 'log_count/WARNING': 3,
   ...: 'offsite/domains': 1,
   ...: 'offsite/filtered': 1,
   ...: 'request_depth_max': 1,
   ...: 'response_received_count': 3,
   ...: 'retry/count': 2,
   ...: 'retry/max_reached': 1,
   ...: 'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
   ...: 'scheduler/dequeued': 7,
   ...: 'scheduler/dequeued/memory': 7,
   ...: 'scheduler/enqueued': 7,
   ...: 'scheduler/enqueued/memory': 7,
   ...: 'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: odict = parse(log, headlines=1, taillines=1)

In [4]: odict
Out[4]:
OrderedDict([('head', '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
             ('tail', '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
             ('first_log_time', '2018-10-23 18:28:34'),
             ('latest_log_time', '2018-10-23 18:29:42'),
             ('runtime', '0:01:08'),
             ('first_log_timestamp', 1540290514),
             ('latest_log_timestamp', 1540290582),
             ('datas', []),
             ('pages', 3),
             ('items', 2),
             ('latest_matches',
              {'telnet_console': '',
               'resuming_crawl': '',
               'latest_offsite': '',
               'latest_duplicate': '',
               'latest_crawl': '',
               'latest_scrape': '',
               'latest_item': '',
               'latest_stat': ''}),
             ('latest_crawl_timestamp', 0),
             ('latest_scrape_timestamp', 0),
             ('log_categories',
              {'critical_logs': {'count': 5, 'details': []},
               'error_logs': {'count': 5, 'details': []},
               'warning_logs': {'count': 3, 'details': []},
               'redirect_logs': {'count': 1, 'details': []},
               'retry_logs': {'count': 2, 'details': []},
               'ignore_logs': {'count': 1, 'details': []}}),
             ('shutdown_reason', 'N/A'),
             ('finish_reason', 'finished'),
             ('crawler_stats',
              OrderedDict([('source', 'log'),
                           ('last_update_time', '2018-10-23 18:29:41'),
                           ('last_update_timestamp', 1540290581),
                           ('downloader/exception_count', 3),
                           ('downloader/exception_type_count/twisted.internet.error.TCPTimedOutError', 3),
                           ('downloader/request_bytes', 1336),
                           ('downloader/request_count', 7),
                           ('downloader/request_method_count/GET', 7),
                           ('downloader/response_bytes', 1669),
                           ('downloader/response_count', 4),
                           ('downloader/response_status_count/200', 2),
                           ('downloader/response_status_count/302', 1),
                           ('downloader/response_status_count/404', 1),
                           ('dupefilter/filtered', 1),
                           ('finish_reason', 'finished'),
                           ('finish_time', 'datetime.datetime(2018, 10, 23, 10, 29, 41, 174719)'),
                           ('httperror/response_ignored_count', 1),
                           ('httperror/response_ignored_status_count/404', 1),
                           ('item_scraped_count', 2),
                           ('log_count/CRITICAL', 5),
                           ('log_count/DEBUG', 14),
                           ('log_count/ERROR', 5),
                           ('log_count/INFO', 75),
                           ('log_count/WARNING', 3),
                           ('offsite/domains', 1),
                           ('offsite/filtered', 1),
                           ('request_depth_max', 1),
                           ('response_received_count', 3),
                           ('retry/count', 2),
                           ('retry/max_reached', 1),
                           ('retry/reason_count/twisted.internet.error.TCPTimedOutError', 2),
                           ('scheduler/dequeued', 7),
                           ('scheduler/dequeued/memory', 7),
                           ('scheduler/enqueued', 7),
                           ('scheduler/enqueued/memory', 7),
                           ('start_time', 'datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)')])),
             ('last_update_time', '2019-03-08 16:53:50'),
             ('last_update_timestamp', 1552035230),
             ('logparser_version', '0.8.1')])

In [5]: odict['runtime']
Out[5]: '0:01:08'

In [6]: odict['pages']
Out[6]: 3

In [7]: odict['items']
Out[7]: 2

In [8]: odict['finish_reason']
Out[8]: 'finished'
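To illustrate how the fields of such a result relate to each other, here is a small standard-library sketch that works on a hand-copied subset of the output shown in this section, so it runs without logparser installed. The helper names runtime_from_timestamps and log_category_counts are hypothetical. That runtime equals the difference between the two log timestamps is an observation from this sample, not a documented guarantee.

```python
from datetime import timedelta

# Hand-copied subset of the sample result in this section; a real result has more keys.
result = {
    "first_log_timestamp": 1540290514,
    "latest_log_timestamp": 1540290582,
    "runtime": "0:01:08",
    "pages": 3,
    "items": 2,
    "finish_reason": "finished",
    "log_categories": {
        "critical_logs": {"count": 5, "details": []},
        "error_logs": {"count": 5, "details": []},
        "warning_logs": {"count": 3, "details": []},
        "redirect_logs": {"count": 1, "details": []},
        "retry_logs": {"count": 2, "details": []},
        "ignore_logs": {"count": 1, "details": []},
    },
}


def runtime_from_timestamps(stats):
    """Format the span between the first and latest log timestamps as H:MM:SS."""
    seconds = stats["latest_log_timestamp"] - stats["first_log_timestamp"]
    return str(timedelta(seconds=seconds))


def log_category_counts(stats):
    """Flatten log_categories into a {category_name: count} mapping."""
    return {name: entry["count"] for name, entry in stats["log_categories"].items()}


print(runtime_from_timestamps(result))  # 0:01:08, matching result["runtime"]
print(log_category_counts(result))
```

The same helpers should work on the OrderedDict returned by parse, provided it contains the keys used here.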
To run as a service
- Make sure that Scrapyd has been installed and started on the current host.
- Start logparser via the command:
logparser
- Visit http://127.0.0.1:6800/logs/stats.json (assuming the Scrapyd service runs on port 6800).
- Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the detailed stats of a job.
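The two URLs in the steps above can also be read from a script rather than a browser. The sketch below uses only the standard library; the helper names (stats_url, job_stats_url, fetch_json) are hypothetical, and the host and port defaults simply mirror the example URLs. Nothing is assumed about the layout of the returned JSON, since it is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def stats_url(host="127.0.0.1", port=6800):
    """Build the URL of the overall stats file served under Scrapyd's logs path."""
    return "http://{}:{}/logs/stats.json".format(host, port)


def job_stats_url(project, spider, job, host="127.0.0.1", port=6800):
    """Build the URL of the per-job stats file for one project/spider/job."""
    # Percent-encode each path segment in case a name contains special characters.
    parts = [quote(part, safe="") for part in (project, spider, job)]
    return "http://{}:{}/logs/{}/{}/{}.json".format(host, port, *parts)


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (requires the service to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage, left commented out because it needs Scrapyd and logparser running:
# stats = fetch_json(stats_url())
# job = fetch_json(job_stats_url("projectname", "spidername", "jobid"))
```

Keeping URL construction separate from the network call makes the first two helpers easy to check without a running Scrapyd instance.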
To work with ScrapydWeb for visualization
Check out https://github.com/my8100/scrapydweb for more info.