A tool for parsing Scrapy log files periodically and incrementally, designed for ScrapydWeb.





Installation

  • Via pip:
pip install logparser

Note that you may need to execute python -m pip install --upgrade pip first in order to get the latest version of logparser, or download the tar.gz file from https://pypi.org/project/logparser/#files and install it via pip install logparser-x.x.x.tar.gz.

  • Via git:
pip install --upgrade git+https://github.com/my8100/logparser.git

Or:

git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install

Usage

Use in Python

In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {'downloader/exception_count': 3,
   ...:  'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
   ...:  'downloader/request_bytes': 1336,
   ...:  'downloader/request_count': 7,
   ...:  'downloader/request_method_count/GET': 7,
   ...:  'downloader/response_bytes': 1669,
   ...:  'downloader/response_count': 4,
   ...:  'downloader/response_status_count/200': 2,
   ...:  'downloader/response_status_count/302': 1,
   ...:  'downloader/response_status_count/404': 1,
   ...:  'dupefilter/filtered': 1,
   ...:  'finish_reason': 'finished',
   ...:  'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...:  'httperror/response_ignored_count': 1,
   ...:  'httperror/response_ignored_status_count/404': 1,
   ...:  'item_scraped_count': 2,
   ...:  'log_count/CRITICAL': 5,
   ...:  'log_count/DEBUG': 14,
   ...:  'log_count/ERROR': 5,
   ...:  'log_count/INFO': 75,
   ...:  'log_count/WARNING': 3,
   ...:  'offsite/domains': 1,
   ...:  'offsite/filtered': 1,
   ...:  'request_depth_max': 1,
   ...:  'response_received_count': 3,
   ...:  'retry/count': 2,
   ...:  'retry/max_reached': 1,
   ...:  'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
   ...:  'scheduler/dequeued': 7,
   ...:  'scheduler/dequeued/memory': 7,
   ...:  'scheduler/enqueued': 7,
   ...:  'scheduler/enqueued/memory': 7,
   ...:  'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: odict = parse(log, headlines=1, taillines=1)

In [4]: odict
Out[4]:
OrderedDict([('head',
              '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
             ('tail',
              '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
             ('first_log_time', '2018-10-23 18:28:34'),
             ('latest_log_time', '2018-10-23 18:29:42'),
             ('runtime', '0:01:08'),
             ('first_log_timestamp', 1540290514),
             ('latest_log_timestamp', 1540290582),
             ('datas', []),
             ('pages', 3),
             ('items', 2),
             ('latest_matches',
              {'telnet_console': '',
               'resuming_crawl': '',
               'latest_offsite': '',
               'latest_duplicate': '',
               'latest_crawl': '',
               'latest_scrape': '',
               'latest_item': '',
               'latest_stat': ''}),
             ('latest_crawl_timestamp', 0),
             ('latest_scrape_timestamp', 0),
             ('log_categories',
              {'critical_logs': {'count': 5, 'details': []},
               'error_logs': {'count': 5, 'details': []},
               'warning_logs': {'count': 3, 'details': []},
               'redirect_logs': {'count': 1, 'details': []},
               'retry_logs': {'count': 2, 'details': []},
               'ignore_logs': {'count': 1, 'details': []}}),
             ('shutdown_reason', 'N/A'),
             ('finish_reason', 'finished'),
             ('crawler_stats',
              OrderedDict([('source', 'log'),
                           ('last_update_time', '2018-10-23 18:29:41'),
                           ('last_update_timestamp', 1540290581),
                           ('downloader/exception_count', 3),
                           ('downloader/exception_type_count/twisted.internet.error.TCPTimedOutError', 3),
                           ('downloader/request_bytes', 1336),
                           ('downloader/request_count', 7),
                           ('downloader/request_method_count/GET', 7),
                           ('downloader/response_bytes', 1669),
                           ('downloader/response_count', 4),
                           ('downloader/response_status_count/200', 2),
                           ('downloader/response_status_count/302', 1),
                           ('downloader/response_status_count/404', 1),
                           ('dupefilter/filtered', 1),
                           ('finish_reason', 'finished'),
                           ('finish_time', 'datetime.datetime(2018, 10, 23, 10, 29, 41, 174719)'),
                           ('httperror/response_ignored_count', 1),
                           ('httperror/response_ignored_status_count/404', 1),
                           ('item_scraped_count', 2),
                           ('log_count/CRITICAL', 5),
                           ('log_count/DEBUG', 14),
                           ('log_count/ERROR', 5),
                           ('log_count/INFO', 75),
                           ('log_count/WARNING', 3),
                           ('offsite/domains', 1),
                           ('offsite/filtered', 1),
                           ('request_depth_max', 1),
                           ('response_received_count', 3),
                           ('retry/count', 2),
                           ('retry/max_reached', 1),
                           ('retry/reason_count/twisted.internet.error.TCPTimedOutError', 2),
                           ('scheduler/dequeued', 7),
                           ('scheduler/dequeued/memory', 7),
                           ('scheduler/enqueued', 7),
                           ('scheduler/enqueued/memory', 7),
                           ('start_time', 'datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)')])),
             ('last_update_time', '2019-03-08 16:53:50'),
             ('last_update_timestamp', 1552035230),
             ('logparser_version', '0.8.1')])

In [5]: odict['runtime']
Out[5]: '0:01:08'

In [6]: odict['pages']
Out[6]: 3

In [7]: odict['items']
Out[7]: 2

In [8]: odict['finish_reason']
Out[8]: 'finished'

Run as a service

  1. Make sure that Scrapyd has been installed and started on the current host.
  2. Start logparser via the command: logparser
  3. Visit http://127.0.0.1:6800/logs/stats.json (assuming the Scrapyd service runs on port 6800).
  4. Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the detailed stats of a job.
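The URL pattern in steps 3–4 can be sketched as follows. The host/port and the project, spider, and job names below are placeholders; substitute the values from your own Scrapyd deployment:

```python
# Build the two stats endpoints served out of Scrapyd's /logs/ directory.
# All names below are illustrative placeholders, not real deployments.
host = "http://127.0.0.1:6800"
project, spider, jobid = "demo", "example", "2018-10-23T18_28_34"

summary_url = f"{host}/logs/stats.json"                      # stats of all parsed logs
job_url = f"{host}/logs/{project}/{spider}/{jobid}.json"     # stats of a single job

print(summary_url)
print(job_url)
```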

Visualization with ScrapydWeb

Check out https://github.com/my8100/scrapydweb for more information.

[screenshot: stats]
