haralyzer

A Python framework for using HAR files to analyze web pages.
Overview
The haralyzer module contains two classes for analyzing web pages based on a HAR file. HarParser() represents a full file (which might contain multiple pages), and HarPage() represents a single page from said file.

HarParser has a few helpful methods for analyzing single entries from a HAR file, but most of the pertinent functions live in the page object.

haralyzer was designed to be easy to use, but you can also access the more powerful functions directly.
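For context, a HAR file is just JSON with a known shape: a `log` containing `pages` and `entries`, where each entry points back at its page via `pageref`. The following is a minimal hand-written sketch of that structure (real files exported from browser dev tools contain many more fields):

```python
import json

# A minimal, hand-written sketch of the HAR structure haralyzer consumes.
har_data = {
    "log": {
        "version": "1.2",
        "creator": {"name": "Firefox", "version": "25.0.1"},
        "pages": [
            {"id": "page_1",
             "startedDateTime": "2015-02-21T19:15:40.351-08:00",
             "title": "http://humanssuck.net/",
             "pageTimings": {"onContentLoad": 90, "onLoad": 245}},
        ],
        "entries": [
            {"pageref": "page_1",  # ties the entry to its page
             "startedDateTime": "2015-02-21T19:15:40.351-08:00",
             "time": 38,
             "request": {"method": "GET", "url": "http://humanssuck.net/"},
             "response": {"status": 200,
                          "content": {"mimeType": "text/html"}}},
        ],
    }
}

# HarParser takes json.loads(...) of a file with this overall shape.
print(json.dumps(har_data, indent=2)[:40])
```

The `pages`/`entries` split is why the library has both a parser-level and a page-level class.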
Quick Intro
HarParser
HarParser takes a single argument: a dict representing the JSON of a full HAR file. It has the same properties as the HAR file, except that each page is a HarPage object:
```python
import json
from haralyzer import HarParser, HarPage

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

print(har_parser.browser)
# {u'name': u'Firefox', u'version': u'25.0.1'}

print(har_parser.hostname)
# 'humanssuck.net'

for page in har_parser.pages:
    assert isinstance(page, HarPage)  # True for each page
```
HarPage
The HarPage object contains most of what you need to easily analyze a page. It has accessible helper methods, but most of the data you need is in properties for easy access. You can create one directly by giving it the page ID (yes, I know it is silly, that is just how HAR files are organized) along with either a HarParser, or a dict representing the JSON of a full HAR file (see the example above) passed as har_data:
```python
import json
from haralyzer import HarPage

with open('har_data.har', 'r') as f:
    har_page = HarPage('page_3', har_data=json.loads(f.read()))

### GET BASIC INFO ###
har_page.hostname
# 'humanssuck.net'
har_page.url
# 'http://humanssuck.net/about/'

### WORK WITH LOAD TIMES (all load times are in ms) ###
# Get image load time in milliseconds as rendered by the browser
har_page.image_load_time
# 713
# We could do this with 'css', 'js', 'html', 'audio', or 'video'

### WORK WITH SIZES (all sizes are in bytes) ###
# Get the total page size (with all assets)
har_page.page_size
# 2423765
# Get the total image size
har_page.image_size
# 733488
# We could do this with 'css', 'js', 'html', 'audio', or 'video'

# Get the transferred sizes (works only with HAR files generated by Chrome)
har_page.page_size_trans
har_page.image_size_trans
har_page.css_size_trans
har_page.text_size_trans
har_page.js_size_trans
har_page.audio_size_trans
har_page.video_size_trans
```
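Under the hood, size properties like these amount to summing fields on each entry's `response`. A rough sketch of the idea in plain Python, using the standard `bodySize` field and Chrome's `_transferSize` extension field (which is why the `*_trans` properties only work on Chrome-generated files); the exact accounting haralyzer does may differ:

```python
# Hypothetical entries: 'bodySize' is standard HAR, '_transferSize' is a
# Chrome extension field.
entries = [
    {"response": {"bodySize": 733488, "_transferSize": 120000,
                  "content": {"mimeType": "image/png"}}},
    {"response": {"bodySize": 54321, "_transferSize": 9876,
                  "content": {"mimeType": "text/css"}}},
]

def total_size(entries, mime_prefix="", field="bodySize"):
    """Sum a size field over entries whose mimeType matches a prefix."""
    return sum(
        e["response"][field]
        for e in entries
        if e["response"]["content"]["mimeType"].startswith(mime_prefix)
    )

print(total_size(entries))                         # 787809 (page size)
print(total_size(entries, mime_prefix="image"))    # 733488 (image size)
print(total_size(entries, field="_transferSize"))  # 129876 (transferred)
```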
IMPORTANT NOTE - technically, the page ID attribute of a single entry in a HAR file is optional. As such, if a HAR file contains entries that do not map to a page, an additional page with an ID of 'unknown' will be created. This "fake page" will contain all such entries. Since it is not a real page, it has no attributes like time to first byte or page load, and those will return None.
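The grouping behavior described above can be illustrated in plain Python (a sketch of the concept, not haralyzer's actual implementation): entries carry an optional `pageref`, and anything without one falls into an 'unknown' bucket.

```python
from collections import defaultdict

# Two entries mapped to a real page, and one orphan with no pageref.
entries = [
    {"pageref": "page_3", "request": {"url": "http://humanssuck.net/"}},
    {"pageref": "page_3", "request": {"url": "http://humanssuck.net/a.css"}},
    {"request": {"url": "http://example.com/tracker.gif"}},  # no pageref
]

pages = defaultdict(list)
for entry in entries:
    # Missing pageref -> collected under the 'unknown' fake page.
    pages[entry.get("pageref", "unknown")].append(entry)

print(sorted(pages))          # ['page_3', 'unknown']
print(len(pages["unknown"]))  # 1
```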
MultiHarParser
MultiHarParser takes a list of dicts, each representing the JSON of a full HAR file. The concept here is that you can provide multiple HAR files for the same page (representing multiple test runs), and MultiHarParser will provide aggregate results for load times:
```python
import json
from haralyzer import MultiHarParser

test_runs = []
with open('har_data1.har', 'r') as f1:
    test_runs.append(json.loads(f1.read()))
with open('har_data2.har', 'r') as f2:
    test_runs.append(json.loads(f2.read()))

multi_har_parser = MultiHarParser(har_data=test_runs)

# Get the mean time to first byte of all runs in ms
print(multi_har_parser.time_to_first_byte)
# 70
# Get the mean total page load time of all runs in ms
print(multi_har_parser.load_time)
# 150
# Get the mean javascript load time of all runs in ms
print(multi_har_parser.js_load_time)
# 50

# You can get the standard deviation for any of these as well.
# Let's get the standard deviation for javascript load time
print(multi_har_parser.get_stdev('js'))
# 5
# We can also do that with 'page' or 'ttfb' (time to first byte)
print(multi_har_parser.get_stdev('page'))
# 11
print(multi_har_parser.get_stdev('ttfb'))
# 10

### DECIMAL PRECISION ###
# You will notice that all of the results above are whole numbers.
# That is because the default decimal precision for the multi parser
# is 0. You can pass whatever precision you want into the constructor
# to control this.
multi_har_parser = MultiHarParser(har_data=test_runs, decimal_precision=2)
print(multi_har_parser.time_to_first_byte)
# 70.15
```
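The aggregation itself is ordinary mean and standard-deviation math over the per-run values; the stdlib `statistics` module shows the idea (illustrative numbers, not haralyzer internals):

```python
import statistics

# Hypothetical per-run JS load times in ms, one value per HAR file.
js_load_times = [45, 50, 55]

# Mean and sample standard deviation, rounded to a precision of 0
# (mirroring the multi parser's default decimal_precision).
mean = round(statistics.mean(js_load_times))
stdev = round(statistics.stdev(js_load_times))
print(mean)   # 50
print(stdev)  # 5
```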
Advanced Usage
HarPage contains a lot of helpful properties, but all of them are easily reproduced using the public methods of HarParser and HarPage:
```python
import json
from haralyzer import HarPage

with open('har_data.har', 'r') as f:
    har_page = HarPage('page_3', har_data=json.loads(f.read()))

### ACCESSING FILES ###

# You can get a JSON representation of all assets using HarPage.entries
for entry in har_page.entries:
    if entry['startedDateTime'] == 'whatever I expect':
        pass  # ... do stuff ...

# It also has methods for filtering assets.
# Get a collection of entries that were images in the 2XX status code range:
entries = har_page.filter_entries(content_type='image.*', status_code='2.*')
# This method can filter by:
# * content_type ('application/json' for example)
# * status_code ('200' for example)
# * request_type ('GET' for example)
# * http_version ('HTTP/1.1' for example)
# It matches with a regex by default, but you can force a literal string
# match by passing regex=False.

# Get the size of the collection we just made
collection_size = har_page.get_total_size(entries)

# We can also access files by type with a property
for js_file in har_page.js_files:
    pass  # ... do stuff ...

### GETTING LOAD TIMES ###

# Get the BROWSER load time for all images in the 2XX status code range
load_time = har_page.get_load_time(content_type='image.*', status_code='2.*')

# Get the TOTAL load time for all images in the 2XX status code range
load_time = har_page.get_load_time(
    content_type='image.*', status_code='2.*', asynchronous=False)
```
This may be out of date, so please check the Sphinx docs.
MORE Advanced Usage
All of the HarPage methods above leverage HarParser machinery, some of which can be useful for more complex operations. They all operate on a single entry (from a HarPage) or a list of entries:
```python
import json
from haralyzer import HarParser

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

for page in har_parser.pages:
    for entry in page.entries:
        ### MATCH HEADERS ###
        if har_parser.match_headers(entry, 'Content-Type', 'image.*'):
            print('This would appear to be an image')
        ### MATCH REQUEST TYPE ###
        if har_parser.match_request_type(entry, 'GET'):
            print('This is a GET request')
        ### MATCH STATUS CODE ###
        if har_parser.match_status_code(entry, '2.*'):
            print('Looks like all is well in the world')
```
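Conceptually, these `match_*` helpers apply a regular expression (or, with `regex=False`, a literal comparison) to one field of the entry. A plain-Python sketch under that assumption, not the library's actual code:

```python
import re

# A stripped-down hypothetical entry.
entry = {
    "request": {"method": "GET"},
    "response": {"status": 200},
}

def matches(value, pattern, regex=True):
    """Regex match by default; literal comparison with regex=False."""
    if regex:
        return re.fullmatch(pattern, str(value)) is not None
    return str(value) == pattern

print(matches(entry["request"]["method"], "GET"))                # True
print(matches(entry["response"]["status"], "2.*"))               # True
print(matches(entry["response"]["status"], "200", regex=False))  # True
```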
Asset Timelines
The final helper function of HarParser deserves its own section, because it is odd, but can be helpful, especially for creating charts and reports.

It can create an asset timeline, which gives you back a dict where each key is a datetime object and the value is a list of the assets that were loading at that moment. Each value in the list is a dict representing an entry from the page.

It takes a list of entries to analyze, so it assumes that you have already filtered down to the entries you want to know about:
```python
import json
from haralyzer import HarParser

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

### CREATE A TIMELINE OF ALL THE ENTRIES ###
entries = []
for page in har_parser.pages:
    for entry in page.entries:
        entries.append(entry)

timeline = har_parser.create_asset_timeline(entries)

for key, value in timeline.items():
    print(type(key))
    # <class 'datetime.datetime'>
    print(key)
    # 2015-02-21 19:15:41.450000-08:00
    print(type(value))
    # <class 'list'>
    print(value)
    # Each entry in the list is an asset from the page:
    # [{u'serverIPAddress': u'157.166.249.67', u'cache': {},
    #   u'startedDateTime': u'2015-02-21T19:15:40.351-08:00',
    #   u'pageref': u'page_3', u'request': {u'cookies':......
```
With this, you can examine the timeline of any number of assets. Since the keys are datetime objects, this is a heavy operation. That may change in the future, but for now, limit the assets you feed this method to the ones you actually need to inspect.
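As a rough idea of what building such a timeline involves (a sketch of the concept, not the library's implementation): each entry spans from its `startedDateTime` for `time` milliseconds, and every second within that span gets the asset appended to its bucket.

```python
from datetime import datetime, timedelta

# Hypothetical entries: a start time plus a duration in ms.
entries = [
    {"url": "/index.html", "startedDateTime": "2015-02-21T19:15:40", "time": 1500},
    {"url": "/style.css",  "startedDateTime": "2015-02-21T19:15:41", "time": 400},
]

timeline = {}
for entry in entries:
    start = datetime.strptime(entry["startedDateTime"], "%Y-%m-%dT%H:%M:%S")
    end = start + timedelta(milliseconds=entry["time"])
    t = start
    while t <= end:  # one bucket per second the asset was in flight
        timeline.setdefault(t, []).append(entry["url"])
        t += timedelta(seconds=1)

for key in sorted(timeline):
    print(key, timeline[key])
# 2015-02-21 19:15:40 ['/index.html']
# 2015-02-21 19:15:41 ['/index.html', '/style.css']
```

The datetime-keyed dict is what makes overlap questions ("what was loading at 19:15:41?") a simple lookup, at the cost of building many buckets.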