Posted 2024-04-26 23:08:51
User
I need to parse the following page:
http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0
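If you view the source of the URL above, you will find calls to fvRequests.setValue() in an inline script.
Expected output: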
fvRequests= css fvRequests=7
# Python 2 (uses urllib2 and the print statement)
import re
import urllib2

if __name__ == "__main__":
    url = 'http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0'

    # HTTP request
    response = urllib2.urlopen(url)
    html = response.read()
    response.close()

    # Find the third argument of every fvRequests.setValue() call
    results = re.findall(r'fvRequests\.setValue\(\d+, \d+, \'?(.*?)\'?\);', html)
    keys = results[::2]      # even positions: resource type names
    values = results[1::2]   # odd positions: request counts

    # Build a dictionary from the name/count pairs
    output = dict(zip(keys, values))
    print output
The idea is to locate the script with BeautifulSoup, then use a regular expression pattern to find the fvRequests.setValue() calls and extract the value of the third argument:
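import re

from bs4 import BeautifulSoup
import requests

# Captures the third argument, with or without surrounding single quotes
pattern = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(\w+)'?\);")

response = requests.get("http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0")
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(response.content, "html.parser")

# Pick the <script> element whose text contains the calls we are after
script = soup.find("script", text=lambda x: x and "fvRequests.setValue" in x).text
print(re.findall(pattern, script))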
Prints:
[u'css', u'7', u'flash', u'0', u'font', u'0', u'html', u'14', u'image', u'80', u'js', u'35', u'other', u'14']
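To try the pattern without hitting the network, here is a minimal sketch that runs it against a short sample string. The sample script text and the helper name `extract_third_args` are hypothetical; the exact layout of the calls on the real page is an assumption inferred from the regular expression, not copied from the page.

```python
import re

# Hypothetical excerpt of the inline script. The real page's layout is an
# assumption inferred from the regular expression used in the answer.
SAMPLE_SCRIPT = """
fvRequests.setValue(0, 0, 'css');
fvRequests.setValue(0, 1, 7);
fvRequests.setValue(1, 0, 'flash');
fvRequests.setValue(1, 1, 0);
"""

# Same pattern as in the answer: capture the third argument, quoted or not.
PATTERN = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(\w+)'?\);")


def extract_third_args(script_text):
    """Return the third argument of every fvRequests.setValue() call, in order."""
    return PATTERN.findall(script_text)


values = extract_third_args(SAMPLE_SCRIPT)
print(values)
```

Every captured value comes back as a string, including the counts, because the pattern captures text rather than parsed numbers.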
You can further pack the list into a dict (solution taken from here):
dict(zip(*([iter(data)] * 2)))
This produces:
{ 'image': '80', 'flash': '0', 'js': '35', 'html': '14', 'font': '0', 'other': '14', 'css': '7' }
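The pairing idiom can be easier to follow when spelled out. The sketch below is one possible reading of it, assuming the flat list always alternates name, count; the helper name `pairs_to_dict` and the shortened sample list are illustrative only. It also converts the counts to integers, which the original answer does not do.

```python
def pairs_to_dict(flat):
    """Pair consecutive items: [k1, v1, k2, v2, ...] -> {k1: v1, k2: v2, ...}."""
    it = iter(flat)
    # zip pulls from the same iterator twice per step, so each tuple holds
    # two neighbouring items. This is what zip(*([iter(flat)] * 2)) does.
    return dict(zip(it, it))


# Shortened sample in the same shape as the printed list above.
data = ['css', '7', 'flash', '0', 'font', '0']

paired = pairs_to_dict(data)

# Optional extra step (an assumption about what is wanted): counts as integers.
counts = {name: int(count) for name, count in paired.items()}
print(counts)
```

Slicing gives the same pairs, `dict(zip(data[::2], data[1::2]))`, at the cost of building two intermediate lists.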