Parsing a URL with Python

Posted 2024-04-26 23:08:51


I need to parse the page at

http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0

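If you view the source of the page at the URL above, you will find fvRequests.setValue() calls in an inline script.

Expected output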

fvRequests= css
fvRequests=7

2 Answers
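import re
import urllib.request  # Python 3 replacement for urllib2


if __name__ == "__main__":
    url = 'http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0'

    # HTTP request; decode the bytes so the str pattern below can match
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')

    # capture the third argument of every fvRequests.setValue() call
    results = re.findall(r'fvRequests\.setValue\(\d+, \d+, \'?(.*?)\'?\);', html)
    keys = results[::2]     # even positions hold the names
    values = results[1::2]  # odd positions hold the counts

    # build a dictionary mapping each name to its count
    output = dict(zip(keys, values))

    print(output)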
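The regex step in this answer can be tried without a network request. The snippet below is a minimal sketch: `SAMPLE_SCRIPT` is a hypothetical excerpt written to match the pattern, not the page's actual source, and `extract_pairs` is a helper name introduced here for illustration.

```python
import re

# Hypothetical excerpt of the inline script; the real page markup may differ.
SAMPLE_SCRIPT = """
fvRequests.setValue(0, 0, 'css');
fvRequests.setValue(0, 1, 7);
fvRequests.setValue(1, 0, 'flash');
fvRequests.setValue(1, 1, 0);
"""

# Same pattern as in the answer: capture the third argument, quoted or not.
PATTERN = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(.*?)'?\);")


def extract_pairs(script_text):
    """Return a dict mapping each name to the count that follows it."""
    results = PATTERN.findall(script_text)
    # Names sit at even positions, counts at odd positions.
    return dict(zip(results[::2], results[1::2]))


pairs = extract_pairs(SAMPLE_SCRIPT)
print(pairs)
```

The values stay strings because the regex captures text; convert them with `int()` if numeric counts are needed.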

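The idea is to locate the script with BeautifulSoup, then use a regular expression pattern to find the fvRequests.setValue() calls and extract the value of the third argument: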

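import re

from bs4 import BeautifulSoup
import requests


# capture the third argument, which may or may not be quoted
pattern = re.compile(r"fvRequests\.setValue\(\d+, \d+, '?(\w+)'?\);")

response = requests.get("http://www.webpagetest.org/breakdown.php?test=150325_34_0f581da87c16d5aac4ecb7cd07cda921&run=2&cached=0")
# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(response.content, "html.parser")

# find the <script> element whose content contains the setValue() calls
# (string= is the current name of the older text= argument)
script = soup.find("script", string=lambda x: x and "fvRequests.setValue" in x).text
print(re.findall(pattern, script))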

Prints (under Python 2, hence the u prefixes):

[u'css', u'7', u'flash', u'0', u'font', u'0', u'html', u'14', u'image', u'80', u'js', u'35', u'other', u'14']

You can further pack the list (called data below) into a dict (solution taken from here):

dict(zip(*([iter(data)] * 2)))

This produces:

{
    'image': '80', 
    'flash': '0', 
    'js': '35', 
    'html': '14',  
    'font': '0', 
    'other': '14', 
    'css': '7'
}
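The one-line pairing idiom above works because `[iter(data)] * 2` holds the same iterator twice, so `zip` pulls two consecutive items for every pair. A minimal sketch of the same technique written out, using a shortened copy of the printed list; `pair_up` is a hypothetical helper name:

```python
# Shortened copy of the flat list printed above (names and counts alternate).
data = ['css', '7', 'flash', '0', 'font', '0']


def pair_up(flat):
    """Group a flat [key, value, key, value, ...] list into a dict."""
    it = iter(flat)
    # zip advances the single shared iterator twice per tuple,
    # yielding (flat[0], flat[1]), (flat[2], flat[3]), ...
    return dict(zip(it, it))


paired = pair_up(data)
# Optional: convert the captured strings to integers.
counts = {name: int(count) for name, count in paired.items()}
print(counts)
```

If the list has an odd number of items, `zip` silently drops the last one, so it is worth checking the length first when the input is not trusted.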
