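How to get the HTTP status code with selenium.py (Python code)
I'm writing a Selenium script in Python, but I can't find any information about
how to get the HTTP status code from Selenium in Python code.
Or maybe I'm missing something. If anyone has found this, please feel free to share.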
14 answers
Answer (score: 11)
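import json
from selenium.webdriver.chrome.webdriver import WebDriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
chromedriver_path = "YOUR/PATH/TO/chromedriver.exe"
url = "https://selenium-python.readthedocs.io/api.html"
# Ask ChromeDriver to record DevTools "performance" events, which include network events.
# copy() so the shared DesiredCapabilities.CHROME dict is not modified.
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['goog:loggingPrefs'] = {'performance': 'ALL'}
# Selenium 3 style constructor. Selenium 4.10+ removed desired_capabilities (and executable_path);
# there the capability is set with options.set_capability('goog:loggingPrefs', {...}) instead.
browser = WebDriver(chromedriver_path, desired_capabilities=capabilities)
browser.get(url)
# Returns the entries buffered since the last call; reading the log clears the buffer.
logs = browser.get_log('performance')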
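Every entry returned by get_log('performance') is a dict whose 'message' value is a JSON string; decoding it gives an object whose inner 'message' holds the DevTools event "method" and "params" (the sample output in the next answer shows this shape). A minimal sketch that decodes this nesting once, assuming the entry layout seen in this thread; iter_events is a hypothetical helper name, not part of Selenium, and other ChromeDriver versions may format entries differently:

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_events(entries: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (method, params) for each DevTools event in a performance log.

    Assumes (hypothetically) each entry looks like {'level': ..., 'timestamp': ..., 'message': '<json>'}
    where the JSON decodes to {"message": {"method": ..., "params": {...}}, ...}.
    """
    for entry in entries:
        raw = entry.get("message")
        if not raw:
            continue  # skip entries without a message payload
        try:
            inner = json.loads(raw).get("message", {})
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        method = inner.get("method")
        if method:
            yield method, inner.get("params", {})


# Usage with a hand-written entry in the same shape as real log output:
sample = [{"level": "INFO", "timestamp": 0,
           "message": '{"message": {"method": "Network.responseReceived", '
                      '"params": {"response": {"status": 200}}}}'}]
for method, params in iter_events(sample):
    print(method, params["response"]["status"])  # Network.responseReceived 200
```

With the variables above this would be used as iter_events(logs); the parsing functions can then work on plain dicts instead of re-decoding JSON each time.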
Option 1: If you just want the status code, and assuming the page you want appears in the logs with a 'text/html'
content type...
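def get_status(logs):
    # Returns the status of the first text/html response, or None if there is none.
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            try:
                content_type = 'text/html' in d['message']['params']['response']['headers']['content-type']
                response_received = d['message']['method'] == 'Network.responseReceived'
                if content_type and response_received:
                    return d['message']['params']['response']['status']
            except (KeyError, TypeError):
                # Most events have no 'response', or no lower-case 'content-type' header; skip them.
                pass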
Usage:
>>> get_status(logs)
200
Option 2: If you want to see all the status codes in the relevant log entries
def get_status_codes(logs):
    statuses = []
    for log in logs:
        if log['message']:
            d = json.loads(log['message'])
            if d['message'].get('method') == "Network.responseReceived":
                statuses.append(d['message']['params']['response']['status'])
    return statuses
Usage:
>>> get_status_codes(logs)
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
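A bare list of codes like the one above does not say which request each code belongs to. A small variation of get_status_codes keeps the URL from the same response object (the log sample in the next answer shows it as response.url). This is a sketch under that assumption; get_status_by_url is a hypothetical name:

```python
import json
from typing import Any, Dict, List, Tuple


def get_status_by_url(logs: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Return (url, status) for every Network.responseReceived event, in log order."""
    pairs = []
    for log in logs:
        if not log.get("message"):
            continue  # skip entries without a payload
        d = json.loads(log["message"])
        if d["message"].get("method") == "Network.responseReceived":
            response = d["message"]["params"]["response"]
            pairs.append((response.get("url"), response.get("status")))
    return pairs
```

Called as get_status_by_url(logs), any entry whose status is not 200 points directly at the resource that failed, rather than only telling you that something failed.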
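Note 1: Much of this is based on @Stefan Matei's answer; however, some things have changed between Chrome versions, so I've added a few ideas for parsing the logs.
Note 2: ['content-type'] is not entirely reliable, because the capitalization of the header name can vary. Check it against your own use case.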
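One way to handle Note 2, sketched under the assumption that the headers arrive as a plain dict (the log sample in the next answer uses "Content-Type", not "content-type"): look the header up case-insensitively instead of indexing ['content-type'] directly. get_header and get_html_status are hypothetical names:

```python
import json
from typing import Any, Dict, List, Optional


def get_header(headers: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup; returns None if the header is absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def get_html_status(logs: List[Dict[str, Any]]) -> Optional[int]:
    """Like Option 1, but tolerant of 'Content-Type' vs 'content-type'."""
    for log in logs:
        if not log.get("message"):
            continue
        message = json.loads(log["message"]).get("message", {})
        if message.get("method") != "Network.responseReceived":
            continue
        response = message.get("params", {}).get("response", {})
        content_type = get_header(response.get("headers", {}), "Content-Type") or ""
        if "text/html" in content_type:
            return response.get("status")
    return None  # no text/html response found
```

Since it only changes how the header is looked up, get_html_status(logs) can be used wherever get_status(logs) was.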
Answer (score: 16)
I don't have much experience with Python; here is a more detailed Java example:
https://stackoverflow.com/a/39979509/5703420
The idea is to enable performance logging, which makes chromedriver issue "Network.enable". Then fetch the performance log entries and parse out the "Network.responseReceived" messages.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# enable browser logging
# (copy() so the shared DesiredCapabilities.CHROME dict is not modified;
#  newer ChromeDriver versions expect 'goog:loggingPrefs', older ones used 'loggingPrefs')
d = DesiredCapabilities.CHROME.copy()
d['goog:loggingPrefs'] = {'performance': 'ALL'}
driver = webdriver.Chrome(executable_path="c:\\windows\\chromedriver.exe", service_args=["--verbose", "--log-path=D:\\temp3\\chromedriverxx.log"], desired_capabilities=d)
driver.get('https://api.ipify.org/?format=text')
print(driver.title)
print(driver.page_source)
# get_log() returns the buffered entries and clears the buffer, so read it once and reuse the result
performance_log = driver.get_log('performance')
print(str(performance_log).strip('[]'))
for entry in performance_log:
    print(entry)
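The output will contain "Network.responseReceived" for your URL, as well as for the other requests made while the page loads and for any redirected URLs. You just need to parse these log entries.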
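In the sample that follows, the responseReceived entry for the page itself carries "type":"Document" in its params, while sub-resources (the favicon, images, scripts) would carry other types. Assuming that field is present in your ChromeDriver's output, filtering on it, and optionally on the URL, narrows the log down to the page you navigated to. Iframes are also documents, which is why the URL filter is offered. document_status is a hypothetical helper name:

```python
import json
from typing import Any, Dict, List, Optional


def document_status(entries: List[Dict[str, Any]], url: Optional[str] = None) -> Optional[int]:
    """Status of the first 'Document' response, optionally restricted to one URL."""
    for entry in entries:
        if not entry.get("message"):
            continue
        message = json.loads(entry["message"]).get("message", {})
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        if params.get("type") != "Document":
            continue  # skip images, scripts, XHR, favicon, ...
        response = params.get("response", {})
        if url is None or response.get("url") == url:
            return response.get("status")
    return None  # no matching document response in this batch of entries
```

With the variables from the code above, it would be called as document_status(performance_log, 'https://api.ipify.org/?format=text').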
'{"message":{"method":"Network.responseReceived","params":{"frameId":"9488.1","loaderId":"9488.1","requestId":"9488.1","response":{"connectionId":14,"connectionReused":false,"encodedDataLength":-1,"fromDiskCache":false,"fromServiceWorker":false,"headers":{"Connection":"keep-alive","Content-Length":"13","Content-Type":"text/plain","Date":"Wed, 12 Oct 2016 06:15:47 GMT","Server":"Cowboy","Via":"1.1 vegur"},"headersText":"HTTP/1.1 200 OK\\r\\nServer: Cowboy\\r\\nConnection: keep-alive\\r\\nContent-Type: text/plain\\r\\nDate: Wed, 12 Oct 2016 06:15:47 GMT\\r\\nContent-Length:13\\r\\nVia:1.1vegur\\r\\n\\r\\n","mimeType":"text/plain","protocol":"http/1.1","remoteIPAddress":"54.197.246.207","remotePort":443,"requestHeaders":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8","Accept-Encoding":"gzip, deflate, sdch, br","Accept-Language":"en-GB,en-US;q=0.8,en;q=0.6","Connection":"keep-alive","Host":"api.ipify.org","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"requestHeadersText":"GET /?format=text HTTP/1.1\\r\\nHost: api.ipify.org\\r\\nConnection: keep-alive\\r\\nUpgrade-Insecure-Requests: 1\\r\\nUser-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36\\r\\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\\r\\nAccept-Encoding: gzip, deflate, sdch, br\\r\\nAccept-Language: en-GB,en-US;q=0.8,en;q=0.6\\r\\n\\r\\n","securityDetails":{"certificateId":1,"certificateValidationDetails":{"numInvalidScts":0,"numUnknownScts":0,"numValidScts":0},"cipher":"AES_128_GCM","keyExchange":"ECDHE_RSA","protocol":"TLS 
1.2","signedCertificateTimestampList":[]},"securityState":"secure","status":200,"statusText":"OK","timing":{"connectEnd":320.508999997401,"connectStart":3.08100000256673,"dnsEnd":3.08100000256673,"dnsStart":0,"proxyEnd":-1,"proxyStart":-1,"pushEnd":0,"pushStart":0,"receiveHeadersEnd":465.725000001839,"requestTime":78246.775045,"sendEnd":320.995999994921,"sendStart":320.825999995577,"sslEnd":320.435000001453,"sslStart":141.675999999279,"workerReady":-1,"workerStart":-1},"url":"https://api.ipify.org/?format=text"},"timestamp":78247.242716,"type":"Document"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Network.dataReceived","params":{"dataLength":13,"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.243137}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948094, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameNavigated","params":{"frame":{"id":"9488.1","loaderId":"9488.1","mimeType":"text/plain","securityOrigin":"https://api.ipify.org","url":"https://api.ipify.org/?format=text"}}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948095, 'level': 'INFO', 'message': '{"message":{"method":"Network.loadingFinished","params":{"encodedDataLength":171,"requestId":"9488.1","timestamp":78247.242066}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.loadEventFired","params":{"timestamp":78247.264169}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948115, 'level': 'INFO', 'message': '{"message":{"method":"Page.frameStoppedLoading","params":{"frameId":"9488.1"}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 147625298116, 'level': 'INFO', 'message': 
'{"message":{"method":"Page.domContentEventFired","params":{"timestamp":78247.276475}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}, {'timestamp': 1476252948122, 'level': 'INFO', 'message': '{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://api.ipify.org/?format=text","frameId":"9488.1","initiator":{"type":"other"},"loaderId":"9488.1","request":{"headers":{"Referer":"https://api.ipify.org/?format=text","User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"},"initialPriority":"High","method":"GET","mixedContentType":"none","url":"https://api.ipify.org/favicon.ico"},"requestId":"9488.2","timestamp":78247.280131,"type":"Other","wallTime":1476252948.11805}},"webview":"6e8a3b1d-e5aa-40fb-a695-280cbb0ee420"}'}
Then read "status":200 from the JSON. You can also parse the response "headers".
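For a single message string like the one in the sample above, the status and headers sit under params.response. A sketch that pulls them out of one entry, using only the field names visible in that sample; parse_response is a hypothetical name, and the test string is a trimmed copy of the sample with most fields removed:

```python
import json
from typing import Any, Dict, Optional


def parse_response(message: str) -> Optional[Dict[str, Any]]:
    """Extract url, status, statusText and headers from one responseReceived message."""
    event = json.loads(message).get("message", {})
    if event.get("method") != "Network.responseReceived":
        return None  # not a response event
    response = event.get("params", {}).get("response", {})
    return {
        "url": response.get("url"),
        "status": response.get("status"),
        "status_text": response.get("statusText"),
        "headers": response.get("headers", {}),
    }


# A trimmed copy of the sample entry above:
raw = ('{"message":{"method":"Network.responseReceived","params":{"response":'
       '{"headers":{"Content-Type":"text/plain","Server":"Cowboy"},'
       '"status":200,"statusText":"OK","url":"https://api.ipify.org/?format=text"},'
       '"type":"Document"}}}')
info = parse_response(raw)
print(info["status"], info["headers"]["Content-Type"])  # 200 text/plain
```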
Answer (score: 53)
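It's not possible.
Unfortunately, Selenium does not provide this information by design. There has been a very long discussion about it, but in short:
- Selenium is a browser emulation tool, not necessarily a testing tool.
- Selenium performs many GETs and POSTs while loading a page, and adding an interface for handling them would complicate the API in ways the developers don't want.
So we're left with workarounds, such as:
- Searching the returned HTML for error messages.
- Using another tool such as Requests (but see @Zeinab's answer for the drawbacks of this approach).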
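The first workaround can be sketched as a text heuristic over what Selenium does expose, driver.title and driver.page_source. The marker strings below are hypothetical examples, not a standard list; real error pages vary by server and site, so this only guesses and can produce false positives or negatives. find_error_marker is a hypothetical name:

```python
from typing import Iterable, Optional

# Hypothetical markers; adjust them to the error pages of the site under test.
DEFAULT_MARKERS = ("404 Not Found", "500 Internal Server Error", "403 Forbidden")


def find_error_marker(title: str, html: str,
                      markers: Iterable[str] = DEFAULT_MARKERS) -> Optional[str]:
    """Return the first marker found in the title or HTML (case-insensitive), else None."""
    haystack = (title + "\n" + html).lower()
    for marker in markers:
        if marker.lower() in haystack:
            return marker
    return None


# With Selenium this would be called as:
#     find_error_marker(driver.title, driver.page_source)
print(find_error_marker("404 Not Found", "<h1>Not Found</h1>"))  # 404 Not Found
print(find_error_marker("Home", "<h1>Welcome</h1>"))             # None
```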