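Unable to parse the output of HTTPConnection.debuglevel
I am trying to inspect the output of a TCP stream. By turning on debugging in HTTPConnection I can see the contents of the stream, but I don't know how to read that data and evaluate it with a regular expression. I keep getting the error "TypeError: expected string or buffer". Is there a way to convert the result to a string? Thanks!
Script: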
from urllib2 import Request, urlopen, URLError, HTTPError
import urllib2
import cookielib
import httplib
import re
httplib.HTTPConnection.debuglevel = 1
p = re.compile('abc=..........')
cj = cookielib.CookieJar()
proxy_address = '192.168.232.134:8083' # change the IP:PORT, this one is for example
proxy_handler = urllib2.ProxyHandler({'http': proxy_address})
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPCookieProcessor(cj), urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
url = "http://www.google.com/" # change the url
req=urllib2.Request(url)
data=urllib2.urlopen(req)
m=p.match(data)
if m:
    print "Match found."
else:
    print "Match not found."
Output:
send: 'GET hyperlink/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink:8083/3240951276
header: Set-Cookie: abc=3240951276; path=/; domain=.google.com; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET hyperlink/3240951276 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: hyperlink\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 303 See Other\r\n'
header: Location: hyperlink
header: Set-Cookie: abc=3240951276; path=/; expires=Thu, 31-Dec-2020 23:59:59 GMT
header: Content-Length: 0
send: 'GET http://www.google.com/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nCookie: abc=3240951276\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Mon, 18 Oct 2010 21:09:32 GMT
header: Expires: -1
header: cache-control: max-age=0, private, private
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=066bc785a2b15ef6:FF=0:TM=1287436172:LM=1287436172:S=mNiXaRhshpf8nLji; expires=Wed, 17-Oct-2012 21:09:32 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=39=ur3gnXL80kEy4shKAh8_-XV8PhmS4G83slPcX9OD3L6uthQZw-wq7RUnB0WKGYR3F_QGoyZAyEPCvjdi69EXXq23dzvpuZSl_KU2o7pqcTB7Vym4co1LOXmi9YQGpbkb; expires=Tue, 19-Apr-2011 21:09:32 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close
header: Content-Length: 4676
header: X-Con-Reuse: 1
header: Content-Encoding: gzip
header: via: 1.1 HermesPrefetch (CID2627003316.AID3240951276.TID1)
header: X-Trace-Timing: Start=1287436172845, Sched=0, Dns=2, Con=11, RxS=28, RxD=35
Traceback (most recent call last):
  File "C:\Documents and Settings\asdf\workspace\PythonScripts2\src\Test1.py", line 18, in <module>
    m=p.match(data)
TypeError: expected string or buffer
1 Answer
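The debug information you see in the terminal is produced by httplib, but it is not part of the object returned by urllib2.urlopen(). Instead, it is printed directly to your program's sys.stdout. Unfortunately, you cannot change this behavior of httplib. I'm not sure what you hope to achieve by "capturing" this output and running a regular expression over it, but if that is really what you want, you will need to replace sys.stdout with something else, such as a suitable StringIO object, and then pick out the output you care about.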
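If you do go the sys.stdout route, a minimal sketch might look like the following. The helper names capture_stdout and find_cookie are made up for illustration and are not part of httplib or urllib2. The import fallback is only there so the same sketch runs on Python 3 as well as on the Python 2 used in the question.

```python
import re
import sys

try:                                # Python 2, as in the question
    from StringIO import StringIO
except ImportError:                 # Python 3
    from io import StringIO


def capture_stdout(func, *args, **kwargs):
    """Run func and return (its result, the text it printed to sys.stdout)."""
    saved = sys.stdout
    buf = StringIO()
    sys.stdout = buf                # debug lines printed from now on land in buf
    try:
        result = func(*args, **kwargs)
    finally:
        sys.stdout = saved          # always restore the real stdout
    return result, buf.getvalue()


def find_cookie(debug_text, name="abc"):
    """Return the value of the first Set-Cookie header for name, or None."""
    pattern = r"Set-Cookie: %s=([^;\s]+)" % re.escape(name)
    m = re.search(pattern, debug_text)
    return m.group(1) if m else None
```

With the question's script this would be used roughly as `data, log = capture_stdout(urllib2.urlopen, req)` followed by `find_cookie(log)`. Note that it uses re.search rather than p.match, because match only succeeds at the very start of the string and the captured text begins with a "send:" line.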
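Keep in mind, however, that everything httplib emits in its debug output is already directly available in your program. It is either something you passed to httplib (via urllib2) or part of the server's response, and can therefore be found in the object returned by urllib2.urlopen(). For example, it looks like you are trying to extract cookie information; you can get that cookie directly from the CookieJar you already supply. There seems to be no good reason to capture this output and parse it.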
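As a rough illustration of the CookieJar approach, the sketch below looks a cookie up by name. The helper cookie_value is a hypothetical name introduced here. The import fallback only exists so the sketch also runs on Python 3, where cookielib was renamed http.cookiejar.

```python
try:                                # Python 2, as in the question
    import cookielib
except ImportError:                 # Python 3 renamed the module
    import http.cookiejar as cookielib


def cookie_value(jar, name):
    """Return the value of the first cookie called name in jar, or None."""
    for cookie in jar:              # iterating a CookieJar yields Cookie objects
        if cookie.name == name:
            return cookie.value
    return None
```

In the question's script, calling `cookie_value(cj, 'abc')` after `urllib2.urlopen(req)` should return the cookie value, assuming the jar accepted the cookie. The original TypeError itself arises because urlopen returns a file-like response object rather than a string: `data.read()` gives the body as a string and `data.info()` gives the headers of the final response.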