在urllib2.urlopen中使用mimetools.Message
我正在使用 urllib2.urlopen()
:
req = urllib2.Request('http://www.google.com')
resp = urllib2.urlopen(req)
print resp.info()
print resp.info()['set-cookie']
Date: Sat, 14 May 2011 01:24:12 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=5ec78624283cc050:FF=0:TM=1305336252:LM=1305336252:S=eRXgUUuzhQbRmZxk; expires=Mon, 13-May-2013 01:24:12 GMT; path=/; domain=.google.com
Set-Cookie: NID=46=GxyZVeWbT9dn0sLa9waPGSusm1hFqGf46SPqewahg0bzbYIQX0oHff0bzJ33E2yO89npEsYkqSoX0HLSqHbCxj5tCK2E931PfEJbqDMB6lTDk4ngVAiiyObWmbHgRUC9; expires=Sun, 13-Nov-2011 01:24:12 GMT; path=/; domain=.google.com; HttpOnly
Server: gws
X-XSS-Protection: 1; mod
PREF=ID=5ec78624283cc050:FF=0:TM=1305336252:LM=1305336252:S=eRXgUUuzhQbRmZxk; expires=Mon, 13-May-2013 01:24:12 GMT; path=/; domain=.google.com, NID=46=GxyZVeWbT9dn0sLa9waPGSusm1hFqGf46SPqewahg0bzbYIQX0oHff0bzJ33E2yO89npEsYkqSoX0HLSqHbCxj5tCK2E931PfEJbqDMB6lTDk4ngVAiiyObWmbHgRUC9; expires=Sun, 13-Nov-2011 01:24:12 GMT; path=/; domain=.google.com; HttpOnly
从响应中收到的头信息可以看到,有两个 'set-cookie' 的声明,但在 resp.info()
对象中,我发现这两个 cookie 的信息被合并在一起,用逗号(',')分开了。
这样用逗号来分隔 cookie 信息就很麻烦,因为我想分开的 cookie 信息里也包含了逗号。
有没有简单的方法可以通过这个 mimetools.message 对象(resp.info()
)单独调用每个 cookie 字符串呢?
要不然,我就只能手动解析这些头信息,而这个 mimetools.message/字典对象就没什么用处了。
1 个回答
3
试试使用 getheaders()
来获取一个包含所有 cookies 的 list
:
>>> msg = resp.info()
>>> msg.getheaders('Set-Cookie')
['PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au', 'NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly']
在这种情况下,你会得到一个包含两个 string
的 list
。
然后你可以遍历这个 list
,找到你想要的 cookie。 str.startswith()
是你的好帮手:
>>> cookies = msg.getheaders('Set-Cookie')
>>> for cookie in cookies:
... if cookie.startswith('PREF='):
... print 'Got PREF: ', cookie
... else:
... print 'Got another: ', cookie
...
Got PREF: PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au
Got another: NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly
新手如何找到 Python 的文档
% python
Python 2.7.1 (r271:86832, Jan 29 2011, 13:30:16)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> req = urllib2.Request('http://www.google.com')
>>> resp = urllib2.urlopen(req)
>>> help(resp.info())