在urllib2.urlopen中使用mimetools.Message

3 投票
1 回答
1148 浏览
提问于 2025-04-16 17:37

我正在使用 urllib2.urlopen()

req = urllib2.Request('http://www.google.com')
resp = urllib2.urlopen(req)
print resp.info()

print resp.info()['set-cookie']

Date: Sat, 14 May 2011 01:24:12 GMT

Expires: -1

Cache-Control: private, max-age=0

Content-Type: text/html; charset=ISO-8859-1

Set-Cookie: PREF=ID=5ec78624283cc050:FF=0:TM=1305336252:LM=1305336252:S=eRXgUUuzhQbRmZxk; expires=Mon, 13-May-2013 01:24:12 GMT; path=/; domain=.google.com

Set-Cookie: NID=46=GxyZVeWbT9dn0sLa9waPGSusm1hFqGf46SPqewahg0bzbYIQX0oHff0bzJ33E2yO89npEsYkqSoX0HLSqHbCxj5tCK2E931PfEJbqDMB6lTDk4ngVAiiyObWmbHgRUC9; expires=Sun, 13-Nov-2011 01:24:12 GMT; path=/; domain=.google.com; HttpOnly

Server: gws

X-XSS-Protection: 1; mod


PREF=ID=5ec78624283cc050:FF=0:TM=1305336252:LM=1305336252:S=eRXgUUuzhQbRmZxk; expires=Mon, 13-May-2013 01:24:12 GMT; path=/; domain=.google.com, NID=46=GxyZVeWbT9dn0sLa9waPGSusm1hFqGf46SPqewahg0bzbYIQX0oHff0bzJ33E2yO89npEsYkqSoX0HLSqHbCxj5tCK2E931PfEJbqDMB6lTDk4ngVAiiyObWmbHgRUC9; expires=Sun, 13-Nov-2011 01:24:12 GMT; path=/; domain=.google.com; HttpOnly

从响应中收到的头信息可以看到,有两个 'set-cookie' 的声明,但在 resp.info() 对象中,我发现这两个 cookie 的信息被合并在一起,用逗号(',')分开了。

这样用逗号来分隔 cookie 信息就很麻烦,因为我想分开的 cookie 信息里也包含了逗号。

有没有简单的方法可以通过这个 mimetools.message 对象(resp.info())单独调用每个 cookie 字符串呢?

要不然,我就只能手动解析这些头信息,而这个 mimetools.message/字典对象就没什么用处了。

1 个回答

3

试试使用 getheaders() 来获取一个包含所有 cookies 的 list

>>> msg = resp.info()
>>> msg.getheaders('Set-Cookie')
['PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au', 'NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly']

在这种情况下,你会得到一个包含两个 stringlist

然后你可以遍历这个 list,找到你想要的 cookie。 str.startswith() 是你的好帮手:

>>> cookies = msg.getheaders('Set-Cookie')
>>> for cookie in cookies:
...     if cookie.startswith('PREF='):
...             print 'Got PREF: ', cookie
...     else:
...             print 'Got another: ', cookie
... 
Got PREF:  PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au
Got another:  NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly

新手如何找到 Python 的文档

% python
Python 2.7.1 (r271:86832, Jan 29 2011, 13:30:16) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> req = urllib2.Request('http://www.google.com')
>>> resp = urllib2.urlopen(req)
>>> help(resp.info())

撰写回答