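Python - urllib2 and cookielib
I want to open a site, get the initial cookie, and then use that cookie to open a second URL. But if you run the code below, you will see that the two printed cookies are different. How do I use the initial cookie when opening the second URL?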
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
The output shows a different cookie each time, as you can see:
<cookielib.CookieJar[<Cookie JSESSIONID=0DEEE8331DE7D0DFDC22E860E065085F for www.idcourts.us/repository>]>
<cookielib.CookieJar[<Cookie JSESSIONID=E01C2BE8323632A32DA467F8A9B22A51 for www.idcourts.us/repository>]>
3 Answers
0
I think this is a server-side issue: the server sets a new cookie on every request.
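To make that concrete, here is a minimal offline sketch of what the jar does when a second response carries a new Set-Cookie for the same name, domain, and path: the stored cookie is replaced, so printing the jar shows a different value. It assumes Python 3, where cookielib is named http.cookiejar and urllib2 is urllib.request. FakeResponse and the session values AAA and BBB are made up for illustration, and nothing is sent to the real site.

```python
import email
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; the jar only calls info()."""

    def __init__(self, set_cookie):
        # Build a header block containing a single Set-Cookie line.
        self._headers = email.message_from_string("Set-Cookie: %s\n\n" % set_cookie)

    def info(self):
        return self._headers


def jar_values(jar):
    """Return (name, value) pairs for every cookie currently in the jar."""
    return [(c.name, c.value) for c in jar]


cj = http.cookiejar.CookieJar()
req = urllib.request.Request("https://www.idcourts.us/repository/start.do")

# First response: the server hands out a session id (illustrative value).
cj.extract_cookies(FakeResponse("JSESSIONID=AAA; Path=/repository"), req)
first = jar_values(cj)

# Second response: the server issues a new id for the same name and path.
cj.extract_cookies(FakeResponse("JSESSIONID=BBB; Path=/repository"), req)
second = jar_values(cj)

print(first)
print(second)
```

If the server really does this on every request, the jar is behaving correctly and the question becomes why the server refuses to accept the first session.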
7
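This is not a real answer (but it is too long for a comment); it may be useful to others who want to answer this question.
Despite my best efforts, I could not figure this out.
Looking in Firebug, the cookie in Firefox appears to stay the same (and the site works).
I added urllib2.HTTPSHandler(debuglevel=1) to see which headers Python sends, and it does appear to resend the cookie.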
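Besides reading the debug output, the jar itself can be asked which Cookie header it would attach to a given request. The sketch below does this offline, assuming Python 3 (http.cookiejar and urllib.request in place of cookielib and urllib2). The helpers make_session_cookie and cookie_header_for and the value ABC123 are hypothetical names and data introduced only for this example.

```python
import http.cookiejar
import urllib.request


def make_session_cookie(name, value, domain, path):
    """Build a version-0 session cookie by hand (values are illustrative)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would add to a request for url."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)  # no network traffic; only mutates req
    return req.get_header("Cookie")


cj = http.cookiejar.CookieJar()
cj.set_cookie(
    make_session_cookie("JSESSIONID", "ABC123", "www.idcourts.us", "/repository")
)

# Same host and a path under /repository: the cookie is attached.
sent = cookie_header_for(cj, "https://www.idcourts.us/repository/partySearch.do")
# Different host: nothing is attached.
other = cookie_header_for(cj, "https://example.com/")

print(sent)
print(other)
```

This only confirms what the client would send; it says nothing about whether the server accepts the session.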
I also added all of the request headers that Firefox sends, to see whether that would help (it did not):
opener.addheaders = [
('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
..
]
My test code:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), urllib2.HTTPSHandler(debuglevel=1))
opener.addheaders = [
('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13'),
('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
('Accept-Language', 'en-gb,en;q=0.5'),
('Accept-Encoding', 'gzip,deflate'),
('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'),
('Keep-Alive', '115'),
('Connection', 'keep-alive'),
('Cache-Control', 'max-age=0'),
('Referer', 'https://www.idcourts.us/repository/partySearch.do'),
]
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
I feel like I am missing something obvious.
21
This is not a urllib problem. The site does something odd: you need to request a couple of stylesheets to validate your session ID:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# default User-Agent ('Python-urllib/2.6') will *not* work
opener.addheaders = [
('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11'),
]
stylesheets = [
'https://www.idcourts.us/repository/css/id_style.css',
'https://www.idcourts.us/repository/css/id_print.css',
]
home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
sessid = cj._cookies['www.idcourts.us']['/repository']['JSESSIONID'].value
# Note the +=
opener.addheaders += [
('Referer', 'https://www.idcourts.us/repository/start.do'),
]
for st in stylesheets:
    # the trick: request each stylesheet with the session ID appended to the URL
    opener.open(st+';jsessionid='+sessid)
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
# perhaps need to keep updating the referer...
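The code above reads the session ID through cj._cookies, which is a private attribute of the jar. As a hedged alternative, the same value can be found by iterating over the jar, which is part of its public interface. The sketch below assumes Python 3 (http.cookiejar instead of cookielib); find_cookie_value, with_session_id, and the value ABC123 are hypothetical and exist only for this example, and the cookie is added by hand so that nothing is fetched from the site.

```python
import http.cookiejar


def find_cookie_value(jar, name, domain, path):
    """Return the value of the first cookie matching name, domain and path."""
    for cookie in jar:
        if (cookie.name, cookie.domain, cookie.path) == (name, domain, path):
            return cookie.value
    return None


def with_session_id(url, sessid):
    """Append the session id to the URL as a ;jsessionid= path parameter."""
    return "%s;jsessionid=%s" % (url, sessid)


# Stand-in for the cookie the server would set on the first request.
cj = http.cookiejar.CookieJar()
cj.set_cookie(http.cookiejar.Cookie(
    version=0, name="JSESSIONID", value="ABC123",
    port=None, port_specified=False,
    domain="www.idcourts.us", domain_specified=False, domain_initial_dot=False,
    path="/repository", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

sessid = find_cookie_value(cj, "JSESSIONID", "www.idcourts.us", "/repository")
css_url = with_session_id(
    "https://www.idcourts.us/repository/css/id_style.css", sessid
)
missing = find_cookie_value(cj, "OTHER", "www.idcourts.us", "/repository")

print(sessid)
print(css_url)
```

Returning None for a missing cookie lets the caller decide how to react, whereas the nested dictionary lookup raises KeyError.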