Why can't I download this web page with Python?

import random
import socket
import urllib2

def download(source_url):
    try:
        socket.setdefaulttimeout(10)
        # Pick a random browser-like User-Agent for each request
        agents = ['Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)',
                  'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)',
                  'Microsoft Internet Explorer/4.0b1 (Windows 95)',
                  'Opera/8.00 (Windows NT 5.1; U; en)']
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent', random.choice(agents))
        resp = urllib2.urlopen(ree)
        htmlSource = resp.read()
        return htmlSource
    except Exception as e:
        print e
        return ""

download('http://www.windowsphone.com/en-US/apps?list=free')

2 Answers

User

Answer 1 · edited 2024-04-27 13:54:48

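Flesk really has the answer on this one (+1).

Another way to debug HTTP connections directly is Netcat, which is basically a powerful telnet utility.

Suppose you want to see what is going on in your HTTP request: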

$ nc www.windowsphone.com 80
GET /en-US/apps?list=free HTTP/1.0
Host: www.windowsphone.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

This sends the request headers to the server (you need to press Enter twice to send them).

The server then responds:

^{pr2}$
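So the server returns 302, the HTTP status code for a redirect, which tells the "browser" to open the URL passed in the Location header.
Netcat is a great tool for debugging and tracing all kinds of network communication, and it helped me a lot when I wanted to dig deeper into the HTTP protocol.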
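
The same kind of inspection can be sketched in Python with a plain socket, as a rough counterpart to the nc session above. This is only a minimal sketch and assumes Python 3; the helper names build_request, parse_head and fetch_head are made up for illustration and are not part of any library.

```python
import socket


def build_request(host, path, user_agent):
    """Build a minimal HTTP/1.0 GET request; the blank line ends the headers."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "User-Agent: {}".format(user_agent),
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_head(raw):
    """Return (status_code, headers) parsed from the raw bytes of a response.

    Header names are lower-cased; a repeated header keeps only its last value.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def fetch_head(host, path, user_agent, port=80, timeout=10):
    """Send the request over a plain TCP socket and parse the response head."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path, user_agent))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(chunk)
    return parse_head(b"".join(chunks))
```

Calling fetch_head('www.windowsphone.com', '/en-US/apps?list=free', some_user_agent) returns the status code and a dict of headers, so a redirect shows up as a 3xx status together with a 'location' entry instead of being followed silently.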

User
Answer 2 · edited 2024-04-27 13:54:48

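The reason it fails is that http://www.windowsphone.com tries to set a cookie, which is checked at https://login.live.com; that site creates another cookie and, if successful, redirects back to windowsphone.com.
You should take a look at http://docs.python.org/library/cookielib.html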
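
As a rough illustration of the cookie handling described above, here is a minimal sketch. It assumes Python 3, where cookielib is named http.cookiejar and urllib2 became urllib.request; in Python 2 the corresponding classes are cookielib.CookieJar and urllib2.HTTPCookieProcessor. The function names make_cookie_opener and download are illustrative only, and whether the site still behaves as described is not something this sketch can guarantee.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener(user_agent):
    """Return (opener, jar); the opener stores cookies and resends them on redirects."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default Python User-Agent with a browser-like one
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def download(url, user_agent, timeout=10):
    """Fetch url with cookie handling enabled and return the body as bytes."""
    opener, _jar = make_cookie_opener(user_agent)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Because the same jar is consulted on every hop, a cookie set by the first response is sent along when the redirect chain comes back to the original host.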
If you want to use curl, allow it to create a cookie file like this:
curl -so /dev/null 'http://www.windowsphone.com/en-US/apps?list=free' -c 'myCookieJar'
Run more myCookieJar in your shell and you will see something like this:
^{pr2}$
Then run (note the -b option before 'myCookieJar'):
curl -so 'windowsphone.html' 'http://www.windowsphone.com/en-US/apps?list=free' -b 'myCookieJar'
You will find the page content in the file windowsphone.html, just as you would see it in a browser.
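
If you would rather reuse the cookie file from Python than call curl a second time, something along these lines may work. It is a sketch only, assuming Python 3 and a Netscape-format file such as the one curl -c writes; the function name opener_from_cookie_file is hypothetical.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(path):
    """Load a Netscape-format cookie file into a cookie-aware opener."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session-only and already-expired entries instead of dropping them
    jar.load(path, ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        # curl stores session cookies with expiry 0, which Python would
        # treat as expired in 1970; mark them as session cookies instead.
        if cookie.expires == 0:
            cookie.expires = None
            cookie.discard = True
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

With the file from the curl example, opener_from_cookie_file('myCookieJar') gives an opener whose open() method sends the stored cookies, much like curl's -b option.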
