Python's `urllib2`: why do I get a 403 error when I `urlopen` a Wikipedia page?

58 votes

6 answers

77135 views

Asked 2025-04-16 01:51

I get a strange error when I try to open a particular Wikipedia page with urlopen. The page is:

http://en.wikipedia.org/wiki/OpenCola_(drink)

This is the shell session:

>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
  File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "c:\Python26\Lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "c:\Python26\Lib\urllib2.py", line 397, in open
    response = meth(req, response)
  File "c:\Python26\Lib\urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\Python26\Lib\urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "c:\Python26\Lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "c:\Python26\Lib\urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This happens on two different machines of mine, located on different continents. Does anyone know why?


6 answers

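Websites often filter access by checking whether the request comes from a user agent they recognize. Wikipedia is treating your script as a bot and rejecting it. You can try presenting the script as a browser; the article at the following link explains how.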

http://wolfprojects.altervista.org/changeua.php
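
To see what a server actually receives from an unmodified script, the sketch below prints the User-Agent that Python 3's `urllib.request` (the successor of `urllib2`) sends when none is set. The helper name `default_user_agent` is made up for this illustration; `build_opener` and the opener's `addheaders` list are part of the standard library.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent header urllib.request adds when none is supplied."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples attached to every request
    headers = dict(opener.addheaders)
    return headers.get("User-agent")


if __name__ == "__main__":
    # Prints a value of the form "Python-urllib/<major>.<minor>"
    print(default_user_agent())
```

A value like this identifies the client as a script rather than a browser, which is the kind of signal a site can use to reject the request.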

Answered 2025-04-16 by Python大师


To diagnose the problem, catch the exception and read the body of the error response.

import urllib2

try:
    f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
except urllib2.HTTPError as e:
    # The HTTPError carries the response body the server sent with the 403
    print e.fp.read()

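When I print the resulting message, it contains the following:

"English

Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes."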
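
The same inspection can be sketched for Python 3 without touching Wikipedia at all. In the example below, `PickyHandler`, `fetch`, `demo`, and the agent string `ExampleFetcher/0.1` are all hypothetical names invented for the illustration. The local server only imitates a site that rejects urllib's default User-Agent; it is an assumption about the general mechanism, not Wikipedia's actual filtering logic.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that rejects urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            status, body = 403, b"Scripts without a proper User-Agent are blocked."
        else:
            status, body = 200, b"Hello, " + agent.encode("ascii", "replace")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, user_agent=None):
    """Return (status, body); for an HTTP error, return the error page instead of raising."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    # An empty ProxyHandler bypasses any configured proxy so the local demo is reachable
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so its body can be read
        return e.code, e.read()


def demo():
    """Fetch once with the default User-Agent and once with a custom one."""
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        return fetch(url), fetch(url, "ExampleFetcher/0.1")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body in demo():
        print(status, body.decode())
```

Reading the body of the error response, rather than only the status line, is what reveals the server's own explanation for the refusal.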

Answered 2025-04-16 by Python大师


139

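Wikipedia's stance is:

Data retrieval: Bots may not be used to retrieve bulk content for any use not directly related to an approved bot task. This includes dynamically loading pages from another website, which may result in the website being blacklisted and permanently denied access. If you would like to download bulk content or mirror a project, please do so by downloading or hosting your own copy of our database.

That is why Python's default client is blocked. You are supposed to download the data dumps instead.

In any case, you can read a page like this in Python 2: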

import urllib2
url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
req = urllib2.Request(url, headers={'User-Agent': 'Magic Browser'})
con = urllib2.urlopen(req)
print con.read()

Or like this in Python 3:

import urllib.request
url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
req = urllib.request.Request(url, headers={'User-Agent': 'Magic Browser'})
con = urllib.request.urlopen(req)
print(con.read())

Answered 2025-04-16 by Python大师

