urlopen problems with Python 2.6 and 3.2 on Windows

Question

In Python 2.6 I routinely used urllib.urlopen to fetch a web page and then post-process the data. Now that I am using Python 3.2, I am running into problems that seem to occur only on Windows (possibly only on Windows 7).

Using the following code with Python 3.2.2 (64-bit) on Windows 7...

import urllib.request

fp = urllib.request.urlopen(URL_string_that_I_use)

string = fp.read()
fp.close()
print(string.decode("utf8"))

I get the following:

Traceback (most recent call last):
  File "TATest.py", line 5, in <module>
    string = fp.read()
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 553, in _read_chunked
    self._safe_read(2)      # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

If I use the following code instead...

import urllib.request

fp = urllib.request.urlopen(URL_string_that_I_use)
for Line in fp:
    print(Line.decode("utf8").rstrip('\n'))
fp.close()

I get part of the page's content, but the rest is then blocked by...

Traceback (most recent call last):
  File "TATest.py", line 9, in <module>
    for Line in fp:
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 545, in _read_chunked
    self._safe_read(2)  # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

Trying to read another page results in...

Traceback (most recent call last):
  File "TATest.py", line 11, in <module>
    print(Line.decode("utf8").rstrip('\n'))
  File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>
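
This last traceback is raised inside print, when the decoded text is encoded for the console with cp1252, not while reading from the network. As a rough sketch of one way to sidestep it, the text could be passed through the console encoding with unencodable characters replaced before printing. The helper name to_console_safe is hypothetical and not part of any library:

```python
import sys

def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    to_console_safe is a hypothetical helper name used only for this sketch.
    """
    # Fall back to the console's own encoding, then to ASCII if none is reported.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the target encoding; unencodable characters become '?'.
    return text.encode(encoding, errors="replace").decode(encoding)

# '\x92' has no mapping in cp1252, so it is replaced instead of raising.
print(to_console_safe("It\x92s here", "cp1252"))
```

In the loop above this would be used as print(to_console_safe(Line.decode("utf8").rstrip('\n'))). It loses the offending characters, so it is only suitable when an approximate console display is acceptable.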

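I believe this is a Windows issue, but is there a way to make Python more robust against it? We did not hit this problem when running similar code (the 2.6 version) on Linux. Is there a way around it? I have also posted to the gmane.comp.python.devel newsgroup.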
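
To make "more robust" concrete, here is a minimal sketch of what I have in mind: catch http.client.IncompleteRead and keep whatever bytes arrived before the failure, which the exception exposes as its partial attribute. The function name read_tolerant is hypothetical, and this only salvages partial data; it does not address whatever is cutting the chunked response short.

```python
import http.client

def read_tolerant(fp):
    """Read all of fp; on IncompleteRead, return the bytes received so far.

    Returns a (data, complete) tuple. read_tolerant is a hypothetical
    helper name used only for this sketch.
    """
    try:
        return fp.read(), True
    except http.client.IncompleteRead as exc:
        # exc.partial holds the data read before the response ended early.
        return exc.partial, False

# Intended use with the code from the question (URL_string_that_I_use as above):
#   fp = urllib.request.urlopen(URL_string_that_I_use)
#   data, complete = read_tolerant(fp)
#   fp.close()
```

The complete flag lets the caller decide whether truncated data is good enough or whether the request should be retried.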
