Python 3 - HTTP proxy problem

Asked 2025-04-17 17:45

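I am using Python 3.3.0 on Windows 7.

I wrote a script that is meant to bypass an HTTP proxy on a system, without authentication. When I run it, I get this error: UnicodeEncodeError: 'charmap' codec can't encode characters in position 6242-6243: character maps to <undefined>. It looks as if the Unicode characters cannot be converted to a string.

So what should I use, or what do I need to change? Does anyone have a clue or a solution?

My .py file contains the following: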

import sys, urllib
import urllib.request

url = "http://www.python.org"
proxies = {'http': 'http://199.91.174.6:3128/'}

opener = urllib.request.FancyURLopener(proxies)

try:
    f = urllib.request.urlopen(url)
except urllib.error.HTTPError as  e:
    print ("[!] The connection could not be established.")
    print ("[!] Error code: ",  e.code)
    sys.exit(1)
except urllib.error.URLError as  e:
    print ("[!] The connection could not be established.")
    print ("[!] Reason: ",  e.reason)
    sys.exit(1)

source = f.read()

if "iso-8859-1" in str(source):
    source = source.decode('iso-8859-1')
else:
    source = source.decode('utf-8')

print("\n SOURCE:\n",source)


1 Answer

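1. This code does not use your proxy at all. The opener built from proxies is never used; the request is made with urllib.request.urlopen(url), which ignores it.
2. Detecting the encoding this way is fragile. You should look for the declared encoding only in well-defined places: the 'Content-Type' HTTP header and, if the response is HTML, the charset meta tag.
3. Since you did not provide a traceback, I can only guess where the error occurs. A 'charmap' UnicodeEncodeError is raised while encoding, so the most likely culprit is the final print(), where the decoded text has to be encoded to your console's code page. The line if "iso-8859-1" in str(source): is also questionable: in Python 3, calling str() on bytes does not decode them, it returns their repr (b'...'). If you really want to keep this check (see point 2), change it to if b"iso-8859-1" in source:. That operates directly on the bytes, with no decoding needed beforehand.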
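As a rough illustration of points 1 and 2, here is a minimal sketch that stays within the standard library. The helper names (build_proxy_opener, charset_from_content_type, fetch_text) are made up for this example, and it assumes the proxy accepts plain HTTP requests without authentication, as in the question. Falling back to UTF-8 when no charset is declared is also an assumption, not something the question specifies.

```python
import urllib.request
from email.message import Message


def build_proxy_opener(proxy_url):
    """Return an opener that routes plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    return urllib.request.build_opener(handler)


def charset_from_content_type(header_value, default='utf-8'):
    """Extract the charset parameter from a Content-Type header value.

    Falls back to `default` (an assumption of this sketch) when the
    header is empty or declares no charset.
    """
    if not header_value:
        return default
    msg = Message()
    msg['Content-Type'] = header_value
    return msg.get_content_charset() or default


def fetch_text(url, proxy_url):
    """Fetch url through the proxy and decode it with the declared charset."""
    opener = build_proxy_opener(proxy_url)
    # Use opener.open(), not urllib.request.urlopen(), so the proxy applies
    with opener.open(url) as response:
        raw = response.read()
        content_type = response.headers.get('Content-Type', '')
    charset = charset_from_content_type(content_type)
    # errors='replace' keeps a wrongly declared charset from raising
    return raw.decode(charset, errors='replace')
```

With the values from the question, the call would look like fetch_text("http://www.python.org", "http://199.91.174.6:3128/"). Whether that proxy is still reachable is not something this sketch can promise.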

Note: this code runs fine for me, probably because my terminal uses UTF-8, whereas your Windows console uses a different encoding.
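If the failure does come from printing to a console whose code page lacks some characters, one possible workaround is to replace the unencodable characters before printing. The sketch below assumes exactly that; console_safe is a hypothetical helper name, and backslashreplace is just one of several error handlers that could be used.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks escaped.

    `encoding` defaults to the encoding of sys.stdout; characters that
    cannot be encoded are turned into backslash escapes such as \\u2192.
    """
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    return text.encode(encoding, errors='backslashreplace').decode(encoding)
```

In the question's script the last line would then become print("\n SOURCE:\n", console_safe(source)), which should no longer raise on a cp1252 or similar console.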

Update: I recommend using the python-requests library for making HTTP requests in Python.

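import requests

# Placeholder: set this to your proxy URL (a string like the one in the question)
proxies = {'http': your_proxy_here}

with requests.Session() as sess:
    # Session() takes no arguments; proxies are configured on the instance
    sess.proxies.update(proxies)
    r = sess.get('http://httpbin.org/ip')
    print(r.apparent_encoding)
    print(r.text)
    # more requests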

Note: this code does not use the encoding declared in the HTML itself; you need an HTML parser such as beautifulsoup to extract it.
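To give an idea of what extracting the declared encoding involves, here is a small sketch. It deliberately uses the standard library's html.parser instead of beautifulsoup so that it runs without extra installs; the names MetaCharsetParser and sniff_meta_charset, and the 4096-byte scan limit, are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Record the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get('charset'):
            # HTML5 form: <meta charset="...">
            self.charset = attrs['charset'].strip().lower()
        elif (attrs.get('http-equiv') or '').lower() == 'content-type':
            # Older form: <meta http-equiv="Content-Type" content="...; charset=...">
            for part in (attrs.get('content') or '').split(';'):
                name, _, value = part.strip().partition('=')
                if name.lower() == 'charset' and value:
                    self.charset = value.strip('"\' ').lower()


def sniff_meta_charset(raw_bytes, scan_limit=4096):
    """Return the charset declared in the first scan_limit bytes, or None."""
    parser = MetaCharsetParser()
    # Charset names are ASCII, so a lossy ASCII decode of the head suffices
    parser.feed(raw_bytes[:scan_limit].decode('ascii', errors='replace'))
    return parser.charset
```

With requests, one could call sniff_meta_charset(r.content) and, if it returns a value, assign it to r.encoding before reading r.text.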

Answered 2025-04-17 by Python大师
