UnicodeEncodeError when fetching a URL

3 votes

1 answer

2754 views

Asked 2025-04-16 04:02

I'm using urlfetch to fetch a URL. When I pass the result to the html2text function (which strips out all the HTML tags), I get the following message:

UnicodeEncodeError: 'charmap' codec can't encode characters in position  ... character maps to <undefined>

I've been trying to call encode('UTF-8', 'ignore') on the string, but I keep getting this error.

Any ideas?

Thanks,

Joel

Some code:

result = urlfetch.fetch(url="http://www.google.com")
html2text(result.content.encode('utf-8', 'ignore'))

And the error message:

File "C:\Python26\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 159-165: character maps to <undefined>

1 Answer

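You need to decode the data you fetched first! Which codec should you use? That depends on the site you fetched the data from.

Once you have a unicode string, I can't think of any error that some_unicode.encode('utf-8', 'ignore') could raise.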
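As a small illustration of that point (a sketch with a made-up sample string, not taken from the original post): encoding ordinary unicode text to UTF-8 succeeds, whereas a charmap codec such as cp1252, the one named in the traceback, rejects characters it has no mapping for. Which call in the asker's program triggered the cp1252 encode is not visible in the post.

```python
# Hypothetical sample text: "cafe" with an accented e, plus two Chinese characters.
text = u"caf\u00e9 \u4e2d\u6587"

# UTF-8 has an encoding for each of these characters, so this succeeds.
utf8_bytes = text.encode("utf-8")

# cp1252 has no mapping for the Chinese characters, so strict encoding fails
# with the same kind of 'charmap' error shown in the question.
try:
    text.encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError as exc:
    cp1252_failed = True
    print("cp1252 failed: %s" % exc)

# With 'ignore', the unmappable characters are silently dropped.
lossy = text.encode("cp1252", "ignore")
```

Note that 'ignore' only hides the problem: the characters that cannot be mapped are lost from the output.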

OK, what you need to do is:

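from google.appengine.api import urlfetch  # assumes App Engine's urlfetch; the post does not name the library

result = urlfetch.fetch('http://google.com')
content_type = result.headers['Content-Type']  # figure out what you just fetched
# the header looks like 'text/html; charset=ISO-8859-1'; the charset part is optional
ctype, _, charset = content_type.partition(';')
encoding = charset.strip()[len('charset='):] or 'utf-8'  # get the encoding; the UTF-8 fallback is an assumption
print(encoding)  # e.g. ISO-8859-1
utext = result.content.decode(encoding)  # now you have unicode
text = utext.encode('utf8', 'ignore')  # encode to UTF-8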

This isn't particularly robust, but it should point you in the right direction.
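One way to make the charset lookup less fragile is to let the standard library parse the header instead of slicing the string by hand. The sketch below is a variant under stated assumptions, not the answerer's code: the helper names charset_from_content_type and decode_body and the UTF-8 fallback are choices made for illustration, and it works on a raw header string and byte string rather than on a urlfetch result.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw bytes using the declared charset, replacing bad bytes."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, "replace")
    except LookupError:
        # The server declared a codec Python does not know; use the fallback.
        return raw.decode(default, "replace")


# Hypothetical body and header, mirroring the ISO-8859-1 example above.
utext = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
text = utext.encode("utf-8")
```

With a real response, the same helper would presumably be called with result.content and the Content-Type header value; pages that declare their charset only in a meta tag would still need separate handling.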

Answered 2025-04-16 by Python大师
