Parsing an HTML page in Python: how do I decode the characters?

Posted 2024-04-18 15:33:19

Male | A programmer who enjoys writing Python code.

I am trying to parse an HTML page like this:

# coding: utf8
[...]
def search(self, a, b):
    # Read the search term from the text entry widget
    mot_canal = self.champ_rech_canal.get_text()
    url_canal = "http://www.canalplus.fr/pid3330-c-recherche.html?rechercherSite=" + mot_canal
    try:
        f = urllib.urlopen(url_canal)
        self.feuille_canal = f.read()  # raw bytes, not decoded yet
        f.close()
    except:
        self.champ_rech_canal.set_text("La recherche a échoué")
    print self.feuille_canal

The output is mostly fine, but characters such as "é" or "ô" are not displayed correctly. How can I decode them? I tried:

self.feuille_canal = self.feuille_canal.decode("utf-8")

Result:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 8789: invalid continuation byte


1 answer

User

#1 · Posted 2024-04-18 15:33:19

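You are trying to decode an ISO-8859-1 page as UTF-8, which cannot work. Byte 0xe9 is "é" in ISO-8859-1, but in UTF-8 it opens a three-byte sequence, hence the "invalid continuation byte" error. See the Content-Type declaration in the returned HTML: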

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
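A minimal sketch of how that declaration could be used, assuming the page names its charset in the markup as above. The helper names detect_charset and decode_html and the sample bytes are hypothetical and only for illustration. The sketch is written to run on Python 3, whereas the question's code is Python 2.

```python
import re


def detect_charset(raw, default="utf-8"):
    """Return the charset named in the page's markup, or `default` if none is found."""
    # Only the start of the page is scanned; the declaration normally sits in <head>.
    match = re.search(br'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', raw[:4096], re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default


def decode_html(raw):
    """Decode raw HTML bytes using the charset the page declares."""
    charset = detect_charset(raw)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: ISO-8859-1 maps every byte, so this cannot fail.
        return raw.decode("iso-8859-1")


# Hypothetical sample standing in for the bytes returned by f.read()
sample = (b'<meta http-equiv="Content-Type" '
          b'content="text/html; charset=ISO-8859-1" />'
          b'<p>La recherche a \xe9chou\xe9</p>')

text = decode_html(sample)
print(detect_charset(sample))
print(text)
```

For the code in the question, the direct fix would be self.feuille_canal.decode("iso-8859-1") in place of decode("utf-8").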
