UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'

Asked 2025-04-18 13:14

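I am trying to read an HTML file, but when I extract the titles and URLs to compare them against my keyword list 'alist', I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'. A screenshot of the error is at this link (http://tinypic.com/r/307w8bl/8).

Code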

for q in soup.find_all('a'):
    title = (q.get('title'))
    url = ((q.get('href')))
    length = len(alist)
    i = 0
    while length > 0:
        if alist[i] in str(title): #checks for keywords from html form from the titles and urls
            r.write(title)
            r.write("\n")
            r.write(url)
            r.write("\n")
        i = i + 1
        length = length -1
doc.close()
r.close()

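Some background: alist holds the keywords that I compare against each title to find the content I want. The strange part is that when alist contains two or more words the script runs fine, but when it contains only one word I get the error above.

Thanks in advance.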

3 Answers

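Presumably title is a Unicode string, which can contain any character. str(title) tries to convert it to a byte string using the ASCII codec, and that fails because your title contains non-ASCII characters (here u'\u2019', the right single quotation mark).

What are you trying to do? Why do you need to convert the title to a byte string?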
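To make the failure concrete, here is a minimal sketch. The title value is made up for illustration. In Python 2, str(title) performs an implicit ASCII encode; the explicit title.encode("ascii") below fails the same way and also runs on Python 3, where str() itself would not raise.

```python
# -*- coding: utf-8 -*-
# Hypothetical title containing U+2019 (right single quotation mark).
title = u"Today\u2019s news"

try:
    # Python 2's str(title) performs this ASCII encode implicitly.
    title.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("encode failed: %s" % exc.reason)

# Comparing text with text needs no conversion at all.
found = u"news" in title
```

The last line is the point: if the keyword and the title are both Unicode, the `in` test works without any call to str().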

Answered 2025-04-18 by Python大师

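The problem is the str(title) call. You are trying to convert Unicode data to a byte string (str in Python 2), which triggers an implicit ASCII encode.

Why do you want to convert title to a string at all? You can use it directly.

soup.find_all returns a list of Tag objects, and q.get('title') on each of them already returns a Unicode string (or None if the attribute is missing).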
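As a rough sketch of what "use it directly" could look like: the helper name matching_links, the sample data, and the output path are all made up for illustration, and the list of pairs stands in for the values the question's loop gets from q.get('title') and q.get('href'). The file is opened with io.open and an explicit encoding (UTF-8 is an assumption) so that Unicode text can be written without str().

```python
import io
import os
import tempfile

def matching_links(pairs, keywords):
    """Yield (title, url) pairs whose title contains any keyword.

    `pairs` stands in for (q.get('title'), q.get('href')) from the loop
    in the question; either value may be None when the attribute is missing.
    """
    for title, url in pairs:
        if title is None or url is None:
            continue
        if any(keyword in title for keyword in keywords):
            yield title, url

# Hypothetical sample data; real values would come from soup.find_all('a').
pairs = [
    (u"Today\u2019s news", u"http://example.com/news"),
    (None, u"http://example.com/untitled"),
    (u"Weather", u"http://example.com/weather"),
]
alist = [u"news"]

out_path = os.path.join(tempfile.mkdtemp(), "matches.txt")
# io.open encodes Unicode text on write, so no str() call is needed.
with io.open(out_path, "w", encoding="utf-8") as r:
    for title, url in matching_links(pairs, alist):
        r.write(title + u"\n" + url + u"\n")

# Read the file back to check what was written.
with io.open(out_path, encoding="utf-8") as f:
    written = f.read()
```

Note that this version writes each matching link once even if several keywords match, which differs from the original loop, and it skips links that have no title instead of comparing against the text 'None'.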

Answered 2025-04-18 by Python大师

If your list really has to be a list of byte strings, try encoding the title variable before comparing.

>>> alist=['á'] # byte string (UTF-8 bytes '\xc3\xa1' in this terminal)
>>> title = u'á' # unicode string
>>> alist[0] in title
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> title and alist[0] in title.encode('utf-8')
True
>>>
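The opposite direction is also possible: decode the keywords to Unicode once, then compare text with text. This is only a sketch; the helper name to_text is hypothetical, and it assumes the byte strings are UTF-8 encoded, as they are in the session above.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as Unicode text, decoding it if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Byte-string keyword as in the session above (UTF-8 bytes for 'á').
alist = [b"\xc3\xa1"]
title = u"\xe1"

# Decode every keyword once, up front, then compare Unicode with Unicode.
keywords = [to_text(k) for k in alist]
match = keywords[0] in title
```

Decoding the keywords up front keeps the rest of the program in Unicode, so the encoding only has to be chosen at the points where data is read or written.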

Answered 2025-04-18 by Python大师
