Error when using the unicode function in the gspread wrapper. Possibly a bug

Asked 2025-04-18 15:22

I get an error when I pass the following string to the unicode function:

>>> unicode('All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 68: ordinal not in range(128)

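When I look at position 68, I find an apostrophe, the typographic ’ (U+2019), which is stored as the three UTF-8 bytes \xe2\x80\x99: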

>>> str='All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> str[62:75]
' haven\xe2\x80\x99t us'

Is there a way to solve this? I ran into this bug at line 426 of models.py in gspread. Here is that line:

425 cell_elem = feed.find(_ns1('cell'))
426 cell_elem.set('inputValue', unicode(val))
427 uri = self._get_link('edit', feed).get('href')

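So when I try to update a cell with a value, which here is of type str, gspread tries to convert it to unicode, but it cannot do so because of the apostrophe. This may be a bug. How can I work around it? Thanks for your help.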


1 Answer

You don't actually need to replace any characters. Just decode the encoded byte string to unicode properly:

>>> s = 'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> s.decode('utf-8')
u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it for a month so I\u2019ll check on this.'  # unicode object

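You need to tell Python which encoding your str object uses in order to convert it to unicode, rather than calling unicode(some_str) directly, which falls back to the ASCII codec. In this case your string is UTF-8 encoded. This approach scales better, because you don't need a special case for every unicode character that might appear in your database.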
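As a rough sketch of that advice, the snippet below reproduces the failure with the ASCII codec and then decodes the same bytes as UTF-8. The names `RAW`, `to_unicode` and `ascii_decode_fails` are made up for illustration and are not part of gspread; the snippet is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes of the sentence from the question. Building them from a
# unicode literal keeps the snippet independent of the terminal encoding.
RAW = (u'All but Buitoni are using Pinterest buffers and Pratt & Lamber '
       u'haven\u2019t used it for a month so I\u2019ll check on this.'
       ).encode('utf-8')


def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass anything else through."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


def ascii_decode_fails(raw):
    """Return True if the ASCII codec rejects the bytes, as unicode(raw) did."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


# Decoding with the right codec succeeds and yields a unicode object.
text = to_unicode(RAW)
```

On Python 2, calling unicode() on a value that is already a unicode object does not trigger any decoding, so passing `to_unicode(val)` to gspread instead of the raw byte string should avoid the error at line 426 without patching the library. This assumes the incoming data really is UTF-8.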

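In my opinion, the best practice for handling unicode in Python is this:

Decode strings from external sources (such as a database) to unicode as early as possible.
Use unicode objects internally.
Encode them back to byte strings only when you need to send them to an external destination (a file, a database, the network, and so on).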
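The three steps above could look roughly like this. It is only a sketch: the function names are invented for illustration, `io.BytesIO` stands in for a real external source and destination, and UTF-8 is assumed on both sides.

```python
# -*- coding: utf-8 -*-
import io


def read_text(stream, encoding='utf-8'):
    """Step 1: decode bytes as soon as they arrive from the outside."""
    return stream.read().decode(encoding)


def shout(text):
    """Step 2: work on unicode objects only (a toy transformation)."""
    return text.upper()


def write_text(stream, text, encoding='utf-8'):
    """Step 3: encode back to bytes only at the output boundary."""
    stream.write(text.encode(encoding))


# In-memory byte streams simulate a file, database or network connection.
source = io.BytesIO(u'haven\u2019t used it'.encode('utf-8'))
sink = io.BytesIO()
write_text(sink, shout(read_text(source)))
```

Because the encoding is named only in `read_text` and `write_text`, the code in between never has to care how the curly apostrophe is represented in bytes.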

I also recommend taking a look at this slide deck, which gives a good overview of how to handle unicode in Python.

Answered 2025-04-18 by Python大师
