Python: UnicodeEncodeError: 'latin-1' codec can't encode character

Traceback (most recent call last):
  File "TopLevelCategories.py", line 267, in <module>
    cursor.execute(categoryQuery, {'title': startCategory});
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 158, in execute
    query = query % db.literal(args)
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 265, in literal
    return self.escape(o, self.encoders)
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 203, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 3: ordinal not in range(256)

...
for startCategory in value[0]:
    categoryResults = []
    try:
        categoryRow = ""
        baseCategoryTree[startCategory] = []
        #print categoryQuery % {'title': startCategory};
        cursor.execute(categoryQuery, {'title': startCategory}) #unicode issue
        done = False
        cont...

>>> import sys
>>> u'\u2013'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 0: ordinal not in range(256)
>>> u'\u2013'.encode('cp1252')
'\x96'
>>> '\u2013'.encode('cp1252')
'\\u2013'
>>> u'\u2013'.encode('cp1252')
'\x96'

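3 Answers

User

#1 · edited 2024-05-23 13:56:42

u.encode('utf-8') converts the string to bytes, which can then be written to stdout with sys.stdout.buffer.write(bytes). Note that sys.stdout.buffer exists only in Python 3. See also displayhook at https://docs.python.org/3/library/sys.html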
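A minimal Python 3 sketch of that approach follows. The helper name `write_utf8` and its `stream` parameter are hypothetical, introduced only for illustration; by default the helper targets `sys.stdout.buffer`, and the usage writes to an in-memory buffer so the bytes can be inspected.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the raw bytes to a binary stream.

    write_utf8 is a hypothetical helper name. When no stream is given it
    uses sys.stdout.buffer, the binary layer under sys.stdout in Python 3.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = text.encode("utf-8")
    stream.write(data)
    return data


# Usage: capture the bytes in memory instead of the real stdout.
sink = io.BytesIO()
written = write_utf8("a\u2013b\n", sink)
```

Writing bytes directly bypasses whatever encoding the terminal's text layer was configured with, so this only displays correctly if the terminal itself expects UTF-8.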

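User

#2 · edited 2024-05-23 13:56:42

The Unicode character u'\u2013' is the en dash. It is included in the Windows-1252 (cp1252) character set, where it is encoded as 0x96, but not in the Latin-1 (iso-8859-1) character set. Windows-1252 defines additional characters in the 0x80-0x9f range, which ISO-8859-1 leaves to control codes, and the en dash is one of them.

The solution is to choose a target character set other than Latin-1, such as Windows-1252 or UTF-8, or to replace the en dash with a plain "-".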
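The difference between the three character sets can be checked directly. This is a small Python 3 sketch, not the asker's code; `try_encode` is a hypothetical helper that returns None when the target charset has no slot for a character.

```python
EN_DASH = "\u2013"


def try_encode(text, charset):
    """Return the encoded bytes, or None if the charset lacks a character."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


latin1 = try_encode(EN_DASH, "latin-1")  # Latin-1 has no en dash
cp1252 = try_encode(EN_DASH, "cp1252")   # one byte, 0x96
utf8 = try_encode(EN_DASH, "utf-8")      # three bytes
# Falling back to a plain hyphen makes the text Latin-1 safe.
hyphen = try_encode(EN_DASH.replace(EN_DASH, "-"), "latin-1")
```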

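User

#3 · edited 2024-05-23 13:56:42

If you need Latin-1 encoding, you have several options for getting rid of the en dash or other code points above 255 (characters not included in Latin-1):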

>>> u = u'hello\u2013world'
>>> u.encode('latin-1', 'replace')    # replace it with a question mark
'hello?world'
>>> u.encode('latin-1', 'ignore')     # ignore it
'helloworld'

Or do your own custom replacement:

>>> u.replace(u'\u2013', '-').encode('latin-1')
'hello-world'
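When more than one character needs a custom substitute, the same idea can be extended with a translation table. The sketch below is Python 3 and rests on assumptions: the `LATIN1_FALLBACKS` mapping and the `to_latin1` helper are hypothetical names, and the chosen substitutes are only one reasonable option.

```python
# Hypothetical mapping; extend it with whatever characters your data contains.
LATIN1_FALLBACKS = {
    0x2013: "-",   # en dash
    0x2014: "--",  # em dash
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}


def to_latin1(text):
    """Substitute known characters, then encode to Latin-1.

    Anything still outside Latin-1 after translation becomes '?'.
    """
    return text.translate(LATIN1_FALLBACKS).encode("latin-1", "replace")


# The euro sign is not in the mapping, so it falls through to '?'.
result = to_latin1("hello\u2013world \u201cquoted\u201d \u20ac")
```

Keeping 'replace' as the final error handler means an unexpected character degrades to a question mark instead of raising UnicodeEncodeError in the middle of a database call.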

If you are not required to output Latin-1, UTF-8 is a common and preferred choice. It is recommended by the W3C and can encode every Unicode code point:

>>> u.encode('utf-8')
'hello\xe2\x80\x93world'
