How do I resolve difficulties decoding and printing Greek characters in Python?

# This is the code for the prompt I noted at the beginning.
# The variable gr_en_dict is the dictionary noted right above.
for key in gr_en_dict:
    user_reply = raw_input('%s: ' % (gr_en_dict[key])).decode(sys.stdout.encoding)

C:\>chcp
Active code page: 437

C:\>\python25\python
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print '? α?ε??δα'
? α?ε??δα
>>>

C:\>chcp 869
Active code page: 869

C:\>\python25\python
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp869'
>>> print ' η αγελάδα'
 η αγελάδα
>>> print 'η αγελάδα'
η αγελάδα
>>>

2 Answers

User

#1 · Edited 2024-06-16 11:59:19

For part of the problem, use:

words_text = codecs.open(filename, 'r', 'utf-8-sig')

The 'utf-8-sig' codec consumes the \ufeff byte order mark if the file starts with one.
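A minimal sketch of the difference between the two codecs, written so that it also runs on Python 3. The sample line and the temporary file are assumptions made for illustration; they are not the asker's word file.

```python
import codecs
import os
import tempfile

# Hypothetical sample content: one Greek-English pair ('η αγελάδα - cow').
sample = u'\u03b7 \u03b1\u03b3\u03b5\u03bb\u03ac\u03b4\u03b1 - cow\n'

# Write a UTF-8 file that starts with a byte order mark.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(codecs.BOM_UTF8 + sample.encode('utf-8'))

# Plain 'utf-8' keeps the BOM as the first character of the first line.
with codecs.open(path, 'r', 'utf-8') as f:
    plain_first_line = f.readline()

# 'utf-8-sig' drops the BOM while decoding.
with codecs.open(path, 'r', 'utf-8-sig') as f:
    sig_first_line = f.readline()

os.remove(path)

print(plain_first_line.startswith(u'\ufeff'))
print(sig_first_line == sample)
```

With 'utf-8-sig' the first line compares equal to the text that was written, so a dictionary key built from it should not carry a stray \ufeff.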

Technically, this:

user_reply = raw_input('%s: ' % (gr_en_dict[key])).decode(sys.stdout.encoding)

should be:

user_reply = raw_input('%s: ' % (gr_en_dict[key])).decode(sys.stdin.encoding)

In practice, though, both should report the same encoding.
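To see why the decode step should use the encoding the bytes actually arrived in, here is a small sketch. The helper decode_reply is hypothetical, and the bytes are simulated rather than read from a real console; on Python 3, input() already returns text, so this only mirrors what raw_input(...).decode(...) does on Python 2.

```python
import sys

def decode_reply(raw_bytes, encoding=None):
    """Decode bytes typed at a console (hypothetical helper).

    Falls back to sys.stdin.encoding, then to UTF-8, when no encoding is given.
    """
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw_bytes.decode(encoding)

word = u'\u03b7 \u03b1\u03b3\u03b5\u03bb\u03ac\u03b4\u03b1'  # 'η αγελάδα'

# Simulate the bytes a console running code page 869 would hand to Python.
typed = word.encode('cp869')

right = decode_reply(typed, 'cp869')  # matching code page
wrong = decode_reply(typed, 'cp437')  # mismatched code page

print(right == word)
print(wrong == word)
```

Decoding with the mismatched code page does not raise an error here; it silently yields different characters, which is why a comparison against the dictionary entry can fail without any traceback.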

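I think the problem is that the default console encoding does not support all Greek characters. When I switched to a Greek code page, things started to work. Note that I can paste the correct characters into the print statement below, but cp437 does not actually support all of them, so the unsupported characters are replaced with question marks when printed: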

C:\>chcp
Active code page: 437

C:\>python
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print 'η αγελάδα - cow'
? α?ε??δα - cow
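The substitution seen above can be imitated without a Windows console. This is only a sketch: the console performs its own conversion, and the codec's 'replace' error handler is used here as a stand-in for it.

```python
word = u'\u03b7 \u03b1\u03b3\u03b5\u03bb\u03ac\u03b4\u03b1'  # 'η αγελάδα'

# 'replace' substitutes '?' for every character cp437 cannot represent.
as_cp437 = word.encode('cp437', 'replace')

# Decode again to see which characters survived the round trip.
shown = as_cp437.decode('cp437')

print(shown.count(u'?'))
```

Only the letters that cp437 happens to include (α, ε, δ) survive; the rest come back as question marks.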

If I switch to a Greek code page (869 or 1253), it works:

C:\>chcp 869
Active code page: 869

C:\>python
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp869'
>>> print 'η αγελάδα - cow'
η αγελάδα - cow
>>>
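One way to check ahead of time whether a given code page can represent a string is to try encoding it. The helper can_encode below is hypothetical, shown as a sketch rather than as part of the original answer.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

word = u'\u03b7 \u03b1\u03b3\u03b5\u03bb\u03ac\u03b4\u03b1'  # 'η αγελάδα'

# Test the three code pages mentioned above.
support = {cp: can_encode(word, cp) for cp in ('cp437', 'cp869', 'cp1253')}

print(support)
```

A check like this could be run against sys.stdout.encoding before printing, to warn the user that the console code page needs to be changed with chcp.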

User

#2 · Edited 2024-06-16 11:59:19

The standard Windows shell has problems with extended characters. I suggest using something like Windows PowerShell instead.

As for the '\ufeff' character (the byte order mark), you can perform the following check after reading the file:

words_text = codecs.open(filename, 'r', 'utf-8')
words_text_lines = words_text.readlines()

if words_text_lines and words_text_lines[0][0]==unicode(codecs.BOM_UTF8, 'utf8'):
    words_text_lines[0] = words_text_lines[0][1:]

That way, if the mark is there, it gets thrown away.
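The same check can be wrapped in a small function that also runs on Python 3, where the unicode() built-in no longer exists. The name strip_bom is hypothetical, and the sample lines are made up for illustration.

```python
import codecs

# The BOM as a text character (u'\ufeff'), obtained without unicode().
BOM = codecs.BOM_UTF8.decode('utf-8')

def strip_bom(lines):
    """Return a copy of lines with a leading BOM removed from the first line (hypothetical helper)."""
    if lines and lines[0].startswith(BOM):
        return [lines[0][len(BOM):]] + list(lines[1:])
    return list(lines)

# Hypothetical decoded lines, as readlines() might return them.
with_bom = [u'\ufeffalpha - a\n', u'beta - b\n']
cleaned = strip_bom(with_bom)

print(cleaned == [u'alpha - a\n', u'beta - b\n'])
```

Using startswith rather than indexing the first character keeps the check safe for an empty list, and the input list is left unmodified.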
