Python curses prints two characters when adding a UTF-8 encoded string

Question

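I've run into a very strange problem while trying to print UTF-8 encoded strings to a curses window. Here is the code; below it I describe the exact problem and what I've tried.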

# coding=UTF-8
import curses
import locale
import time
locale.setlocale(locale.LC_ALL, '')
code = locale.getpreferredencoding()



class AddCharCommand(object):
    def __init__(self, window, line_start, y, x, character):
        """
        Command class for adding the specified character, to the specified
        window, at the specified coordinates.
        """
        self.window = window
        self.line_start = line_start
        self.x = x
        self.y = y
        self.character = character


    def write(self):
        if self.character > 127:
            # curses somehow returns a keycode that is 64 lower than what it
            # should be, this takes care of the problem.
            self.character += 64
            self.string = unichr(self.character).encode(code)
            self.window.addstr(self.y, self.x, self.string)
        else:
            self.window.addch(self.y, self.x, self.character)


    def delete(self):
        """
        Erase characters usually print two characters to the curses window.
        As such both the character at these coordinates and the one next to it
        (that is, the one at self.x + 1) must be replaced with a blank space.
        Move the cursor to the original coordinates when done.
        """
        for i in xrange(2):
            self.window.addch(self.y, self.x + i, ord(' '))
        self.window.move(self.y, self.x)

def main(screen):
    maxy, maxx = screen.getmaxyx()
    q = 0
    commands = list()
    x = 0
    erase = ord(curses.erasechar())
    while q != 27:
        q = screen.getch()
        if q == erase:
            command = commands.pop(-1).delete()
            x -= 1
            continue
        command = AddCharCommand(screen, 0, maxy/2, x, q)
        commands.append(command)
        command.write()
        x += 1

curses.wrapper(main)

Here is a Gist of the code.

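The problem is that when I press the è key (ASCII code 232), that character alone is not what gets printed. Instead, the string ăè appears on screen. I tried self.window.addstr(self.x, self.y, self.string[1]), but that just prints garbage.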

I then opened a Python prompt to check what unichr(232).encode('utf-8') returns, and it is indeed a string of length 2.
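The length-2 result can be reproduced outside curses. This is a minimal sketch, assuming Python 3, where chr() takes the place of Python 2's unichr():

```python
# Python 3 sketch: chr() plays the role of Python 2's unichr().
encoded = chr(232).encode("utf-8")

print(len(encoded))    # number of bytes in the UTF-8 encoding of U+00E8
print(list(encoded))   # the individual byte values
```

The point is only that one character above 127 becomes more than one byte once it is encoded as UTF-8.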

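What surprised me most is that if I put screen.addstr(4, 4, unichr(232).encode(code)) in main, it displays the è character correctly, and only that character. The same holds if I hard-code the write method of the AddCharCommand class to print è: it works fine.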

Of course, the problem is not limited to è; it happens with nearly every extended ASCII character.

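I know extended ASCII handling in curses is a bit flaky, but I can't make sense of this behavior at all. It makes no sense to me that the code works when I hard-code the ASCII code but adds an extra character when I don't.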

I've read a lot of material on curses and haven't found a solution. I'd be very grateful for any help with this; it's driving me crazy.

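Perhaps less important, but I'd also like someone to explain why screen.getch() returns the wrong ASCII codes for characters above 127, and why the difference between the code curses returns and the real ASCII code is 64.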
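One possible reading of the offset, offered as an assumption rather than anything confirmed in the post: on a UTF-8 terminal, getch() may return one byte per call, so a single è keypress would arrive as two separate values, 195 and 168. Adding 64 to each gives 259 (ă) and 232 (è), which would match the ăè seen on screen. The sketch below is written for Python 3 and uses a hypothetical helper, simulate_write, to reproduce that arithmetic without curses:

```python
def simulate_write(keycodes, offset=64):
    """Mimic the +64 adjustment from AddCharCommand.write for codes above 127.

    Hypothetical helper for illustration only; it does not touch curses.
    """
    out = []
    for code in keycodes:
        if code > 127:
            code += offset
        out.append(chr(code))
    return "".join(out)

# Assumption: the terminal delivers è (U+00E8) as its two UTF-8 bytes.
raw_bytes = list("\u00e8".encode("utf-8"))

shown = simulate_write(raw_bytes)            # what the +64 logic would display
decoded = bytes(raw_bytes).decode("utf-8")   # decoding both bytes together

print(raw_bytes, shown, decoded)
```

If this reading is right, the 64 is a coincidence of the second byte (168 + 64 = 232), and collecting all the bytes of one keypress before decoding them together would yield the single character without any offset.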

Thank you very much for your help.

