Python: Unicode encoding differs between Mac and Ubuntu

Question

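While developing a WAS server with Tornado 3.2.2, I ran into some Unicode problems after moving my system from Mac to Ubuntu.

On the Mac, everything works fine.

On Ubuntu, however, the same database (a remote MySQL server) and the same source code produce different results.

The only differences between the two setups are the operating system (Mac vs. Ubuntu 14.04) and the Python version (Mac: 2.7.8, Ubuntu: 2.7.6).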

On the Mac, the result is correct:

"remark": "30\uc77c \uc774\uc6a9\uad8c"

On Ubuntu, however, the result is:

"remark": "30? ???"

I have spent two days searching online and tried many approaches.

I still cannot find the cause.
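The two results are consistent with the same text taking two different paths. The Mac value is simply the JSON-escaped form of the Korean string, while "30? ???" is exactly what you get when that string is forced through a character set that cannot represent Hangul, with each unencodable character replaced by "?". A minimal sketch (runs on Python 2.7 and 3), assuming the replacement happens somewhere before the value reaches the application code, for example in a database connection that uses a non-Unicode character set such as latin1. The question does not show the connection settings, so this is a hypothesis, not a diagnosis:

```python
from __future__ import print_function
import json

# The remark text, written with the same \u escapes as the Mac response.
remark = u"30\uc77c \uc774\uc6a9\uad8c"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which is exactly the form seen on the Mac.
mac_form = json.dumps(remark)
print(mac_form)  # "30\uc77c \uc774\uc6a9\uad8c"

# Forcing the same text into a charset without Hangul, replacing
# unencodable characters, reproduces the Ubuntu output.
# latin-1 and ascii are assumed examples of such charsets.
for charset in ("latin-1", "ascii"):
    lossy = remark.encode(charset, "replace")
    print(charset, lossy)  # b'30? ???' on Python 3
```

If this is what happened, the "?" characters are real data by the time Python sees them, and no later decoding step can recover the Korean text.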

I tried all kinds of encoding and decoding combinations, such as:

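import string
import chardet

# test_dict["remark"] holds the value read from the remote MySQL database
print(type(test_dict["remark"]))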
print(test_dict["remark"].encode("utf-8").decode("euc-kr"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(test_dict["remark"].encode("euc-kr").decode("utf-8"))
print(test_dict["remark"].decode("euc-kr").encode("utf-8"))
print(unicode(test_dict["remark"], 'utf-8'))
encoding = chardet.detect(test_dict["remark"])
print(encoding)
print(test_dict["remark"].decode("unicode-escape"))
print(unicode(test_dict["remark"], "utf-8"))
print(unicode(test_dict["remark"], "utf-8").decode("utf-8").encode("utf-8"))
print(unicode(test_dict["remark"], "utf-8").encode("utf-8").decode("utf-8"))
for c in test_dict["remark"]:
    if c not in string.ascii_letters:
        print(" not ascii")
    else:
        print("ascii")
print(test_dict["remark"].decode(encoding["encoding"]).encode("utf-8"))
print(test_dict["remark"].encode("utf-8"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(unicode(test_dict["remark"].decode("utf-8").encode("utf-8")))

I also tried the functions in tornado.escape.

The results are still wrong.

On Ubuntu, the output is:

<type 'str'>
30? ???
30? ???
30? ???
30? ???
30? ???
{'confidence': 1.0, 'encoding': 'ascii'}
30? ???
30? ???
30? ???
30? ???
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
30? ???
30? ???
30? ???
30? ???
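Every attempt prints the same thing because, by the time the value reaches Python, it already contains literal question marks (byte 0x3F) rather than Korean characters. Plain ASCII bytes decode to identical text under UTF-8, EUC-KR and unicode-escape, so no codec combination can bring the original text back. Note also that the loop in the question tests membership in string.ascii_letters, which contains only A-Z and a-z, so digits, the space and "?" are reported as "not ascii" even though they are ASCII. A short check, assuming the value is exactly the bytes "30? ???" as printed:

```python
from __future__ import print_function

# The value as printed on Ubuntu: a byte string (type str in Python 2).
damaged = b"30? ???"

# bytearray yields integer byte values on both Python 2 and 3.
codes = list(bytearray(damaged))
print(codes)  # [51, 48, 63, 32, 63, 63, 63]; 63 is "?"

# Every byte is 7-bit ASCII, even though the string.ascii_letters
# loop reported "not ascii" for digits, space and "?".
print(all(b < 0x80 for b in codes))  # True

# ASCII bytes decode to the same text under each of these codecs,
# which is why none of the encode/decode combinations changed the output.
decoded = dict((codec, damaged.decode(codec))
               for codec in ("utf-8", "euc-kr", "unicode-escape"))
print(len(set(decoded.values())))  # 1 distinct value
```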

Changing the locale to euc-kr is not allowed.

My locale settings are as follows:

Mac

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

Ubuntu

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
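The two locale configurations are effectively identical (en_US.UTF-8 everywhere), which suggests the locale is not what differs between the machines. The locale generally affects how Python encodes text for the terminal and what locale.getpreferredencoding() returns; it does not decide which bytes a MySQL client library receives from the server. A small diagnostic sketch that prints what the interpreter itself reports, so the two machines can be compared side by side; the function name encoding_report is hypothetical:

```python
from __future__ import print_function
import locale
import sys


def encoding_report():
    """Collect the encodings the running interpreter reports."""
    return {
        "preferred": locale.getpreferredencoding(),
        "default": sys.getdefaultencoding(),
        # Can be None when stdout is not a terminal (e.g. Python 2 with a pipe).
        "stdout": getattr(sys.stdout, "encoding", None),
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print(key, value)
```

On Python 2.7, sys.getdefaultencoding() normally reports ascii on both systems, so if the remaining values also match, the difference most likely lies before the data reaches Python.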

While testing, I noticed something strange.

The following line gives different results on the two systems:

encoding = chardet.detect(test_dict["remark"])

On the Mac:

{'confidence': 0.938125, 'encoding': 'utf-8'}

On Ubuntu:

{'confidence': 1.0, 'encoding': 'ascii'}
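The chardet results match the rest of the output. chardet returns ascii with confidence 1.0 for input made only of 7-bit bytes, so the Ubuntu value contains no multi-byte characters at all, whereas the UTF-8 encoding of the Korean text (consistent with the Mac's utf-8 guess) contains many bytes above 0x7F. The helper below is not chardet's algorithm, only an illustration of the byte-level difference; is_pure_ascii is a hypothetical name:

```python
from __future__ import print_function


def is_pure_ascii(data):
    """Return True if every byte in `data` is below 0x80 (hypothetical helper)."""
    return all(b < 0x80 for b in bytearray(data))


# Korean text as UTF-8 bytes, as the Mac run appears to receive it.
mac_bytes = u"30\uc77c \uc774\uc6a9\uad8c".encode("utf-8")
# The bytes the Ubuntu run receives, per the question's output.
ubuntu_bytes = b"30? ???"

print(len(mac_bytes), is_pure_ascii(mac_bytes))        # 15 False
print(len(ubuntu_bytes), is_pure_ascii(ubuntu_bytes))  # 7 True
```

Each Hangul syllable takes 3 bytes in UTF-8, so the four syllables account for 12 of the 15 bytes; on Ubuntu each of them has collapsed to a single "?".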

Does anyone know why this happens?

Any ideas or suggestions would be greatly appreciated.

Thanks in advance!
