The unicode class in Python

Asked 2025-04-15 18:49

help(unicode) prints something like this:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
...

But the argument does not have to be a string. For example, unicode(1) returns u'1'. What actually happens in that call? As far as I can tell, int has no __unicode__ method that could be called.

3 Answers

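If __unicode__ is not defined, __str__ is called instead. Whichever method is called, a unicode return value is used as-is. A str return value is decoded using the default encoding, which is the value reported by sys.getdefaultencoding() and is almost always 'ascii'. Any other return type raises a TypeError.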
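The lookup order described above can be modeled in plain Python. This is only a rough sketch, not CPython's actual code: to_unicode_sketch, HasUnicode, ReturnsBytes and ReturnsInt are hypothetical names, the default encoding is hard-coded to 'ascii' as an assumption, and the text/bytes aliases exist only so the snippet runs on both Python 2 and Python 3.

```python
import sys

# Text and byte-string types differ between Python 2 and 3; pick them once
# so the sketch runs on either interpreter.
if sys.version_info[0] == 2:
    text_type, bytes_type = unicode, str  # noqa: F821 (Python 2 only)
else:
    text_type, bytes_type = str, bytes


def to_unicode_sketch(obj, default_encoding="ascii"):
    """Hypothetical model of the fallback; assumes an 'ascii' default."""
    cls = type(obj)
    # Special methods are looked up on the type, not on the instance.
    method = getattr(cls, "__unicode__", None)
    if method is None:
        method = cls.__str__
    result = method(obj)
    if isinstance(result, text_type):
        return result  # already text: used as-is
    if isinstance(result, bytes_type):
        return result.decode(default_encoding)  # byte string: decoded
    raise TypeError("expected a string, got %s" % type(result).__name__)


class HasUnicode(object):
    def __unicode__(self):
        return u"from __unicode__"

    def __str__(self):
        return "from __str__"


class ReturnsBytes(object):
    def __str__(self):
        return b"abc"


class ReturnsInt(object):
    def __str__(self):
        return 42
```

With these definitions, to_unicode_sketch(1) gives u'1' through int's __str__, HasUnicode() goes through __unicode__, ReturnsBytes() is decoded as ASCII, and ReturnsInt() raises TypeError.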

(After reloading the sys module, you can change the default encoding with sys.setdefaultencoding(), but doing so is almost always a bad idea.)
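A common alternative to changing the default is to decode byte strings explicitly with the encoding you know they use. The snippet below is a small illustration under that assumption; raw, ascii_ok and text are hypothetical names, and the bytes are the UTF-8 encoding of "cafe" with an acute accent on the final letter.

```python
# UTF-8 bytes: "caf" followed by U+00E9 encoded as two bytes.
raw = b"caf\xc3\xa9"

# Decoding as ASCII fails, which is what an implicit conversion
# with an 'ascii' default would run into.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly with the real encoding works on Python 2 and 3.
text = raw.decode("utf-8")
```

Here ascii_ok ends up False, while text holds four characters ending in U+00E9.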

Answered 2025-04-15 by Python大师

This is equivalent to unicode(str(1)).

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

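If you want to dig into the low-level details, open the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, and its constructor is .tp_new=unicode_new.

Because the optional encoding and errors arguments were not supplied, and a plain unicode object (not a subclass of unicode) is being constructed, Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.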

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. Otherwise it falls back to Py_TYPE(v)->tp_str (that is, __str__) or Py_TYPE(v)->tp_repr (that is, __repr__). It then passes the result to PyUnicode_FromEncodedObject.

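Objects/unicodeobject.c#PyUnicode_FromEncodedObject sees that it has been given a string and hands it to PyUnicode_Decode, which returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns that unicode object.

In short, unicode() automatically converts your object to a string first when it needs to. This is Python working as intended.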
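The branch in unicode_new described above, on whether encoding or errors was supplied, can be observed from Python. The following is a small sketch rather than part of the original answer: text_ctor is a hypothetical alias that picks unicode on Python 2 and str on Python 3 (whose constructor behaves the same way for these three calls), so the snippet runs on either.

```python
import sys

# The built-in text constructor: unicode on Python 2, str on Python 3.
text_ctor = unicode if sys.version_info[0] == 2 else str  # noqa: F821

# No encoding given: the object is converted through its string hooks.
no_encoding = text_ctor(1)

# Encoding given: the argument must be a byte string to decode,
# so an int is rejected with TypeError.
try:
    text_ctor(1, "ascii")
    int_with_encoding_ok = True
except TypeError:
    int_with_encoding_ok = False

# Encoding given with a byte string: it is decoded.
decoded = text_ctor(b"abc", "ascii")
```

So passing an encoding skips the __unicode__/__str__ conversion entirely and goes straight to decoding, which is why unicode(1) works but unicode(1, 'ascii') does not.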

Answered 2025-04-15 by Python大师

If a __unicode__ method exists, it is called; otherwise __str__ is used.

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"

Answered 2025-04-15 by Python大师
