(I am using Python 2.7)
I have a test:
# -*- coding: utf-8 -*-
import binascii
test_cases = [
'aaaaa', # Normal bytestring
'ááááá', # Normal bytestring, but with extended ascii. Since the file is utf-8 encoded, this is utf-8 encoded
'ℕℤℚℝℂ', # Encoded unicode. The editor has encoded this, and it is defined as string, so it is left encoded by python
u'aaaaa', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
u'ááááá', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
u'ℕℤℚℝℂ', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
]
FORMAT = '%-20s -> %2d %-20s %-30s %-30s'
for data in test_cases:
    try:
        hexlified = binascii.hexlify(data)
    except:
        hexlified = None
    print FORMAT % (data, len(data), type(data), hexlified, repr(data))
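In the output, the columns are not aligned properly for the strings that contain non-ASCII characters. This is because the length of those strings in bytes is greater than the number of unicode characters. How can I tell print to pad the fields based on the number of characters rather than the number of bytes?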
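When Python 2.7 sees 'ℕℤℚℝℂ', it reads it as "here are 15 arbitrary byte values". It does not know which characters they represent, nor which encoding they are in. You need to decode this byte string into a unicode string, specifying the encoding, before you can expect Python to count characters. Note that in Python 3, all string literals are unicode strings by default.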
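The decoding step described above could look roughly like the sketch below. The helper names to_text and pad are made up for illustration, and the sketch assumes the byte strings are UTF-8 encoded (as they would be in a source file saved as UTF-8). Padding is done with a unicode format string, so the width counts code points rather than bytes.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass unicode strings through."""
    if isinstance(value, bytes):  # in Python 2.7, bytes is an alias for str
        return value.decode(encoding)
    return value


def pad(value, width, encoding='utf-8'):
    """Hypothetical helper: left-align value in a field of `width` characters."""
    # The format string is unicode, so padding counts code points, not bytes.
    return u'%-*s' % (width, to_text(value, encoding))


# The UTF-8 bytes of the five characters in the question's third test case.
raw = b'\xe2\x84\x95\xe2\x84\xa4\xe2\x84\x9a\xe2\x84\x9d\xe2\x84\x82'
text = to_text(raw)    # unicode string with 5 code points
padded = pad(raw, 20)  # the 5 characters followed by 15 spaces
```

Here len(raw) is 15 while len(text) is 5, which is the mismatch that throws the columns off. In the test from the question, this would mean decoding each item before formatting it and using a unicode FORMAT string.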