What is the best way to convert a string to bytes in Python 3?

1478 votes

5 answers

2710667 views

Asked 2025-04-17 03:16

"TypeError: 'str' does not support the buffer interface" suggests two possible ways to convert a string to bytes:

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

Which of the two is more Pythonic?

See "Convert bytes to a string" for the reverse operation.


5 Answers

282

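The best way is actually neither of the two mentioned above, but a third one. Ever since Python 3.0, the first parameter of encode has defaulted to 'utf-8', so the best way to write it is: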

b = mystring.encode()
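As a quick check that the three spellings agree, here is a minimal sketch. The sample text and the variable names are made up for illustration and do not come from the question.

```python
# Hypothetical sample text with non-ASCII characters (e-acute, o-umlaut).
mystring = "h\u00e9llo w\u00f6rld"

via_constructor = bytes(mystring, "utf-8")  # first form from the question
via_explicit = mystring.encode("utf-8")     # second form from the question
via_default = mystring.encode()             # relies on the 'utf-8' default

# All three produce the same bytes object contents.
print(via_constructor == via_explicit == via_default)
print(type(via_default))
```

The choice between them is therefore about readability and call overhead, not about the result.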

This is also faster, because in the C code the default argument is not the string "utf-8" but NULL, and checking for NULL is much quicker!

Here are some timings:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

Despite the warning, the timings were very stable over repeated runs, deviating by only about 2%.
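If you want to repeat the measurement outside IPython, the standard timeit module can do roughly the same thing. This is only a sketch: the loop and repeat counts are arbitrary choices, and the absolute numbers (and possibly the size of the gap) will depend on your machine and Python version.

```python
import timeit

NUMBER = 100_000  # calls per timing run (arbitrary choice)
REPEAT = 5        # independent runs; the fastest one is kept

explicit = min(timeit.repeat("'abc'.encode('utf-8')", repeat=REPEAT, number=NUMBER))
default = min(timeit.repeat("'abc'.encode()", repeat=REPEAT, number=NUMBER))

# Convert total seconds per run into nanoseconds per call.
print(f"encode('utf-8'): {explicit / NUMBER * 1e9:.0f} ns per call")
print(f"encode():        {default / NUMBER * 1e9:.0f} ns per call")
```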

Calling encode() without an argument is not compatible with Python 2, because the default character encoding in Python 2 is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
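For comparison, here is a small sketch of the same string under Python 3, where the no-argument call and the explicit 'utf-8' call agree. The characters are written as escapes so the snippet does not depend on the source file's encoding.

```python
# "\u00e4\u00f6\u00e4" is the same text as in the traceback above (a-umlaut, o-umlaut, a-umlaut).
text = "\u00e4\u00f6\u00e4"

encoded_default = text.encode()          # Python 3: defaults to UTF-8
encoded_explicit = text.encode("utf-8")  # explicit encoding, same result here

print(encoded_explicit)
print(encoded_default == encoded_explicit)
```

If the code still has to run on Python 2 as well, passing the encoding explicitly avoids relying on the interpreter's default.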

Answered 2025-04-17 by Python大师


704

It is actually simpler than people tend to think:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
print(type(my_str_as_bytes)) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
print(type(my_decoded_str)) # ensure it is string representation

You can verify the result by printing the types. The output is:

<class 'bytes'>
<class 'str'>

Answered 2025-04-17 by Python大师


863

If you look at the documentation for bytes, it points you to bytearray:

bytearray([source[, encoding[, errors]]])

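Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 to 255. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most of the methods that the bytes type has; see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If source is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If source is an integer, the array will have that size and will be initialized with null bytes.

If source is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If source is an iterable, it must be an iterable of integers in the range 0 to 255, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.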
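The cases listed above can be tried out directly. The following is a minimal sketch with made-up variable names; bytes accepts the same kinds of source argument but returns an immutable object.

```python
from_text = bytearray("hi", "utf-8")  # string: an encoding is required
from_size = bytearray(3)              # integer: that many null bytes
from_buffer = bytearray(b"hi")        # object supporting the buffer interface
from_ints = bytearray([104, 105])     # iterable of integers in 0..255
empty = bytearray()                   # no argument: size 0

# A string without an encoding is rejected rather than guessed at.
try:
    bytearray("hi")
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True

print(from_text, from_size, from_buffer, from_ints, empty)
print(bytes([104, 105]))  # the immutable counterpart
```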

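So bytes can do much more than just encode a string. It is Pythonic in that it lets you call the constructor with any type of source argument that makes sense.

For encoding a string, I think some_string.encode(encoding) is more Pythonic than using the constructor, because it is more self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), which has no explicit verb.

I checked the Python source. In CPython, if you pass a Unicode string to bytes, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so calling encode yourself just skips one level of indirection.

Also, see Serdalis' comment: unicode_string.encode(encoding) is more Pythonic as well because its inverse is byte_string.decode(encoding), and that symmetry is nice.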
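The symmetry is easy to see in a round trip. This is a small sketch; the sample text is an assumption chosen to include two non-ASCII characters.

```python
# Hypothetical sample text: "naive cafe" with i-diaeresis and e-acute.
unicode_string = "na\u00efve caf\u00e9"
encoding = "utf-8"

byte_string = unicode_string.encode(encoding)  # str -> bytes
round_tripped = byte_string.decode(encoding)   # bytes -> str

print(round_tripped == unicode_string)
print(len(unicode_string), len(byte_string))   # characters vs. encoded bytes
```

The same encoding name appears on both sides, which is what makes the method pair read naturally.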

Answered 2025-04-17 by Python大师

