What is the best way to convert a string to bytes in Python 3?

Posted 2024-04-19 12:51:45


There appear to be two different ways to convert a string to bytes, as seen in the answers to TypeError: 'str' does not support the buffer interface:


b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')


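The absolute best way is neither of the two, but a third. The first parameter of encode() has defaulted to 'utf-8' ever since Python 3.0. So the best way is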

b = mystring.encode()
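
As a quick check of the claim above, here is a minimal sketch comparing the three spellings. The sample string is an assumption chosen to include non-ASCII characters; it is not from the original question.

```python
# Sample text is an assumption for illustration; it contains non-ASCII characters.
mystring = "héllo wörld"

via_constructor = bytes(mystring, "utf-8")   # constructor form from the question
via_encode = mystring.encode("utf-8")        # explicit-encoding form from the question
via_default = mystring.encode()              # encoding defaults to 'utf-8' in Python 3

# All three spellings should yield the same bytes object.
all_equal = via_constructor == via_encode == via_default
print(all_equal)
print(via_default)
```

If the three results compare equal, the choice between them comes down to readability and, as the timings suggest, a small speed difference.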



In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop
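
The measurement above was taken in IPython. A rough way to repeat it in a plain script is sketched below with the standard timeit module; the loop count is an arbitrary assumption, and absolute numbers will differ by machine and Python version.

```python
import timeit

# Loop count is an arbitrary assumption, kept small so the sketch runs quickly.
loops = 100_000

# Time the explicit-encoding call and the default-argument call separately.
explicit = timeit.timeit("'abc'.encode('utf-8')", number=loops)
default = timeit.timeit("'abc'.encode()", number=loops)

print(f"explicit 'utf-8': {explicit / loops * 1e9:.0f} ns per call")
print(f"default argument: {default / loops * 1e9:.0f} ns per call")
```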


Using encode() without an argument is not compatible with Python 2, because the default character encoding in Python 2 is ASCII.

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
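
If code has to run on both Python 2 and Python 3, one common approach (an assumption here, not something the answer states) is to start from a unicode literal and always name the encoding. A minimal sketch:

```python
# -*- coding: utf-8 -*-
# The u"" prefix makes this a text string on Python 2 and is accepted on Python 3.3+.
text = u"äöä"

# Naming the encoding avoids relying on the interpreter's default.
data = text.encode("utf-8")
print(repr(data))
```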


bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
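
The source cases listed above can be sketched as follows. The variable names are hypothetical, and the last lines assume that bytes() accepts the same kinds of source as bytearray() while producing an immutable object.

```python
# String source: the encoding argument is required.
from_text = bytearray("abc", "utf-8")

# Omitting the encoding for a string raises TypeError.
try:
    bytearray("abc")
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True

# Integer source: an array of that size filled with null bytes.
from_size = bytearray(3)

# Object supporting the buffer interface (a bytes object here).
from_buffer = bytearray(b"abc")

# Iterable of integers in the range 0 <= x < 256.
from_iterable = bytearray([97, 98, 99])

# No argument: an array of size 0.
empty = bytearray()

# bytes() takes the same kinds of source but returns an immutable object.
immutable_text = bytes("abc", "utf-8")
immutable_iterable = bytes([97, 98, 99])

print(from_text, from_size, from_buffer, from_iterable, empty)
```

This is why the constructor is the more general tool, while encode() does only one job: turning text into bytes.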


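For encoding a string, I think some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where the constructor call has no explicit verb.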




my_str = "hello world"
my_str_as_bytes = str.encode(my_str)  # same as my_str.encode(); UTF-8 by default
print(type(my_str_as_bytes))  # <class 'bytes'>
my_decoded_str = my_str_as_bytes.decode()  # UTF-8 by default
print(type(my_decoded_str))  # <class 'str'>
