Python 3中将字符串转换为字节的最佳方法是什么?

2024-04-19 12:51:45 发布

您现在位置:Python中文网/ 问答频道 /正文

似乎有两种不同的方法可以将字符串转换为字节,如TypeError: 'str' does not support the buffer interface的答案所示

这些方法中哪一种比较好或者更像Python?还是只是个人喜好的问题?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')

Tags: the方法字符串答案support字节bytesbuffer
3条回答

绝对的最佳方式不是2,而是3。Python 3.0以来,^{}的第一个参数默认为'utf-8'。所以最好的办法是

b = mystring.encode()

这也会更快,因为默认参数不会导致C代码中的字符串"utf-8",而是会导致检查速度更快的字符串^{NULL

下面是一些时间安排:

In [1]: %timeit -r 10 'abc'.encode('utf-8')
The slowest run took 38.07 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 183 ns per loop

In [2]: %timeit -r 10 'abc'.encode()
The slowest run took 27.34 times longer than the fastest. 
This could mean that an intermediate result is being cached.
10000000 loops, best of 10: 137 ns per loop

尽管有警告,但在反复运行之后,时间非常稳定——偏差仅为~2%。


不带参数使用encode()与Python 2不兼容,如python2中的默认字符编码是ASCII

>>> 'äöä'.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

如果您查看bytes的文档,它会指向^{}

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

因此bytes可以做的不仅仅是对字符串进行编码。Pythonic允许您使用任何类型的源参数调用构造函数。

对于字符串的编码,我认为some_string.encode(encoding)比使用构造函数更像是python,因为它是最自文档化的--“使用此字符串并使用此编码对其进行编码”比bytes(some_string, encoding)更清楚--使用构造函数时没有显式动词。

编辑:我检查了Python源代码。如果您使用CPython将unicode字符串传递给bytes,它将调用PyUnicode_AsEncodedString,这是encode的实现;因此,如果您自己调用encode,您只是跳过了一个间接级别。

另外,请参见Serdalis的注释——unicode_string.encode(encoding)也更像是Python,因为它的反比是byte_string.decode(encoding),而且对称性很好。

这比想象的要容易:

my_str = "hello world"
my_str_as_bytes = str.encode(my_str)
type(my_str_as_bytes) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
type(my_decoded_str) # ensure it is string representation

相关问题 更多 >