Difference between Python's "string_escape" and "unicode_escape"


数据工程师

Asked 2025-04-15 23:31

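According to the docs, the built-in string encoding string_escape:

Produces a string that is suitable as a string literal in Python source code.

...while unicode_escape:

Produces a string that is suitable as a Unicode literal in Python source code.

So they should behave roughly the same. However, they appear to treat single quotes differently: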

>>> print """before '" \0 after""".encode('string-escape')
before \'" \x00 after
>>> print """before '" \0 after""".encode('unicode-escape')
before '" \x00 after

string_escape escapes the single quote, while the Unicode version does not. Is it safe to assume that all I need is:

>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")

...to get the expected behaviour?

Edit: to be clear, the expected behaviour is getting something suitable as a literal.
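One way to check whether the replace-based approach yields something usable as a literal is to round-trip it through the parser. The following is a minimal sketch for Python 3 (not the Python 2 session shown above), where str.encode('unicode_escape') returns bytes; the helper name to_literal and the sample strings are made up for illustration.

```python
import ast


def to_literal(text):
    """Hypothetical helper: build a single-quoted literal from text."""
    # unicode_escape returns ASCII-only bytes in Python 3, so decode them back to str.
    escaped = text.encode("unicode_escape").decode("ascii")
    # unicode_escape leaves single quotes alone, so escape them by hand.
    escaped = escaped.replace("'", "\\'")
    return "'" + escaped + "'"


# Illustrative samples: quotes, a NUL, a backslash, non-ASCII text, a newline.
samples = ["before '\" \0 after", "back\\slash", "caf\u00e9", "\u4e2d\u6587", "line\nbreak"]

for sample in samples:
    literal = to_literal(sample)
    # literal_eval parses the text as a Python literal; it must give back the input.
    assert ast.literal_eval(literal) == sample
    print(literal)
```

The round trip only covers the samples listed; it is a sanity check, not a proof that every input survives.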


2 Answers

For code points 0 through 127 (that is, range(128)), yes, ' is the only difference in CPython 2.6.

>>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
set(["'"])

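Outside of this range the two are not interchangeable: string_escape only accepts str (byte strings), and unicode_escape applied to a str first decodes it as ASCII.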

>>> '\x80'.encode('string_escape')
'\\x80'
>>> '\x80'.encode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'1'.encode('unicode_escape')
'1'
>>> u'1'.encode('string_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: escape_encode() argument 1 must be str, not unicode

In Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
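As a rough illustration of the Python 3 situation, the sketch below (assuming a Python 3 interpreter; the variable names are made up) shows that unicode_escape now returns bytes and still leaves the single quote unescaped, while asking for string_escape fails with a LookupError.

```python
text = "before '\" \0 after"

# In Python 3, str.encode always returns bytes, including for unicode_escape.
escaped = text.encode("unicode_escape")
print(type(escaped).__name__, escaped)

# string_escape is not a registered codec in Python 3, so the lookup itself fails.
lookup_failed = False
try:
    text.encode("string_escape")
except LookupError as exc:
    lookup_failed = True
    print("LookupError:", exc)
```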

Answered 2025-04-15 by Python大师


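From my reading of the unicode-escape and unicode repr implementations in the CPython 2.6.5 source, yes: the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is that the former adds the wrapping quotes and escapes whichever quote character it used.

Both are driven by the same function, unicodeescape_string. It takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.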
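The same relationship can be approximated in Python 3, with the caveat that this is an analogy and not the CPython 2.6.5 code path: ascii() plays the role of the old unicode repr, since Python 3's repr() leaves printable non-ASCII characters unescaped. The helper name escape_body and the sample strings in this sketch are assumptions for illustration.

```python
def escape_body(text):
    """Hypothetical helper: unicode_escape output as str instead of bytes."""
    return text.encode("unicode_escape").decode("ascii")


# Without quote characters, ascii() minus its wrapping quotes matches unicode_escape.
plain = "caf\u00e9\n\t\\ \u4e2d"
assert ascii(plain)[1:-1] == escape_body(plain)

# With both quote kinds present, ascii() wraps in single quotes and escapes them,
# while unicode_escape leaves the single quote untouched.
both_quotes = "it's \"quoted\""
print(ascii(both_quotes))
print(escape_body(both_quotes))
assert ascii(both_quotes)[1:-1] != escape_body(both_quotes)
```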

Answered 2025-04-15 by Python大师
