Getting a string representation with double quotes in Python


Asked 2025-04-15 15:39

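I'm using a small Python script to generate some binary data that will be used in a C header file.

The data should be declared as a char[]. It would be nice to encode it as a string literal (with the appropriate escape sequences for bytes that are not printable ASCII characters), because that keeps the header more compact than a decimal or hex array.

The problem is that when I print the repr of a Python string, it is delimited by single quotes, which C doesn't accept. The naive solution is: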

'"%s"'%repr(data)[1:-1]

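But that breaks when one of the bytes happens to be a double quote, so I'd need to escape those as well.

I think a simple replace('"', '\\"') could do the job, but maybe there's a better, more Pythonic solution.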
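A quick Python 3 illustration of why the slicing trick above is fragile (naive_c_literal is just a name for this sketch): an embedded double quote comes through unescaped, and for bytes objects Python 3's repr adds a b prefix, so [1:-1] leaves a stray quote behind.

```python
def naive_c_literal(data):
    # The trick from the question: drop repr's outer quotes, wrap in double quotes.
    return '"%s"' % repr(data)[1:-1]

print(naive_c_literal("plain"))     # "plain"
print(naive_c_literal('say "hi"'))  # "say "hi""  (the inner quote ends the C literal early)
print(naive_c_literal(b"abc"))      # "'abc"      (repr(b"abc") is b'abc')
```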

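Bonus points:

It would also be handy to split the data into lines of roughly 80 characters. Simply splitting the source string into 80-character chunks won't work, though, because each non-printable character expands to an escape sequence of 2 to 4 characters (for example \n or \x00). Splitting the repr into 80-character chunks afterwards doesn't help either, since that can cut an escape sequence in half.

Any suggestions?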


4 Answers

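It's better not to hack around repr() and to use the right encoding from the start instead. You can get repr's escaping directly from the string_escape codec (Python 2 only):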

>>> "naïveté".encode("string_escape")
'na\\xc3\\xafvet\\xc3\\xa9'
>>> print _
na\xc3\xafvet\xc3\xa9

For escaping the double quotes, I think a simple replace after escape-encoding the string is a perfectly unambiguous process:

>>> '"%s"' % 'data:\x00\x01 "like this"'.encode("string_escape").replace('"', r'\"')
'"data:\\x00\\x01 \\"like this\\""'
>>> print _
"data:\x00\x01 \"like this\""
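The string_escape codec no longer exists in Python 3. A minimal sketch of the same approach there, assuming the data is a bytes object (to_c_literal is a hypothetical name): decode it as Latin-1 so every byte maps to the code point with the same value, encode with unicode_escape, then escape the double quotes.

```python
def to_c_literal(data):
    """Python 3 version of the string_escape trick (sketch; `data` is bytes)."""
    # Latin-1 maps bytes 0-255 one-to-one onto code points 0-255, so
    # unicode_escape writes non-printable bytes as \xhh (or \t, \n, \r)
    # and doubles backslashes.
    escaped = data.decode("latin-1").encode("unicode_escape").decode("ascii")
    # unicode_escape leaves double quotes alone, so escape them for C.
    return '"%s"' % escaped.replace('"', '\\"')

print(to_c_literal(b'data:\x00\x01 "like this"'))  # "data:\x00\x01 \"like this\""
```

One C-specific caveat applies here too: a \x escape in C consumes every hex digit that follows it, so to_c_literal(b'\x01A') produces "\x01A", which a C compiler reads as the single character 0x1A. Emitting three-digit octal escapes (which stop after three digits) or closing the literal right after each escape avoids this.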

Answered 2025-04-15 by Python大师


You can try json.dumps:

>>> import json
>>> print(json.dumps("hello world"))
"hello world"

>>> print(json.dumps('hëllo "world"!'))
"h\u00ebllo \"world\"!"

I'm not sure whether JSON strings are compatible with C, but at least they have a lot in common, and they're guaranteed to be compatible with JavaScript ;)
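A small Python 3 check of where the two formats diverge for binary data (json_c_literal is a hypothetical helper; decoding as Latin-1 is an assumption so that each byte becomes one code point):

```python
import json

def json_c_literal(data):
    # Map each byte to the code point with the same value, then let
    # json.dumps handle quoting and escaping (ensure_ascii is on by default).
    return json.dumps(data.decode("latin-1"))

print(json_c_literal(b'hi "there"\n'))  # "hi \"there\"\n"
print(json_c_literal(b"\x00\xe9"))      # "\u0000\u00e9"
```

The first result is a valid C literal. The second is not usable as-is: C does not allow universal character names below U+00A0 (other than $, @ and `), so \u0000 is rejected, and \u00e9 in a plain char literal names the character é, which the compiler stores in its execution character set (two bytes under UTF-8) rather than as the single byte 0xE9. So json.dumps works for printable ASCII text, but not for arbitrary bytes.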

Answered 2025-04-15 by Python大师


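repr() isn't what you want. There's a fundamental problem: repr() may return any representation of the string that Python can parse. In theory, that means it could choose a representation that isn't valid C, such as """long string""".

The following code is probably a step in the right direction. I used a default wrap of 140 characters, which is a sane value for 2009, but if you really want to wrap at 80 columns, just change it.

With unicode=True, it outputs an L"wide" string, which can meaningfully store Unicode escapes. Alternatively, depending on what you're doing, you may want to convert the Unicode characters to UTF-8 and output the escaped bytes.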

def string_to_c(s, max_length=140, unicode=False):
    ret = []

    # Try to split on whitespace, not in the middle of a word.
    split_at_space_pos = max_length - 10
    if split_at_space_pos < 10:
        split_at_space_pos = None

    position = 0
    if unicode:
        position += 1
        ret.append('L')

    ret.append('"')
    position += 1
    for c in s:
        newline = False
        if c == "\n":
            # Emit a \n escape, then a backslash-newline so the literal
            # continues on the next source line (a bare backslash-newline
            # is a line splice in C and would drop the newline).
            to_add = "\\n\\\n"
            newline = True
        elif ord(c) < 32 or 0x7f <= ord(c) <= 0xff:
            # Three-digit octal instead of \x: a C hex escape would also
            # swallow any hex digits that follow it.
            to_add = "\\%03o" % ord(c)
        elif ord(c) > 0xff:
            if not unicode:
                raise ValueError("string contains unicode character but unicode=False")
            # \u takes exactly 4 hex digits, \U exactly 8.
            if ord(c) > 0xffff:
                to_add = "\\U%08x" % ord(c)
            else:
                to_add = "\\u%04x" % ord(c)
        elif c in '\\"':
            to_add = "\\" + c
        else:
            to_add = c

        ret.append(to_add)
        position += len(to_add)
        if newline:
            position = 0

        # Wrap the source line with a backslash-newline splice.
        if split_at_space_pos is not None and position >= split_at_space_pos and c in " \t":
            ret.append("\\\n")
            position = 0
        elif position >= max_length:
            ret.append("\\\n")
            position = 0

    ret.append('"')

    return "".join(ret)

print(string_to_c("testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing", max_length=20))
print(string_to_c("Escapes: \"quote\" \\backslash\\ \x00 \x1f testing \x80 \xff"))
print(string_to_c("Unicode: \u1234", unicode=True))
print(string_to_c("""New
lines"""))
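For the bonus question about line wrapping, a minimal Python 3 sketch for raw bytes (bytes_to_c_literal is a hypothetical name, and the 80-column default is just the width from the question). It escapes byte by byte so that every escape is an indivisible token, packs whole tokens into lines, and relies on C concatenating adjacent string literals, so each output line is its own complete "..." literal and no backslash-newline splices are needed. It uses three-digit octal escapes because a C \x escape keeps consuming any hex digits that follow it.

```python
def bytes_to_c_literal(data, width=80):
    """Render `data` (bytes) as C string literals, one per line.

    Each line is a complete "..." literal no wider than `width` columns.
    The C compiler concatenates adjacent string literals, so the output
    can initialise a char[] directly.
    """
    if width < 6:
        raise ValueError("width must leave room for two quotes and a 4-char escape")

    # Bytes with a short, fixed escape in C. '?' is escaped so that no
    # trigraph such as ??= can appear in the output (pre-C23 compilers).
    named = {0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
             0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
             0x22: '\\"', 0x3F: "\\?", 0x5C: "\\\\"}

    # One token per input byte; a token is never split across lines.
    tokens = []
    for b in data:
        if b in named:
            tokens.append(named[b])
        elif 0x20 <= b < 0x7F:
            tokens.append(chr(b))
        else:
            # Octal escapes stop after three digits, so a following
            # digit can never be absorbed into the escape.
            tokens.append("\\%03o" % b)

    # Greedily pack whole tokens into lines.
    budget = width - 2  # room left after the two quote characters
    lines = []
    current = ""
    for tok in tokens:
        if current and len(current) + len(tok) > budget:
            lines.append('"%s"' % current)
            current = ""
        current += tok
    lines.append('"%s"' % current)
    return "\n".join(lines)


if __name__ == "__main__":
    print("static const char blob[] =")
    print(bytes_to_c_literal(bytes(range(48)), width=40) + ";")
```

A char[] initialised this way also gets a terminating NUL, so sizeof(blob) - 1 is the length of the data itself.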

Answered 2025-04-15 by Python大师
