How do I use struct.pack and struct.unpack in Cython?

Asked 2025-04-18 02:57

I am trying to convert a Python module to Cython. The module does a lot of serialization and deserialization work.

At the moment I have to do this:

import struct

from libc.stdint cimport (
    int32_t,
    int64_t,
)

cpdef bytes write_int(int32_t i):
    return struct.pack("!i", i)

cpdef bytes write_long(int64_t i):
    return struct.pack("!q", i)

cdef bytes write_double(double val):
    return struct.pack("!d", val)

cdef bytes write_string(bytes val):
    cdef int32_t length = len(val)
    cdef str fmt
    fmt = "!i%ds" % length
    return struct.pack(fmt, length, val)

Is there an equivalent of struct.pack and struct.unpack in the C libraries? What is the best way to do this kind of thing in Cython?

Tags: c, serialization, deserialization, cython, struct

2 Answers

If you only pack one type of data at a time, for example a group of integers followed by a group of floats, you can use array.array() for faster results, from either Python or Cython.

Source: Serialize a group of integers using Cython
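As a rough illustration of the array.array() suggestion, here is a minimal Python sketch. The helper name `pack_int32_array` is hypothetical, and the sketch assumes the platform's C `int` is 4 bytes wide, which it checks at run time. It packs a homogeneous list of integers in one call and swaps to network byte order on little-endian machines, so the result matches the "!i" format used in the question.

```python
import struct
import sys
from array import array


def pack_int32_array(values):
    """Pack a homogeneous list of 32-bit ints as big-endian bytes."""
    arr = array("i", values)
    # Assumption: C int is 4 bytes here (true on mainstream platforms).
    if arr.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    if sys.byteorder == "little":
        # Convert to network (big-endian) byte order in place.
        arr.byteswap()
    return arr.tobytes()


values = [1, -2, 300000]
packed = pack_int32_array(values)
# Same bytes as packing each value with the "!i" format.
assert packed == struct.pack("!3i", *values)
```

The gain comes from converting the whole group at once instead of calling struct.pack per value; whether it is actually faster for a given workload is something to measure.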

Answered 2025-04-18 by Python大师


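I looked at these modules (this one and this one), translated the code to Cython, and removed the PyObject parts. In theory this should work, but I have no way of rigorously testing some parts (such as the float handling):

Some imports: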

from cpython.array cimport array, clone
from libc.string cimport memcmp, memcpy
from libc.math cimport frexp, ldexp
from libc.stdint cimport int32_t, int64_t

Fused types save some code. Technically this is not a stable feature, but it works fine for me:

ctypedef fused integer:
    int32_t
    int64_t

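This part tests the machine's byte order by comparing the in-memory bytes of a known double against the expected IEEE 754 patterns. It works for me, but this is hardly a complete test suite. On the other hand, it looks about right.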

cdef enum float_format_type:
    unknown_format,
    ieee_big_endian_format,
    ieee_little_endian_format

# Set-up
cdef array stringtemplate = array('B')
cdef float_format_type double_format

cdef double x = 9006104071832581.0

if sizeof(double) == 8:
    if memcmp(&x, b"\x43\x3f\xff\x01\x02\x03\x04\x05", 8) == 0:
        double_format = ieee_big_endian_format
    elif memcmp(&x, b"\x05\x04\x03\x02\x01\xff\x3f\x43", 8) == 0:
        double_format = ieee_little_endian_format
    else:
        double_format = unknown_format

else:
    double_format = unknown_format

(stringtemplate is used to create bytes objects quickly.)
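The format check above can be cross-checked from plain Python. The sketch below is an illustration only, with a hypothetical helper name `detect_double_format`. It packs the same probe value in native mode ("@d") and compares the bytes against the two patterns used in the Cython code.

```python
import struct
import sys

PROBE = 9006104071832581.0
BIG_ENDIAN_BYTES = b"\x43\x3f\xff\x01\x02\x03\x04\x05"


def detect_double_format():
    """Classify the native in-memory layout of a C double."""
    # "@d" uses the platform's native double representation.
    native = struct.pack("@d", PROBE)
    if native == BIG_ENDIAN_BYTES:
        return "ieee_big_endian_format"
    if native == BIG_ENDIAN_BYTES[::-1]:
        return "ieee_little_endian_format"
    return "unknown_format"


# The big-endian pattern is the IEEE 754 binary64 encoding of the probe.
assert struct.pack(">d", PROBE) == BIG_ENDIAN_BYTES
print(detect_double_format(), "on a", sys.byteorder, "endian machine")
```

The probe value is chosen so that all eight bytes differ, which makes a partial or mixed byte order show up as unknown_format rather than a false match.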

This part is simple:

cdef void _write_integer(integer x, char* output):
    cdef int i
    for i in range(sizeof(integer)-1, -1, -1):
        output[i] = <char>x
        x >>= 8

cpdef bytes write_int(int32_t i):
    cdef array output = clone(stringtemplate, sizeof(int32_t), False)
    _write_integer(i, output.data.as_chars)
    return output.data.as_chars[:sizeof(int32_t)]

cpdef bytes write_long(int64_t i):
    cdef array output = clone(stringtemplate, sizeof(int64_t), False)
    _write_integer(i, output.data.as_chars)
    return output.data.as_chars[:sizeof(int64_t)]

array is similar to malloc, but its memory is reclaimed automatically :).
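To see why the shift loop in _write_integer produces network byte order, here is a small Python sketch of the same idea. The function name `write_integer` and its `size` parameter are illustrative stand-ins for the fused-type version. It fills the buffer from the last byte to the first and compares the result with struct.pack.

```python
import struct


def write_integer(x, size):
    """Big-endian two's-complement bytes of x, built with shifts and masks."""
    out = bytearray(size)
    for i in range(size - 1, -1, -1):
        out[i] = x & 0xFF  # lowest 8 bits go into the current slot
        x >>= 8            # arithmetic shift keeps the sign for negatives
    return bytes(out)


for value in (0, 1, -1, 258, -2, 2**31 - 1, -2**31):
    assert write_integer(value, 4) == struct.pack("!i", value)

for value in (0, -1, 2**40 + 7, 2**63 - 1, -2**63):
    assert write_integer(value, 8) == struct.pack("!q", value)
```

Because the least significant byte is written to the highest index, the most significant byte ends up first, which is exactly what the "!" prefix requests.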

I mostly do not understand this part. My "tests" passed, but mostly by luck:

# "except *" lets the exceptions raised below propagate out of a void function.
cdef void _write_double(double x, char* output) except *:
    cdef:
        unsigned char sign
        int e
        double f
        unsigned int fhi, flo, i
        char *s

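    # Note: "or True" forces this portable branch for every input,
    # so the byte-copy branch in the final else is never reached.
    if double_format == unknown_format or True: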
        if x < 0:
            sign = 1
            x = -x

        else:
            sign = 0

        f = frexp(x, &e)

        # Normalize f to be in the range [1.0, 2.0)

        if 0.5 <= f < 1.0:
            f *= 2.0
            e -= 1

        elif f == 0.0:
            e = 0

        else:
            raise SystemError("frexp() result out of range")

        if e >= 1024:
            raise OverflowError("float too large to pack with d format")

        elif e < -1022:
            # Gradual underflow
            f = ldexp(f, 1022 + e)
            e = 0

        elif not (e == 0 and f == 0.0):
            e += 1023
            f -= 1.0 # Get rid of leading 1

        # fhi receives the high 28 bits; flo the low 24 bits (== 52 bits)
        f *= 2.0 ** 28
        fhi = <unsigned int>f # Truncate

        assert fhi < 268435456

        f -= <double>fhi
        f *= 2.0 ** 24
        flo = <unsigned int>(f + 0.5) # Round

        assert flo <= 16777216

        if flo >> 24:
            # The carry propagated out of a string of 24 1 bits.
            flo = 0
            fhi += 1
            if fhi >> 28:
                # And it also propagated out of the next 28 bits.
                fhi = 0
                e += 1
                if e >= 2047:
                    raise OverflowError("float too large to pack with d format")

        output[0] = (sign << 7) | (e >> 4)
        output[1] = <unsigned char> (((e & 0xF) << 4) | (fhi >> 24))
        output[2] = 0xFF & (fhi >> 16)
        output[3] = 0xFF & (fhi >> 8)
        output[4] = 0xFF & fhi
        output[5] = 0xFF & (flo >> 16)
        output[6] = 0xFF & (flo >> 8)
        output[7] = 0xFF & flo

    else:
        s = <char*>&x

        if double_format == ieee_little_endian_format:
            for i in range(8):
                output[i] = s[7-i]

        else:
            for i in range(8):
                output[i] = s[i]

If you can work out how it does what it does, be sure to check it yourself.
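One way to check the portable branch is to port it to plain Python and compare against struct.pack. The sketch below is such a port, not the author's code: the name `pack_double_portable` is hypothetical, it rejects NaN and infinity outright, and it uses math.copysign for the sign so that -0.0 keeps its sign bit (the `x < 0` test in the Cython version would pack -0.0 as +0.0).

```python
import math
import struct


def pack_double_portable(x):
    """Big-endian IEEE 754 binary64 bytes of a finite float, built arithmetically."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("this sketch only handles finite values")
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    f, e = math.frexp(abs(x))  # abs(x) == f * 2**e, with 0.5 <= f < 1 or f == 0
    if f != 0.0:
        f *= 2.0               # normalize f into [1.0, 2.0)
        e -= 1
    else:
        e = 0
    if e >= 1024:
        raise OverflowError("float too large to pack with d format")
    if e < -1022:
        f = math.ldexp(f, 1022 + e)  # gradual underflow (subnormal)
        e = 0
    elif f != 0.0:
        e += 1023              # apply the exponent bias
        f -= 1.0               # drop the implicit leading 1
    f *= 2.0 ** 28
    fhi = int(f)               # high 28 bits of the fraction (truncated)
    f -= fhi
    f *= 2.0 ** 24
    flo = int(f + 0.5)         # low 24 bits of the fraction (rounded)
    if flo >> 24:              # rounding carried out of the low part
        flo = 0
        fhi += 1
        if fhi >> 28:          # and out of the high part as well
            fhi = 0
            e += 1
            if e >= 2047:
                raise OverflowError("float too large to pack with d format")
    bits = (sign << 63) | (e << 52) | (fhi << 24) | flo
    return bits.to_bytes(8, "big")


samples = [0.0, -0.0, 1.0, -1.0, 0.1, -2.5, 1e308, 5e-324,
           2.2250738585072014e-308, 9006104071832581.0]
for value in samples:
    assert pack_double_portable(value) == struct.pack("!d", value)
```

The final line assembles the same layout that the eight output[...] assignments write byte by byte: 1 sign bit, 11 exponent bits, then the 28 + 24 fraction bits.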

Then we wrap it as before:

cdef bytes write_double(double x):
    cdef array output = clone(stringtemplate, sizeof(double), False)
    _write_double(x, output.data.as_chars)
    return output.data.as_chars[:sizeof(double)]

The string part is actually very simple, and it explains why I set things up the way I did above:

cdef bytes write_string(bytes val):
    cdef:
        int32_t int_length = sizeof(int32_t)
        int32_t input_length = len(val)
        array output = clone(stringtemplate, int_length + input_length, True)

    _write_integer(input_length, output.data.as_chars)
    memcpy(output.data.as_chars + int_length, <char*>val, input_length)

    return output.data.as_chars[:int_length + input_length]
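The answer only covers the writing side. As a hedged sketch of what write_string produces and how it could be read back, the Python below mirrors the layout (a 4-byte big-endian length followed by the raw bytes). `read_string` is a hypothetical helper that is not part of the answer; it uses struct.unpack_from and returns the payload together with the offset of the next field.

```python
import struct


def write_string(val):
    """4-byte big-endian length prefix followed by the raw bytes."""
    return len(val).to_bytes(4, "big", signed=True) + val


def read_string(buf, offset=0):
    """Return (payload, next_offset) for a length-prefixed string at offset."""
    (length,) = struct.unpack_from("!i", buf, offset)
    start = offset + 4
    end = start + length
    if length < 0 or end > len(buf):
        raise ValueError("truncated or corrupt length-prefixed string")
    return buf[start:end], end


encoded = write_string(b"hello")
# Same bytes as the struct-based version in the question.
assert encoded == struct.pack("!i5s", 5, b"hello")
assert read_string(encoded) == (b"hello", 9)
```

Returning the next offset lets a caller walk through several fields in one buffer without slicing and copying the remainder each time.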

Answered 2025-04-18 by Python大师
