如何在Cython中使用struct.pack和struct.unpack?

10 投票
2 回答
3243 浏览
提问于 2025-04-18 02:57

我正在尝试把一个Python模块转换成Cython,这个模块主要负责很多序列化和反序列化的工作。

现在我需要这样做:

import struct

from libc.stdint cimport (
    int32_t,
    int64_t,
)

cpdef bytes write_int(int32_t i):
    return struct.pack("!i", i)

cpdef bytes write_long(int64_t i):
    return struct.pack("!q", i)

cdef bytes write_double(double val):
    return struct.pack("!d", val)

cdef bytes write_string(bytes val):
    cdef int32_t length = len(val)
    cdef str fmt
    fmt = "!i%ds" % length
    return struct.pack(fmt, length, val)

在C语言库中有没有和struct.pack和struct.unpack相同的功能?在Cython中做这些事情的最佳方法是什么?

2 个回答

0

如果你每次只处理一种类型的数据,比如一组 整数,然后再是一组 浮点数,那么你可以使用 array.array() 这个方法来提高速度,这个方法可以通过 Python 或 Cython 来实现。

来源: 使用 Cython 序列化一组整数

10

我查看了这些模块(这个这个),然后把代码翻译成Cython,并去掉了PyObject的部分。理论上来说,这样应该可以工作,但有些部分(比如float的部分)我没有办法严格测试:

一些导入:

from cpython.array cimport array, clone
from libc.string cimport memcmp, memcpy
from libc.math cimport frexp, ldexp
from libc.stdint cimport int32_t, int64_t

保存一些带有融合类型的代码。虽然从技术上讲这不是一个稳定的特性,但对我来说运行得很好:

ctypedef fused integer:
    int32_t
    int64_t

这一部分测试了机器的字节序。对我来说是有效的,但这并不是一个完整的测试套件。另一方面,看起来是对的。

cdef enum float_format_type:
    unknown_format,
    ieee_big_endian_format,
    ieee_little_endian_format

# Set-up
cdef array stringtemplate = array('B')
cdef float_format_type double_format

cdef double x = 9006104071832581.0

if sizeof(double) == 8:
    if memcmp(&x, b"\x43\x3f\xff\x01\x02\x03\x04\x05", 8) == 0:
        double_format = ieee_big_endian_format
    elif memcmp(&x, b"\x05\x04\x03\x02\x01\xff\x3f\x43", 8) == 0:
        double_format = ieee_little_endian_format
    else:
        double_format = unknown_format

else:
    double_format = unknown_format;

stringtemplate是用来快速创建bytes对象的)

这一部分很简单:

cdef void _write_integer(integer x, char* output):
    cdef int i
    for i in range(sizeof(integer)-1, -1, -1):
        output[i] = <char>x
        x >>= 8

cpdef bytes write_int(int32_t i):
    cdef array output = clone(stringtemplate, sizeof(int32_t), False)
    _write_integer(i, output.data.as_chars)
    return output.data.as_chars[:sizeof(int32_t)]

cpdef bytes write_long(int64_t i):
    cdef array output = clone(stringtemplate, sizeof(int64_t), False)
    _write_integer(i, output.data.as_chars)
    return output.data.as_chars[:sizeof(int64_t)]

array类似于malloc,但它是自动回收内存的 :)。

这一部分我基本上不太了解。我的“测试”通过了,但更多的是靠运气:

cdef void _write_double(double x, char* output):
    cdef:
        unsigned char sign
        int e
        double f
        unsigned int fhi, flo, i
        char *s

    if double_format == unknown_format or True:
        if x < 0:
            sign = 1
            x = -x

        else:
            sign = 0

        f = frexp(x, &e)

        # Normalize f to be in the range [1.0, 2.0)

        if 0.5 <= f < 1.0:
            f *= 2.0
            e -= 1

        elif f == 0.0:
            e = 0

        else:
            raise SystemError("frexp() result out of range")

        if e >= 1024:
            raise OverflowError("float too large to pack with d format")

        elif e < -1022:
            # Gradual underflow
            f = ldexp(f, 1022 + e)
            e = 0;

        elif not (e == 0 and f == 0.0):
            e += 1023
            f -= 1.0 # Get rid of leading 1

        # fhi receives the high 28 bits; flo the low 24 bits (== 52 bits)
        f *= 2.0 ** 28
        fhi = <unsigned int>f # Truncate

        assert fhi < 268435456

        f -= <double>fhi
        f *= 2.0 ** 24
        flo = <unsigned int>(f + 0.5) # Round

        assert(flo <= 16777216);

        if flo >> 24:
            # The carry propagated out of a string of 24 1 bits.
            flo = 0
            fhi += 1
            if fhi >> 28:
                # And it also progagated out of the next 28 bits.
                fhi = 0
                e += 1
                if e >= 2047:
                    raise OverflowError("float too large to pack with d format")

        output[0] = (sign << 7) | (e >> 4)
        output[1] = <unsigned char> (((e & 0xF) << 4) | (fhi >> 24))
        output[2] = 0xFF & (fhi >> 16)
        output[3] = 0xFF & (fhi >> 8)
        output[4] = 0xFF & fhi
        output[5] = 0xFF & (flo >> 16)
        output[6] = 0xFF & (flo >> 8)
        output[7] = 0xFF & flo

    else:
        s = <char*>&x;

        if double_format == ieee_little_endian_format:
            for i in range(8):
                output[i] = s[7-i]

        else:
            for i in range(8):
                output[i] = s[i]

如果你能理解它是怎么工作的,记得自己检查一下。

然后我们像之前一样进行封装:

cdef bytes write_double(double x):
    cdef array output = clone(stringtemplate, sizeof(double), False)
    _write_double(x, output.data.as_chars)
    return output.data.as_chars[:sizeof(double)]

字符串的部分其实非常简单,这也解释了我为什么上面这样设置:

cdef bytes write_string(bytes val):
    cdef:
        int32_t int_length = sizeof(int32_t)
        int32_t input_length = len(val)
        array output = clone(stringtemplate, int_length + input_length, True)

    _write_integer(input_length, output.data.as_chars)
    memcpy(output.data.as_chars + int_length, <char*>val, input_length)

    return output.data.as_chars[:int_length + input_length]

撰写回答