如何在Cython中使用struct.pack和struct.unpack?
我正在尝试把一个Python模块转换成Cython,这个模块主要负责很多序列化和反序列化的工作。
现在我需要这样做:
import struct
from libc.stdint cimport (
int32_t,
int64_t,
)
cpdef bytes write_int(int32_t i):
return struct.pack("!i", i)
cpdef bytes write_long(int64_t i):
return struct.pack("!q", i)
cdef bytes write_double(double val):
return struct.pack("!d", val)
cdef bytes write_string(bytes val):
cdef int32_t length = len(val)
cdef str fmt
fmt = "!i%ds" % length
return struct.pack(fmt, length, val)
在C语言库中有没有和struct.pack和struct.unpack相同的功能?在Cython中做这些事情的最佳方法是什么?
2 个回答
0
如果你每次只处理一种类型的数据,比如一组 整数
,然后再是一组 浮点数
,那么你可以使用 array.array()
这个方法来提高速度,这个方法可以通过 Python 或 Cython 来实现。
10
我查看了这些模块(这个和这个),然后把代码翻译成Cython,并去掉了PyObject
的部分。理论上来说,这样应该可以工作,但有些部分(比如float
的部分)我没有办法严格测试:
一些导入:
from cpython.array cimport array, clone
from libc.string cimport memcmp, memcpy
from libc.math cimport frexp, ldexp
from libc.stdint cimport int32_t, int64_t
保存一些带有融合类型的代码。虽然从技术上讲这不是一个稳定的特性,但对我来说运行得很好:
ctypedef fused integer:
int32_t
int64_t
这一部分测试了机器的字节序。对我来说是有效的,但这并不是一个完整的测试套件。另一方面,看起来是对的。
cdef enum float_format_type:
unknown_format,
ieee_big_endian_format,
ieee_little_endian_format
# Set-up
cdef array stringtemplate = array('B')
cdef float_format_type double_format
cdef double x = 9006104071832581.0
if sizeof(double) == 8:
if memcmp(&x, b"\x43\x3f\xff\x01\x02\x03\x04\x05", 8) == 0:
double_format = ieee_big_endian_format
elif memcmp(&x, b"\x05\x04\x03\x02\x01\xff\x3f\x43", 8) == 0:
double_format = ieee_little_endian_format
else:
double_format = unknown_format
else:
double_format = unknown_format;
(stringtemplate
是用来快速创建bytes
对象的)
这一部分很简单:
cdef void _write_integer(integer x, char* output):
cdef int i
for i in range(sizeof(integer)-1, -1, -1):
output[i] = <char>x
x >>= 8
cpdef bytes write_int(int32_t i):
cdef array output = clone(stringtemplate, sizeof(int32_t), False)
_write_integer(i, output.data.as_chars)
return output.data.as_chars[:sizeof(int32_t)]
cpdef bytes write_long(int64_t i):
cdef array output = clone(stringtemplate, sizeof(int64_t), False)
_write_integer(i, output.data.as_chars)
return output.data.as_chars[:sizeof(int64_t)]
array
类似于malloc
,但它是自动回收内存的 :)。
这一部分我基本上不太了解。我的“测试”通过了,但更多的是靠运气:
cdef void _write_double(double x, char* output):
cdef:
unsigned char sign
int e
double f
unsigned int fhi, flo, i
char *s
if double_format == unknown_format or True:
if x < 0:
sign = 1
x = -x
else:
sign = 0
f = frexp(x, &e)
# Normalize f to be in the range [1.0, 2.0)
if 0.5 <= f < 1.0:
f *= 2.0
e -= 1
elif f == 0.0:
e = 0
else:
raise SystemError("frexp() result out of range")
if e >= 1024:
raise OverflowError("float too large to pack with d format")
elif e < -1022:
# Gradual underflow
f = ldexp(f, 1022 + e)
e = 0;
elif not (e == 0 and f == 0.0):
e += 1023
f -= 1.0 # Get rid of leading 1
# fhi receives the high 28 bits; flo the low 24 bits (== 52 bits)
f *= 2.0 ** 28
fhi = <unsigned int>f # Truncate
assert fhi < 268435456
f -= <double>fhi
f *= 2.0 ** 24
flo = <unsigned int>(f + 0.5) # Round
assert(flo <= 16777216);
if flo >> 24:
# The carry propagated out of a string of 24 1 bits.
flo = 0
fhi += 1
if fhi >> 28:
# And it also progagated out of the next 28 bits.
fhi = 0
e += 1
if e >= 2047:
raise OverflowError("float too large to pack with d format")
output[0] = (sign << 7) | (e >> 4)
output[1] = <unsigned char> (((e & 0xF) << 4) | (fhi >> 24))
output[2] = 0xFF & (fhi >> 16)
output[3] = 0xFF & (fhi >> 8)
output[4] = 0xFF & fhi
output[5] = 0xFF & (flo >> 16)
output[6] = 0xFF & (flo >> 8)
output[7] = 0xFF & flo
else:
s = <char*>&x;
if double_format == ieee_little_endian_format:
for i in range(8):
output[i] = s[7-i]
else:
for i in range(8):
output[i] = s[i]
如果你能理解它是怎么工作的,记得自己检查一下。
然后我们像之前一样进行封装:
cdef bytes write_double(double x):
cdef array output = clone(stringtemplate, sizeof(double), False)
_write_double(x, output.data.as_chars)
return output.data.as_chars[:sizeof(double)]
字符串的部分其实非常简单,这也解释了我为什么上面这样设置:
cdef bytes write_string(bytes val):
cdef:
int32_t int_length = sizeof(int32_t)
int32_t input_length = len(val)
array output = clone(stringtemplate, int_length + input_length, True)
_write_integer(input_length, output.data.as_chars)
memcpy(output.data.as_chars + int_length, <char*>val, input_length)
return output.data.as_chars[:int_length + input_length]