How do I use ctypes to malloc a dynamic buffer from byte data?

Posted 2024-04-27 06:44:32

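Every reference I have found on creating a buffer in ctypes seems to create one of static length…
Here I am working with data read from a file and processed through ctypes, where a structure defines its buffers inline and the length of that structure is not known until it has been read.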

import ctypes

class Buffer16(ctypes.Structure):
    _fields_ = [
        ('length', ctypes.c_ushort.__ctype_be__ ),
        ('data', ctypes.c_ubyte*0 ) # to be resized via malloc
    ]

    def __new__(cls): # not executed for some reason
        b16 = ctypes.Structure.__new__(cls) # wish I could interrupt before reading the 0-length array...
        # some unknown magic here to malloc b16.data
        return b16

class Test(ctypes.Structure):
    _fields_ = [
        ('data', ctypes.c_uint.__ctype_be__ ),
        ('buf1', Buffer16 ),
        ('buf2', Buffer16 )
    ]

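I can easily load the data read from the file into a ctypes array and initialize the structure with Structure.from_address(ctypes.addressof(bytedata)).
The problem is that __new__ and __init__ are not executed, so the buffers never get the right size.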

Here is some test data as an example:

>>> bytedata = (ctypes.c_ubyte*19)(*b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')
>>> 
>>> testinstance = Test.from_address(ctypes.addressof(bytedata))
>>> testinstance.data # just some dummy data which is correct
268416
>>> testinstance.buf1.length # this is correct
4
>>> testinstance.buf1.data # this should be __len__ == 4
<__main__.c_ubyte_Array_0 object at 0x...>
>>> testinstance.buf2.length # this is wrong (0x7465 from b'te'), it should be 7
29797

Is there a better way than from_address to malloc the buffers inline?
(Casting is no different from from_address, apart from needing testinstance[0].)
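
The wrong buf2.length follows directly from the fixed layout of the structure. The check below is only a sketch, assuming the usual 2-byte c_ushort and 4-byte c_uint; the variable names (raw, misread, wanted) are made up for illustration. It shows that the zero-length array adds nothing to sizeof(Buffer16), so ctypes places buf2 right after buf1.length, on top of the bytes of b'test'.

```python
import ctypes

class Buffer16(ctypes.Structure):
    _fields_ = [
        ('length', ctypes.c_ushort.__ctype_be__),
        ('data', ctypes.c_ubyte * 0),  # zero-length: contributes no bytes
    ]

class Test(ctypes.Structure):
    _fields_ = [
        ('data', ctypes.c_uint.__ctype_be__),
        ('buf1', Buffer16),
        ('buf2', Buffer16),
    ]

raw = b'\x00\x04\x18\x80\x00\x04test\x00\x07testing'

# Size of one Buffer16 and the offset ctypes assigns to buf2.
buf1_size = ctypes.sizeof(Buffer16)
buf2_offset = Test.buf2.offset

# What ctypes reads as buf2.length: the two bytes at the fixed offset.
misread = int.from_bytes(raw[buf2_offset:buf2_offset + 2], 'big')

# Where buf2.length really lives: after buf1's 2-byte length and 4 data bytes.
wanted_offset = Test.buf1.offset + 2 + 4
wanted = int.from_bytes(raw[wanted_offset:wanted_offset + 2], 'big')

print(buf1_size, buf2_offset, misread, wanted)
```

So any fix has to compute the offset of buf2 from buf1.length at read time; a single static _fields_ layout cannot express it.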


2 answers

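Thanks to Mark Tolonen's answer and the inspiration it gave me, I realized that his approach works much like the ctypes.Structure.from_address() method.

Here is my answer and test, along with my update to his: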

from ctypes import Structure, c_char, c_ushort, c_uint, POINTER, addressof

c_bushort = c_ushort.__ctype_be__
c_buint = c_uint.__ctype_be__

class Buffer16(Structure):
    _fields_ = (
        ('length', c_bushort),
        ('data', POINTER( c_char ))
    )

    @classmethod
    def from_address(cls, addr):
        length = c_bushort.from_address( addr ).value
        data   = ( c_char*length ).from_address( addr+2 )
        return cls( length, data )

class Test(Structure):
    _fields_ = (
        ('data', c_buint),
        ('buf1', Buffer16),
        ('buf2', Buffer16)
    )

    @classmethod
    def from_address(cls, addr):
        data = c_buint.from_address( addr )
        b1   = Buffer16.from_address( addr+4 )
        b2   = Buffer16.from_address( addr+6+b1.length )
        return cls( data, b1, b2 )

bytedata = ( c_char*19 )( *b'\x00\x04\x18\x80\x00\x04test\x00\x07testing' )
t = Test.from_address( addressof( bytedata ) )

print( t.data )
print( t.buf1.data[:t.buf1.length] )
print( t.buf2.data[:t.buf2.length] )

The result is:

>>>
268416
b'test'
b'testing'
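
One caveat with raw addresses is that from_address does no bounds checking. As a hedged alternative, the sketch below walks the same layout with from_buffer(source, offset), which raises ValueError when the buffer is too small. The function name parse and its tuple return value are assumptions for illustration, not part of the answer above.

```python
from ctypes import c_char, c_ushort, c_uint

c_bushort = c_ushort.__ctype_be__
c_buint = c_uint.__ctype_be__

def parse(raw):
    """Return (data, buf1, buf2) parsed from raw bytes (hypothetical helper)."""
    buf = bytearray(raw)  # from_buffer needs a writable buffer
    offset = 0

    # 4-byte big-endian header value.
    data = c_buint.from_buffer(buf, offset).value
    offset += 4

    chunks = []
    for _ in range(2):
        # 2-byte big-endian length prefix, then that many payload bytes.
        length = c_bushort.from_buffer(buf, offset).value
        offset += 2
        chunks.append((c_char * length).from_buffer(buf, offset).raw)
        offset += length

    return data, chunks[0], chunks[1]

result = parse(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')
print(result)
```

With truncated input, the from_buffer call for the short field raises ValueError instead of reading past the end of the data.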

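One more small note on why big-endian byte order is enforced on the integer fields:

Not all systems use the same default endianness when reading data.

My system in particular reads data as little-endian, so b'\x00\x04\x18\x80' comes back as 2149057536 when processed with ctypes.c_uint, rather than the expected 268416.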
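
The two values can be reproduced on any machine by naming the byte order explicitly. This is a minimal sketch, assuming a 4-byte c_uint; the variable names are illustrative only.

```python
import ctypes

raw = b'\x00\x04\x18\x80'

# Pure-Python interpretation of the same four bytes in both byte orders.
big = int.from_bytes(raw, 'big')
little = int.from_bytes(raw, 'little')

# The same thing through ctypes, with the byte order fixed by the type
# rather than by the host machine.
be = ctypes.c_uint.__ctype_be__.from_buffer_copy(raw).value
le = ctypes.c_uint.__ctype_le__.from_buffer_copy(raw).value

print(big, little)  # 268416 2149057536
print(be, le)       # 268416 2149057536
```

Plain ctypes.c_uint matches whichever of the two is native to the host, which is why the explicit __ctype_be__ types are used for file data.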

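Your structure contains variable-sized data. How would you create this structure in C? Normally only the last element of a structure can be an array, and C lets you index past the end of the structure, but in this case you have two variable-length members.

Although it can be done in ctypes, I would first suggest simply unpacking the data as you go with the struct module. If you are reading the data from a file, all you really care about is getting the data and the buffers. It does not need to be in ctypes format, and the lengths are not needed beyond their use in reading the buffers: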

import struct
import io

# create a file-like byte stream
filedata = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

data,len1 = struct.unpack('>LH',filedata.read(6))
data1 = filedata.read(len1)
len2, = struct.unpack('>H',filedata.read(2))
data2 = filedata.read(len2)
print(hex(data),data1,data2)

Output:

0x41880 b'test' b'testing'
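
If the same pattern is needed in several places, it can be wrapped in small helpers. The sketch below is one possible shape, not part of the original answer: read_exact, read_buffer16 and parse_record are hypothetical names, and the EOFError on short reads is an added assumption about how truncated files should be handled.

```python
import io
import struct

def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError (hypothetical helper)."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise EOFError(f'wanted {n} bytes, got {len(chunk)}')
    return chunk

def read_buffer16(stream):
    """Read a 2-byte big-endian length, then that many payload bytes."""
    (length,) = struct.unpack('>H', read_exact(stream, 2))
    return read_exact(stream, length)

def parse_record(stream):
    """Read the 4-byte big-endian value followed by two length-prefixed buffers."""
    (value,) = struct.unpack('>L', read_exact(stream, 4))
    return value, read_buffer16(stream), read_buffer16(stream)

record = parse_record(io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing'))
print(record)
```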

Here is one way to do it in ctypes, creating a custom class definition for each structure, but does the data really need to be in ctypes format?

import struct
import ctypes
import io

# Read a variable-sized Buffer16 object from the file.
# Once the length is read, declare a custom class with data of that length.
def read_Buffer16(filedata):
    length, = struct.unpack('>H',filedata.read(2))
    class Buffer16(ctypes.BigEndianStructure):
        _fields_ = (('length', ctypes.c_ushort),
                    ('data', ctypes.c_char * length))
        def __repr__(self):
            return f'Buffer16({self.length}, {self.data})'
    return Buffer16(length,filedata.read(length))

# Read a variable-sized Test object from the file.
# Once the buffers are read, declare a custom class of their exact type.
def read_Test(filedata):
    data, = struct.unpack('>L',filedata.read(4))
    b1 = read_Buffer16(filedata)
    b2 = read_Buffer16(filedata)
    class Test(ctypes.BigEndianStructure):
        _fields_ = (('data', ctypes.c_uint),
                    ('buf1', type(b1)),
                    ('buf2', type(b2)))
        def __repr__(self):
            return f'Test({self.data:#x}, {self.buf1}, {self.buf2})'
    return Test(data,b1,b2)

# create a file-like byte stream
filedata = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

t = read_Test(filedata)
print(t)

Output:

Test(0x41880, Buffer16(4, b'test'), Buffer16(7, b'testing'))
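
A possible variation on the per-instance classes, offered only as a sketch: if the dynamically built classes are declared with _pack_ = 1, their layout should match the file byte for byte, so the whole record can be overlaid with from_buffer_copy and converted back with bytes(). The helper names buffer16_type and parse_packed are hypothetical, and the sketch assumes a 2-byte c_ushort and a 4-byte c_uint. Recent Python versions may emit a DeprecationWarning for _pack_ on non-Windows platforms.

```python
import ctypes
import struct

def buffer16_type(length):
    """Build a packed Buffer16 class whose data array holds `length` bytes."""
    class Buffer16(ctypes.BigEndianStructure):
        _pack_ = 1  # no padding, so the layout mirrors the file
        _fields_ = (('length', ctypes.c_ushort),
                    ('data', ctypes.c_char * length))
    return Buffer16

def parse_packed(raw):
    """Overlay a packed, exactly sized Test structure on a copy of raw."""
    # Peek at both length prefixes to size the two buffers.
    (len1,) = struct.unpack_from('>H', raw, 4)
    (len2,) = struct.unpack_from('>H', raw, 4 + 2 + len1)

    class Test(ctypes.BigEndianStructure):
        _pack_ = 1
        _fields_ = (('data', ctypes.c_uint),
                    ('buf1', buffer16_type(len1)),
                    ('buf2', buffer16_type(len2)))
    return Test.from_buffer_copy(raw)

raw = b'\x00\x04\x18\x80\x00\x04test\x00\x07testing'
t = parse_packed(raw)
print(hex(t.data), t.buf1.data, t.buf2.data)
print(bytes(t) == raw)
```

Without _pack_ = 1 the 9-byte second buffer would be padded, and the structure would no longer line up with the 19 bytes in the file.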

Edit (per comments)

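This may be how you would store this file data in a C-like structure. The variable-length buffers are read in and stored in arrays (similar to C malloc), and their lengths and addresses are stored in the structure. The class methods know how to read a particular structure from the file stream and return the appropriate object. Note, however, that just as in C, you can read past the end of the pointer and risk an exception or undefined behavior.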

import struct
import ctypes
import io

class Buffer16(ctypes.Structure):
    _fields_ = (('length', ctypes.c_ushort),
                ('data', ctypes.POINTER(ctypes.c_char)))

    @classmethod
    def read(cls,file):
        length, = struct.unpack('>H',file.read(2))
        data = (ctypes.c_char * length)(*file.read(length))
        return cls(length,data)

    def __repr__(self):
        return f'Buffer16({self.data[:self.length]})'

class Test(ctypes.Structure):
    _fields_ = (('data', ctypes.c_uint),
                ('buf1', Buffer16),
                ('buf2', Buffer16))

    @classmethod
    def read(cls,file):
        data, = struct.unpack('>L',file.read(4))
        b1 = Buffer16.read(file)
        b2 = Buffer16.read(file)
        return cls(data,b1,b2)

    def __repr__(self):
        return f'Test({self.data:#x}, {self.buf1}, {self.buf2})'

# create a file-like byte stream
file = io.BytesIO(b'\x00\x04\x18\x80\x00\x04test\x00\x07testing')

t = Test.read(file)
print(t)
print(t.buf1.length)
print(t.buf1.data[:10]) # Just like in C, you can read beyond the end of the pointer

Output:

Test(0x41880, Buffer16(b'test'), Buffer16(b'testing'))
4
b'test\x00\x00\x00\x00\x00\x00'
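
To keep callers from slicing past the allocation, the pointer can be hidden behind an accessor that always uses the stored length. This is a sketch under that assumption; the payload property is a hypothetical addition, and from_buffer_copy is used here in place of unpacking the bytes into the array constructor.

```python
import ctypes
import io
import struct

class Buffer16(ctypes.Structure):
    _fields_ = (('length', ctypes.c_ushort),
                ('data', ctypes.POINTER(ctypes.c_char)))

    @classmethod
    def read(cls, file):
        (length,) = struct.unpack('>H', file.read(2))
        # Copy the payload into a ctypes array of exactly `length` bytes;
        # assigning the array to the pointer field keeps it referenced.
        storage = (ctypes.c_char * length).from_buffer_copy(file.read(length))
        return cls(length, storage)

    @property
    def payload(self):
        """Bytes of the buffer, bounded by the stored length."""
        return self.data[:self.length]

file = io.BytesIO(b'\x00\x04test\x00\x07testing')
b1 = Buffer16.read(file)
b2 = Buffer16.read(file)
print(b1.payload, b2.payload)
```

Reading through payload never goes beyond the bytes that were allocated, whereas a raw slice such as data[:10] depends on whatever happens to follow in memory.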
