Python calling Rust FFI via ctypes crashes at exit with "pointer being freed was not allocated"

Posted 2024-04-27 09:48:57

I'm trying to free memory that was allocated for a CString and passed to Python using ctypes. However, Python crashes with a malloc error:

python(30068,0x7fff73f79000) malloc: *** error for object 0x103be2490: pointer being freed was not allocated 

Here is the Rust function I use to pass the pointer to ctypes:

[Rust code block missing from the source]

And this is the one I use to free the pointer when Python hands it back:

pub extern "C" fn drop_cstring(p: *mut c_char) {
    unsafe { CString::from_raw(p) };
}

The Python function I use to convert the pointer to a str:

def char_array_to_string(res, _func, _args):
    """ restype is c_void_p to prevent automatic conversion to str
    which loses pointer access

    """
    converted = cast(res, c_char_p)
    result = converted.value
    drop_cstring(converted)
    return result

And the Python class I use to build the Array struct that is passed to Rust:

class _FFIArray(Structure):
    """
    Convert sequence of float lists to a C-compatible void array
    example: [[1.0, 2.0], [3.0, 4.0]]

    """
    _fields_ = [("data", c_void_p),
                ("len", c_size_t)]

    @classmethod
    def from_param(cls, seq):
        """  Allow implicit conversions """
        return seq if isinstance(seq, cls) else cls(seq)

    def __init__(self, seq, data_type = c_double):
        arr = ((c_double * 2) * len(seq))()
        for i, member in enumerate(seq):
            arr[i][0] = member[0]
            arr[i][1] = member[1]
        self.data = cast(arr, c_void_p)
        self.len = len(seq)

The argtypes and restype definitions:

encode_coordinates = lib.encode_coordinates_ffi
encode_coordinates.argtypes = (_FFIArray,)
encode_coordinates.restype = c_void_p
encode_coordinates.errcheck = char_array_to_string

drop_cstring = lib.drop_cstring
drop_cstring.argtypes = (c_char_p,)
drop_cstring.restype = None

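I'm inclined to think the Rust functions are not at fault, because a crash inside the dylib would cause a segfault (and the FFI tests pass on the Rust side). I can also carry on with other operations in Python after calling the FFI functions; the malloc error only occurs when the process exits.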


2 Answers

Thanks to the effort put into J.J. Hakala's answer, I was able to produce an MCVE in pure Rust:

extern crate libc;

use std::ffi::CString;
use libc::c_void;

fn encode_coordinates(coordinates: &Vec<[f64; 2]>) -> String {
    format!("Encoded coordinates {:?}", coordinates)
}

struct Array {
    data: *const c_void,
    len: libc::size_t,
}

impl From<Array> for Vec<[f64; 2]> {
    fn from(arr: Array) -> Self {
        unsafe { Vec::from_raw_parts(arr.data as *mut [f64; 2], arr.len, arr.len) }
    }
}

impl From<Array> for String {
    fn from(incoming: Array) -> String {
        encode_coordinates(&incoming.into())
    }
}

fn encode_coordinates_ffi(coords: Array) -> CString {
    CString::new(String::from(coords)).unwrap()
}

fn main() {
    for _ in 0..10 {
        let i_own_this = vec![[1.0, 2.0], [3.0, 4.0]];

        let array = Array {
            data: i_own_this.as_ptr() as *const _,
            len: i_own_this.len(),
        };

        println!("{:?}", encode_coordinates_ffi(array))
    }
}

This prints:

[program output missing from the source]

The main problem is here:

impl From<Array> for Vec<[f64; 2]> {
    fn from(arr: Array) -> Self {
        unsafe { Vec::from_raw_parts(arr.data as *mut [f64; 2], arr.len, arr.len) }
    }
}

Let's look at the documentation for Vec::from_raw_parts:

This is highly unsafe, due to the number of invariants that aren't checked:

  • ptr needs to have been previously allocated via String/Vec<T> (at least, it's highly likely to be incorrect if it wasn't).
  • length needs to be the length that less than or equal to capacity.
  • capacity needs to be the capacity that the pointer was allocated with.

Violating these may cause problems like corrupting the allocator's internal datastructures.

However, the original code as shown violates the first point: the pointer was allocated by malloc.

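Why does this matter? When you call Vec::from_raw_parts, it takes ownership of the pointer. When the Vec goes out of scope, the memory it points to is deallocated. This means you end up attempting to deallocate that pointer more than once.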

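Because the safety of the function is determined by what is passed in, the entire function should be marked unsafe. In this case, that would violate the trait's interface, so you would need to move it elsewhere.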

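More sensibly, you could convert the Array to a slice. This is still unsafe because it relies on the pointer that was passed in, but it does not take ownership of the underlying memory. You can then turn the slice into a Vec, which allocates new memory and copies the contents over.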
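The slice-based conversion described above might look roughly like the following. This is a minimal sketch rather than the original author's code: the `#[repr(C)]` attribute, the `to_vec` method name, and the use of `usize` for `len` are assumptions made for illustration. It is written as an inherent `unsafe fn` instead of a `From` impl, in line with the point that the conversion cannot be made safe.

```rust
use std::os::raw::c_void;
use std::slice;

/// Mirrors the `Array` struct from the MCVE. `#[repr(C)]` is an assumption,
/// added because the struct is meant to cross the FFI boundary.
#[repr(C)]
pub struct Array {
    pub data: *const c_void,
    pub len: usize,
}

impl Array {
    /// Copies the caller's coordinates into a freshly allocated `Vec`.
    ///
    /// # Safety
    /// `data` must be non-null, properly aligned, and point to `len` valid
    /// `[f64; 2]` values that stay alive for the duration of this call.
    pub unsafe fn to_vec(&self) -> Vec<[f64; 2]> {
        // Borrow the foreign memory as a slice; no ownership is taken.
        let borrowed: &[[f64; 2]] =
            unsafe { slice::from_raw_parts(self.data as *const [f64; 2], self.len) };
        // `to_vec` allocates with Rust's allocator and copies the elements,
        // so dropping the result never touches the caller's buffer.
        borrowed.to_vec()
    }
}
```

Because the slice only borrows the caller's buffer, dropping the returned Vec frees only Rust's own copy; the memory allocated by Python or C stays with whoever allocated it.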

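Since you control encode_coordinates, you should also change its signature. &Vec<T> is useless in 99.99% of cases and may actually be less efficient: it requires two pointer dereferences instead of one. Accept a &[T] instead. This allows a wider range of types to be passed, including arrays and Vecs.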
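As a small illustration of the signature change, here is the MCVE's `encode_coordinates` taking a slice. The body is unchanged from the MCVE; the `demo` helper is hypothetical and exists only to show that both a Vec and a fixed-size array can be passed.

```rust
/// Same body as in the MCVE, but borrowing a slice instead of `&Vec`.
fn encode_coordinates(coordinates: &[[f64; 2]]) -> String {
    format!("Encoded coordinates {:?}", coordinates)
}

/// Hypothetical helper: calls the function with a Vec and with an array.
fn demo() -> (String, String) {
    let from_vec = vec![[1.0, 2.0], [3.0, 4.0]];
    let from_array = [[5.0, 6.0]];
    // `&Vec<T>` and `&[T; N]` both coerce to `&[T]` at the call site.
    (encode_coordinates(&from_vec), encode_coordinates(&from_array))
}
```

Existing callers that hold a Vec only need to keep passing `&v`; the coercion to a slice happens automatically.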

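I think the Rust side of the code assumes ownership of the data and tries to deallocate it when the process exits, so the Python code is not to blame. As evidence, the following C code, which calls encode_coordinates_ffi and drop_cstring, also causes a segmentation fault.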

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    double longitude;
    double latitude;
} coord_t;

typedef struct {
    coord_t * data;
    size_t count;
} points_t;

char * encode_coordinates_ffi(points_t points);
void drop_cstring(void * str);

int main(void)
{
   points_t data;
   coord_t * points;
   char * res;

   data.data = malloc(sizeof(coord_t) * 2);
   data.count = 2;
   points = (coord_t *)data.data;

   points[0].latitude = 1.0;
   points[0].longitude = 2.0;
   points[1].latitude = 3.0;
   points[1].longitude = 4.0;

   res = encode_coordinates_ffi(data);
   printf("%s\n", res);
   free(data.data);
   drop_cstring(res);

   return 0;
}

valgrind -v gives the following information:

[valgrind output missing from the source]

If the free(data.data) call is left out, the program finishes without a segmentation fault, and valgrind does not find any memory leaks.

I would try to implement the interface so that it corresponds to:

typedef struct {
    double longitude;
    double latitude;
} coord_t;

int coordinates_ffi(char * dst, size_t n, coord_t * points, size_t npoints);

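where dst would receive the encoded string (with a length limit n, approximated from the number of coordinates npoints), so the caller would not need to deallocate a Rust string.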
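A Rust-side sketch of such a caller-provided-buffer interface is shown below. Several details are assumptions made for illustration: the `Coord` struct name, the return convention (number of bytes written excluding the NUL, or -1 on failure), and the placeholder encoding produced by `format!`, which stands in for the real encoding logic. To export the symbol from a library, the function would also need a `no_mangle` attribute, omitted here.

```rust
use std::os::raw::{c_char, c_int};
use std::{ptr, slice};

/// Matches the layout of `coord_t` in the C declaration.
#[repr(C)]
pub struct Coord {
    pub longitude: f64,
    pub latitude: f64,
}

/// Writes a NUL-terminated string into `dst`, which has room for `n` bytes.
/// Returns the number of bytes written (excluding the NUL) or -1 on error.
/// The return convention is an assumption, not part of the original post.
///
/// # Safety
/// `dst` must be valid for writes of `n` bytes, and `points` must point to
/// `npoints` valid `Coord` values.
pub unsafe extern "C" fn coordinates_ffi(
    dst: *mut c_char,
    n: usize,
    points: *const Coord,
    npoints: usize,
) -> c_int {
    if dst.is_null() || points.is_null() {
        return -1;
    }
    // Borrow the caller's points; Rust never takes ownership of them.
    let coords = unsafe { slice::from_raw_parts(points, npoints) };
    // Placeholder encoding (assumption): "lon,lat" pairs joined by ';'.
    let encoded = coords
        .iter()
        .map(|c| format!("{:?},{:?}", c.longitude, c.latitude))
        .collect::<Vec<String>>()
        .join(";");
    let bytes = encoded.as_bytes();
    // Need room for the payload plus the terminating NUL.
    if bytes.len() >= n {
        return -1;
    }
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr(), dst as *mut u8, bytes.len());
        *dst.add(bytes.len()) = 0;
    }
    bytes.len() as c_int
}
```

With this shape, every allocation is freed by the side that made it: the caller owns both the points and the output buffer, and the temporary String is dropped inside Rust before the function returns.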
