ctypes and memset crash in Python

6 votes

1 answer

3988 views

Asked 2025-04-17 20:01

I am trying to erase a password string from memory, as suggested here.

I wrote this small piece of code:

import ctypes, sys

def zerome(string):
    location = id(string) + 20
    size     = sys.getsizeof(string) - 20
    #memset =  ctypes.cdll.msvcrt.memset
    # For Linux, use the following. Change the 6 to whatever it is on your computer.
    print ctypes.string_at(location, size)
    memset =  ctypes.CDLL("libc.so.6").memset
    memset(location, 0, size)
    print "Clearing 0x%08x size %i bytes" % (location, size)
    print ctypes.string_at(location, size)

a = "asdasd"

zerome(a)

Strangely, this code works fine in IPython,

[7] oz123@yenitiny:~ $ ipython a.py 
Clearing 0x02275b84 size 23 bytes

but crashes in plain Python:

[8] oz123@yenitiny:~ $ python a.py 
Segmentation fault
[9] oz123@yenitiny:~ $

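Does anyone know why?

I am testing on Debian Wheezy with Python 2.7.3.

Small update...

This code works on CentOS 6.2 with Python 2.6.6, but crashes on Debian with Python 2.6.8. I tried to work out why it works on CentOS but not on Debian. The only obvious difference is that my Debian install is multiarch, while CentOS runs on my old laptop, which has an i686 CPU.

So I rebooted my CentOS laptop and loaded Debian Wheezy on it. The code works on a Debian Wheezy install that is not multiarch. Hence, I suspect something is off in my Debian configuration...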

centos debian ipython ctypes string handling memory management segmentation fault multi-architecture

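1 Answer

ctypes already has a memset function, so you don't need to create a function pointer for the libc/msvcrt function. Also, 20 bytes is for common 32-bit platforms. On 64-bit systems it is probably 36 bytes. Here is the layout of a PyStringObject: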

typedef struct {
    Py_ssize_t ob_refcnt;         // 4|8 bytes
    struct _typeobject *ob_type;  // 4|8 bytes
    Py_ssize_t ob_size;           // 4|8 bytes
    long ob_shash;                // 4|8 bytes (4 on 64-bit Windows)
    int ob_sstate;                // 4 bytes
    char ob_sval[1];
} PyStringObject;

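So it could be 5*4 = 20 bytes on a 32-bit system, 8*4 + 4 = 36 bytes on 64-bit Linux, or 8*3 + 4*2 = 32 bytes on 64-bit Windows. Since a string isn't tracked with a garbage collection header, you can use sys.getsizeof. In general, if you don't want the GC header size included (in memory it sits before the object's base address that you get from id), use the object's __sizeof__ method. At least that's a general rule in my experience.

What you want is simply to subtract the buffer size from the object size. A string in CPython is null-terminated, so add 1 to its length to get the buffer size. For example: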
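The arithmetic above can be sketched as a small helper. The function name `py2_str_header_size` and its parameters are hypothetical, introduced only for illustration; it assumes the field order of the struct shown above and no padding between fields.

```python
import ctypes

def py2_str_header_size(word_size, long_size, int_size=4):
    """Bytes that precede ob_sval in a CPython 2 PyStringObject.

    word_size: size of Py_ssize_t and of a pointer (4 or 8)
    long_size: size of C long (ob_shash)
    int_size:  size of C int (ob_sstate)
    """
    # ob_refcnt, ob_type and ob_size each take one machine word
    return 3 * word_size + long_size + int_size

print(py2_str_header_size(4, 4))  # 32-bit platforms
print(py2_str_header_size(8, 8))  # 64-bit Linux
print(py2_str_header_size(8, 4))  # 64-bit Windows

# The same computation for whatever platform this interpreter runs on
native = py2_str_header_size(
    ctypes.sizeof(ctypes.c_ssize_t),
    ctypes.sizeof(ctypes.c_long),
    ctypes.sizeof(ctypes.c_int),
)
print(native)
```

A hard-coded 20 only matches the first case, which would be consistent with the crash appearing on a 64-bit interpreter.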

>>> import ctypes, sys
>>> a = 'abcdef'
>>> bufsize = len(a) + 1
>>> offset = sys.getsizeof(a) - bufsize
>>> ctypes.memset(id(a) + offset, 0, bufsize)
3074822964L
>>> a
'\x00\x00\x00\x00\x00\x00'
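Before overwriting anything, the computed offset can be sanity-checked by only reading the memory back. This is a minimal sketch that assumes CPython (where id() returns the object's address) and a byte string, which is str on Python 2 and bytes on Python 3; the helper name `payload_offset` is made up for this example.

```python
import ctypes
import sys

def payload_offset(data):
    """Offset of the character buffer inside a CPython byte string.

    Assumes CPython: the object has no GC header, and the buffer is
    followed by exactly one NUL terminator.
    """
    bufsize = len(data) + 1  # characters plus the trailing NUL
    return sys.getsizeof(data) - bufsize

data = b"abcdef"  # str on Python 2, bytes on Python 3
offset = payload_offset(data)

# Read-only check: the bytes found at id + offset should equal the string.
seen = ctypes.string_at(id(data) + offset, len(data))
print(offset)
print(seen == data)
```

If the comparison is false, the offset is wrong for this interpreter, and calling memset with it would corrupt the object header rather than the character data.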

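Edit

A better alternative is to define the PyStringObject structure. This makes it convenient to check ob_sstate. If it is greater than 0, the string is interned, and the sane thing to do is raise an exception. Single-character strings are interned, along with string constants in code objects that consist only of ASCII letters and underscores, and also strings used internally by the interpreter for names (variable names, attributes).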

from ctypes import *

class PyStringObject(Structure):
    _fields_ = [
      ('ob_refcnt', c_ssize_t),
      ('ob_type', py_object),
      ('ob_size', c_ssize_t),
      ('ob_shash', c_long),
      ('ob_sstate', c_int),
      # ob_sval varies in size
      # zero with memset is simpler
    ]

def zerostr(s):
    """zero a non-interned string"""
    if not isinstance(s, str):
        raise TypeError(
          "expected str object, not %s" % type(s).__name__)

    s_obj = PyStringObject.from_address(id(s))
    if s_obj.ob_sstate > 0:
        raise RuntimeError("cannot zero interned string")

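    s_obj.ob_shash = -1  # not hashed yet
    # ob_sval starts right after ob_sstate; sizeof(PyStringObject) would
    # include tail padding (40 rather than 36 on 64-bit Linux)
    offset = PyStringObject.ob_sstate.offset + sizeof(c_int)
    memset(id(s) + offset, 0, len(s))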

For example:

>>> s = 'abcd' # interned by code object
>>> zerostr(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 10, in zerostr
RuntimeError: cannot zero interned string

>>> s = raw_input() # not interned
abcd
>>> zerostr(s)
>>> s
'\x00\x00\x00\x00'
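The reason for refusing interned strings is that an interned string is one shared object: every use of the same text refers to it, so zeroing it would affect unrelated code. The following sketch illustrates the sharing without touching any memory; the `intern_` alias is only there so the snippet runs on both Python 2 (builtin intern) and Python 3 (sys.intern).

```python
import sys

try:
    intern_ = intern          # Python 2 builtin
except NameError:
    intern_ = sys.intern      # Python 3

parts = ["pass", "word"]

# Two separately built strings collapse to one object once interned.
a = intern_("".join(parts))
b = intern_("".join(parts))
print(a is b)

# A string built at runtime and never interned is a separate object in
# CPython, which is the kind of string zerostr is meant for.
c = "".join(parts)
print(c == a)
```

Since a and b are the same object, overwriting the buffer of one would silently change the other as well.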

Answered 2025-04-17 by Python大师
