What causes a Python memory leak with large data structures (lists, dicts)?

Posted 2024-05-23 15:56:11

Male | A programmer who enjoys writing Python code.

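The code is very simple. It should not leak at all, because everything is done inside the function and nothing is returned. I have a function that iterates over all the lines in a file (~20 MiB) and puts them all into a list.
The function in question: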

def read_art_file(filename, path_to_dir):
    import codecs
    corpus = []
    corpus_file = codecs.open(path_to_dir + filename, 'r', 'iso-8859-15')
    newline = corpus_file.readline().strip()
    while newline != '':
        # we put into @article a @newline of file and some other info
        # (i left those lists blank for readability)
        article = [newline, [], [], [], [], [], [], [], [], [], [], [], []]
        corpus.append(article)
        del newline
        del article
        newline = corpus_file.readline().strip()
    memory_usage('inside function')
    for article in corpus:
        for word in article:
            del word
        del article
    del corpus
    corpus_file.close()
    memory_usage('inside: after corp deleted')
    return
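
A side note on the `del` statements in the function above: `del name` only removes a name binding; it does not remove the object from a list that still holds it. The short sketch below is not from the original post, and `rows` is an illustrative name. It suggests that the cleanup loops leave the list untouched, so it is `del corpus` (or simply returning from the function) that drops the last reference to the data.

```python
import sys

# Illustrative stand-in for the corpus: three small "articles".
rows = [["line %d" % i, [], []] for i in range(3)]
before = sys.getrefcount(rows[0])

for row in rows:
    for field in row:
        del field  # unbinds the local name only; the list still holds the object
    del row        # same here: rows still references every article

after = sys.getrefcount(rows[0])

# The list is unchanged and the reference count of its first item is the same.
print(len(rows), before == after)
```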

Here is the main code:

import time

memory_usage('START')
path_to_dir = '/home/soshial/internship/training_data/parser_output/'
read_art_file('accounting.n.txt.wpr.art', path_to_dir)
memory_usage('outside func')
time.sleep(5)
memory_usage('END')

All memory_usage does is print the number of KiB allocated to the script.

Running the script

If I run the script, it gives me:

START memory: 6088 KiB
inside memory: 393752 KiB (20 MiB file + lists occupy 400 MiB)
inside: after corp deleted memory: 43360 KiB
outside func memory: 34300 KiB (34300-6088= 28 MiB leaked)
FINISH memory: 34300 KiB
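
To get a rough sense of why a 20 MiB file can grow to roughly 400 MiB, the per-article overhead can be estimated with `sys.getsizeof`. This is only a sketch under assumptions: the 80-character line is hypothetical, and the exact byte counts depend on the Python version and platform.

```python
import sys

line = "x" * 80  # hypothetical line length, not taken from the real file
article = [line, [], [], [], [], [], [], [], [], [], [], [], []]

outer = sys.getsizeof(article)                       # the 13-slot outer list
inner = sum(sys.getsizeof(x) for x in article[1:])   # the 12 empty lists
text = sys.getsizeof(line)                           # the string object itself

per_article = outer + inner + text
print("approx. bytes per article:", per_article)
print("overhead factor vs. raw text:", round(per_article / len(line), 1))
```

Every line of the file carries twelve separate list objects plus an outer list, so the container overhead can easily exceed the size of the text itself.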

Running without the list

If I do the same thing, but comment out the line that appends article to corpus:

article = [newline, [], [], [], [], [], ...]  # we still assign data to `article`
# corpus.append(article)  # this line is commented out in the second run

the output is:

START memory: 6076 KiB
inside memory: 6076 KiB
inside: after corp deleted memory: 6076 KiB
outside func memory: 6076 KiB
FINISH memory: 6076 KiB

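Question:

So in this case all the memory is freed. I need all the memory to be freed, because I am going to process hundreds of such files.
Am I doing something wrong, or is this a bug in the CPython interpreter?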
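
If the articles can be handled one at a time, one way to keep peak memory low across hundreds of files is to stream them instead of building the whole corpus list. The sketch below is an assumption about the workload, not the original design: `iter_articles` is a hypothetical helper, the demo file is a temporary stand-in, and the built-in `open` with an `encoding` argument (Python 3) is swapped in for `codecs.open`.

```python
import os
import tempfile


def iter_articles(path, encoding="iso-8859-15"):
    """Yield one article at a time instead of collecting them in a list."""
    with open(path, "r", encoding=encoding) as corpus_file:
        for raw in corpus_file:
            line = raw.strip()
            if line == "":
                break  # like the original loop, stop at the first empty line
            yield [line, [], [], [], [], [], [], [], [], [], [], [], []]


# Small demo on a temporary file standing in for a real .art file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".art", delete=False, encoding="iso-8859-15"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

count = sum(1 for _ in iter_articles(demo_path))
os.remove(demo_path)
print(count)
```

Each yielded article becomes garbage as soon as the consumer moves on to the next one, so only one article needs to be alive at any moment.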

Update. This is how I check memory consumption (taken from another Stack Overflow question):

def memory_usage(text = ''):
    """Memory usage of the current process in kilobytes."""
    status = None
    result = {'peak': 0, 'rss': 0}
    try:
        # This will only work on systems with a /proc file system
        # (like Linux).
        status = open('/proc/self/status')
        for line in status:
            parts = line.split()
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    finally:
        if status is not None:
            status.close()
    print('>', text, 'memory:', result['rss'], 'KiB  ')
    return
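
Note that the function above reports RSS, that is, the memory the operating system has handed to the process. CPython's allocator and the C library may keep freed memory around for reuse rather than return it to the OS, so RSS can stay above its starting value even when no Python objects are leaked. As a hedged cross-check, the standard `tracemalloc` module reports what Python itself currently has allocated; the list below is a small illustrative stand-in for the real corpus.

```python
import tracemalloc

tracemalloc.start()

# Illustrative stand-in for the corpus (much smaller than the real file).
corpus = [["line %d" % i, [], [], []] for i in range(100000)]
held, _ = tracemalloc.get_traced_memory()

del corpus
released, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("traced while the list is alive: %d KiB" % (held // 1024))
print("traced after del:               %d KiB" % (released // 1024))
```

If the traced figure drops back close to zero while RSS stays high, the remaining memory is more likely retained by the allocator than held by live Python objects.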


