Accessing similar lines from a file and applying a function

2 answers

User

#1 · edited 2024-05-14 16:20:54

The following should do what you need:

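from collections import Counter

# Running total per key; only the distinct keys are held in memory.
output = Counter()

with open("input.txt") as file:
    # Iterate over the file lazily instead of calling file.read(),
    # so the whole file is never loaded into memory at once.
    for line in file:
        line = line.strip()
        if line:
            key, value = line.split()
            output[key] += int(value)

with open("output.txt", "w") as file:
    for key, value in output.items():
        file.write("{key} {value}\n".format(key=key, value=value))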

User

#2 · edited 2024-05-14 16:20:54

It is a big text file >20GB. So I cannot store the whole thing into memory at once.

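The size of the file does not matter. What matters is how many unique records it contains, because you only keep the unique records.
A Python Counter will still hold all of those in memory. If you are running in a constrained environment, that will not help you.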
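
To make the point above concrete, here is a minimal sketch showing that the number of entries in a Counter tracks the number of distinct keys, not the number of lines read. The helper name `count_distinct_keys` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_distinct_keys(lines):
    """Sum values per key and report how many entries the Counter holds."""
    totals = Counter()
    for line in lines:
        key, value = line.split()
        totals[key] += int(value)
    return totals, len(totals)


# Hypothetical data: 300,000 lines but only three distinct keys.
lines = (f"{key} 1" for _ in range(100_000) for key in ("a", "b", "c"))
totals, size = count_distinct_keys(lines)
```

If nearly every line had its own key, the Counter would grow with the file, which is the case the sort-based approach is meant to handle.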

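My suggestion:

Sort the file alphabetically. I would just run it through unix sort (I assume you have the space on your filesystem).
Iterate over the lines. Extract the first part (the key) of the current record. As long as the key stays the same, keep adding up the second part (the value).
When the key changes, write one line to the output file with the sum you have been holding in memory.
Repeat.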
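
The steps above can be sketched roughly as follows. This is only one possible reading of the suggestion, not the answerer's own code: it uses `itertools.groupby` to detect key changes, and the function names `sum_sorted` and `aggregate_file` and the file name `sorted.txt` are assumptions made for the example.

```python
from itertools import groupby


def sum_sorted(lines):
    """Yield (key, total) pairs from lines that are already sorted by key."""
    # Split each non-blank line into [key, value].
    records = (line.split() for line in lines if line.strip())
    # groupby only merges adjacent equal keys, which is why the input
    # has to be sorted first.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        yield key, sum(int(value) for _, value in group)


def aggregate_file(src="sorted.txt", dst="output.txt"):
    # Assumes src was produced beforehand, e.g. with:
    #   sort input.txt -o sorted.txt
    with open(src) as fin, open(dst, "w") as fout:
        for key, total in sum_sorted(fin):
            fout.write(f"{key} {total}\n")
```

Only one key and its running sum are held in memory at any time, so memory use stays flat no matter how many unique records the file has.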
