Get the top 5 elements from a text file storing billions of elements, without storing them in a variable - Q&A - Python中文网


Posted 2024-04-26 20:18:22


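I need to find the top 5 numbers in a text file that stores billions of numbers. The numbers are either comma-separated or newline-separated. Because of memory constraints, I cannot hold the whole content of the file in a variable.
I have used a generator with a batch size of 5, so each call to next(result_generator) gives me 5 elements from the text file.
On the first call to next(result_generator), I get 5 elements and sort them. I treat these as the top 5.
On the next call to next(result_generator), I get another 5. I combine them with the previous 5, sort the combined list, and pick the top 5 out of those 10.
Similarly, I fetch the next 5 and combine them with the previous top 5 to get the new top 5, until next(result_generator) returns None.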

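The problem I am facing is that the generator does not work properly: it does not handle the next 5 elements. It raises an exception on the second call to next(result_generator). I tried the same thing with a database and it worked fine there, so I suspect something is wrong with the file handling. I used the random function to generate numbers and wrote them to a text file as sample input.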

Code to generate random numbers in the text file:

import random

count = 500
# Write `count` random integers, one per line, as sample input
with open('billion.txt', 'w') as f:
    for _ in range(count):
        f.write(str(random.randint(1, 1000)) + "\n")

Code to find the top 5 elements in the text file:

result = []
full_list = []
final_list = []
def result_generator(batchsize=5):
    while True:
        global result
        global full_list
        global final_list
        result = sorted([int(next(myfile).rstrip()) for x in range(batchsize)], reverse=True)
        final_list = sorted(full_list + result, reverse=True)[:5]
        full_list = result.copy()
        # print("result list is : {}".format(final_list))
        if not final_list:
            break
        else:
            yield final_list


with open("billion.txt") as myfile:
    result = result_generator()
    print("datatype is :", type(result))
    print("result is ",next(result))
    for i in range (0,2):
        try:
            for each in next(result):
                print("Row {} is :".format(each))
        except StopIteration:
            print("stop iteration")
        except Exception:
            print("Some different issue")

For example, given the input

131, 205, 65, 55, 222, 278, 672, 902, 69, 26 ……… up to billions

Expected result: [902, 672, 278, 222, 205]
Actual result: [222, 205, 131, 65, 55]
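
Two details in the code above seem relevant. `full_list = result.copy()` keeps only the latest batch, so the running top 5 is lost from the third batch onward, and the first value yielded is simply the first batch sorted, which matches the reported actual result. Also, when the file runs out, `next(myfile)` raises StopIteration inside the generator, which Python 3.7+ turns into a RuntimeError. Below is a minimal sketch of the same batching idea that avoids both, assuming one integer per line; the names `top_n_batches`, `n`, and `sample` are illustrative and not from the original post.

```python
from itertools import islice


def top_n_batches(lines, n=5, batchsize=5):
    """Yield the running top-n after each batch read from `lines`."""
    lines = iter(lines)  # make sure islice advances instead of restarting
    top = []             # running top-n, carried across batches
    while True:
        # islice simply returns fewer items at end of input; no StopIteration
        chunk = list(islice(lines, batchsize))
        if not chunk:
            return  # input exhausted: end the generator cleanly
        batch = [int(s) for s in chunk if s.strip()]  # skip blank lines
        # merge with the previous top-n (not the previous batch) and trim
        top = sorted(top + batch, reverse=True)[:n]
        yield top


# Small in-memory sample standing in for the file (assumed data)
sample = ["131", "205", "65", "55", "222", "278", "672", "902", "69", "26"]
for top in top_n_batches(sample):
    print(top)
```

With a real file, the same function could be driven by `with open("billion.txt") as myfile: ...` and a `for` loop over `top_n_batches(myfile)`; the last value yielded is the overall top 5.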


1 answer

Anonymous user

Answer #1 · posted 2024-04-26 20:18:22

Why not use heapq?

Given a file such as file.txt with one number per line,

you can simply iterate over the file as usual:

import heapq

data = []
heapq.heapify(data)
N = 5

result = []
# Assuming numbers are each on a new line
with open('file.txt', 'r') as f:
    for line in f:
        heapq.heappush(data, int(line.strip()))
        if len(data) > N:
            heapq.heappop(data)
while data:
    result.append(heapq.heappop(data))

result.reverse()
print(result)

[902, 672, 278, 222, 205]

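This uses O(N) memory and O(M log N) time, where M is the number of values in the file (the billions in your question) and N is how many top values you want.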
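
As a shorter variant, `heapq.nlargest` accepts any iterable, so it can be fed a lazy generator instead of a list. The sketch below also handles the case from the question where numbers may be comma-separated or newline-separated. The helper names `iter_numbers` and `top_n` and the sample data are assumptions for illustration. Note that it still reads one line at a time, so a file consisting of a single enormous comma-separated line would need chunked reading instead.

```python
import heapq
import re


def iter_numbers(lines):
    """Lazily yield ints from lines holding one or several numbers."""
    for line in lines:
        # split on commas and/or whitespace; skip empty pieces
        for token in re.split(r"[,\s]+", line):
            if token:
                yield int(token)


def top_n(lines, n=5):
    """Return the n largest numbers, largest first, without building a full list."""
    return heapq.nlargest(n, iter_numbers(lines))


# In-memory stand-in for a file object (assumed data, mixed separators)
sample = ["131,205,65\n", "55\n", "222\n", "278,672\n", "902\n", "69,26\n"]
print(top_n(sample))  # [902, 672, 278, 222, 205]
```

For a real file, `with open('file.txt') as f: print(top_n(f))` should behave the same way, since a file object iterates line by line.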
