Using concurrent futures without running out of memory

Posted 2024-04-25 00:39:06


I'm doing some file parsing, which is a CPU-bound task. No matter how many files I throw at the process, it uses no more than about 50 MB of RAM. The task is parallelisable, and I've set it up to use concurrent futures below to parse each file in a separate process:

    from concurrent import futures
    with futures.ProcessPoolExecutor(max_workers=6) as executor:
        # A dictionary with the future as the key and the filename as the value
        jobs = {}

        # Loop through the files and run the parse function for each file, passing the file name to it.
        # The results can come back in any order.
        for this_file in files_list:
            job = executor.submit(parse_function, this_file, **parser_variables)
            jobs[job] = this_file

        # Get the completed jobs whenever they are done
        for job in futures.as_completed(jobs):

            # Fetch the result (job.result()) and the filename the job was based on (jobs[job])
            results_list = job.result()
            this_file = jobs[job]

            # Delete the entry from the dict, as we don't need to keep it
            del jobs[job]

            # post-processing (putting the results into a database)
            post_process(this_file, results_list)

The problem is, when I run this using futures, RAM usage rockets and before long I've run out and Python crashes. This is probably in large part because the results from parse_function are several MB in size. Once the results have been through post_process, the application has no further need of them. As you can see, I'm trying del jobs[job] to clear items out of jobs, but this has made no difference; memory usage remains unchanged and seems to increase at the same rate.
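To make this concrete, here is a minimal self-contained sketch of the same pattern (make_big_result is a hypothetical stand-in for parse_function's multi-MB output, not part of my real code):

    from concurrent import futures

    def make_big_result():
        # Hypothetical stand-in for parse_function's multi-MB output
        return bytearray(10 * 1024 * 1024)

    if __name__ == "__main__":
        with futures.ProcessPoolExecutor(max_workers=2) as executor:
            jobs = {executor.submit(make_big_result): "some-file"}
            for job in futures.as_completed(jobs):
                results = job.result()
                del jobs[job]  # drops the dict's reference to the Future...
                # ...but `job` still references the Future, and the Future still
                # holds its result internally, so the ~10 MB stays reachable here.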

I've also confirmed that this isn't because it's sitting on a single process waiting on the post_process function, by throwing in a time.sleep(1).

There's nothing in the futures docs about memory management, and while a brief search suggests it has come up before in real-world uses of futures (Clear memory in python loop, http://grokbase.com/t/python/python-list/1458ss5etz/real-world-use-of-concurrent-futures), the answers don't translate to my use case (they're all concerned with timeouts and the like).

So, how do you use concurrent futures without running out of memory? (Python 3.5)


2 Answers

You can try adding del to your code like this:

for job in futures.as_completed(jobs):
    del jobs[job]
    del job  # or job._result = None
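For context, a minimal sketch of how those deletions slot into the original loop (reusing the question's names parse_function, parser_variables, files_list and post_process), so that neither the dict nor the loop variable keeps the completed Future, and its multi-MB result, alive:

    from concurrent import futures

    with futures.ProcessPoolExecutor(max_workers=6) as executor:
        jobs = {executor.submit(parse_function, f, **parser_variables): f
                for f in files_list}

        for job in futures.as_completed(jobs):
            this_file = jobs[job]
            results_list = job.result()
            del jobs[job]       # drop the dict's reference to the Future
            del job             # drop the loop variable's reference as well
            post_process(this_file, results_list)
            del results_list    # finished with the parsed data; let it be collected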

I'll take a shot at it (might be guessing wrong...)

You might need to submit your work bit by bit, since on every submit you're making a copy of parser_variables, which may end up chewing through your RAM.

Here is working code, with #< marking the interesting parts:

MAX_JOBS_IN_QUEUE = 50  # assumed value, not in the original answer; tune to available RAM
with futures.ProcessPoolExecutor(max_workers=6) as executor:
    # A dictionary with the future as the key and the filename as the value
    jobs = {}

    # Loop through the files, and run the parse function for each file, sending the file-name to it.
    # The results can come back in any order.
    files_left = len(files_list)  #<
    files_iter = iter(files_list)  #<

    while files_left:
        for this_file in files_iter:
            job = executor.submit(parse_function, this_file, **parser_variables)
            jobs[job] = this_file
            if len(jobs) > MAX_JOBS_IN_QUEUE:
                break  #< limit the number of jobs submitted for now

        # Get the completed jobs whenever they are done
        for job in futures.as_completed(jobs):

            files_left -= 1  #< one down - many to go...

            # Fetch the result (job.result()) and the filename the job was based on (jobs[job])
            results_list = job.result()
            this_file = jobs[job]

            # Delete the entry from the dict, as we don't need to keep it
            del jobs[job]

            # post-processing (putting the results into a database)
            post_process(this_file, results_list)
            break  #< give the submit loop a chance to add more jobs
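As a variant, the same throttling idea can be written without re-entering as_completed on every pass: use futures.wait to block until something finishes, then refill the queue by the same amount. This is only a sketch under the same assumptions (the question's parse_function, post_process, files_list and parser_variables, plus the MAX_JOBS_IN_QUEUE cap defined above):

    import itertools
    from concurrent import futures

    with futures.ProcessPoolExecutor(max_workers=6) as executor:
        files_iter = iter(files_list)
        jobs = {}

        # Prime the queue with an initial batch of submissions.
        for this_file in itertools.islice(files_iter, MAX_JOBS_IN_QUEUE):
            jobs[executor.submit(parse_function, this_file, **parser_variables)] = this_file

        while jobs:
            # Block until at least one job finishes, then process everything that is done.
            done, _ = futures.wait(jobs, return_when=futures.FIRST_COMPLETED)
            for job in done:
                this_file = jobs.pop(job)  # also drops the dict's reference to the Future
                post_process(this_file, job.result())
            # Refill the queue with as many new jobs as just completed.
            for this_file in itertools.islice(files_iter, len(done)):
                jobs[executor.submit(parse_function, this_file, **parser_variables)] = this_file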
