How to quickly upload a 20GB file to MongoDB with pymongo

I'm trying Pool() in Python:

import csv
import codecs
import re
from pymongo import MongoClient
from multiprocessing.dummy import Pool as ThreadPool

# read the csv file and insert each row into the database using pymongo
mongo_client = MongoClient('111.111.11.111', maxPoolSize=200)
db = mongo_client.mydb

def upload(reader):
    for each in reader:
        row = {}
        txt = re.split(" ", str(each))
        row["time"] = re.split("'", txt[0])[1]
        row["ticker"] = txt[1]
        row["price"] = re.split(r"\((.*?)\)", txt[2])[1]
        row["open"] = re.split(r"\((.*?)\)", txt[3])[1]
        db.price.insert_one(row)

pool = ThreadPool(10)
results = pool.map(upload, csv.reader(codecs.open('C:\\log.txt', 'rU', 'utf-16')))

The idea is to split the big log.txt file into 10 chunks and run them in parallel to speed things up. But nothing gets written to the database, which means my code isn't working. What's wrong here? (I'm sure the problem isn't in the upload function, because it works fine when I run it without Pool().)
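For reference, here is a minimal sketch of what the intended "split into chunks and run in parallel" approach could look like. It reuses the host address, database name, and row format from the code above; the batches(), parse(), and upload_batch() helpers and the batch size of 1000 are illustrative, not from the post. Note that pool.map() passes the worker one item of the iterable at a time, so the worker should accept a batch of rows rather than the whole reader:

import csv
import codecs
import re
from itertools import islice
from pymongo import MongoClient
from multiprocessing.dummy import Pool as ThreadPool

mongo_client = MongoClient('111.111.11.111', maxPoolSize=200)
db = mongo_client.mydb

def parse(each):
    # same field extraction as upload() above
    txt = re.split(" ", str(each))
    return {
        "time": re.split("'", txt[0])[1],
        "ticker": txt[1],
        "price": re.split(r"\((.*?)\)", txt[2])[1],
        "open": re.split(r"\((.*?)\)", txt[3])[1],
    }

def upload_batch(rows):
    # one round trip per batch instead of one insert per row
    db.price.insert_many([parse(each) for each in rows])

def batches(reader, size=1000):
    # yield lists of up to `size` rows until the reader is exhausted
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk

with codecs.open('C:\\log.txt', 'r', 'utf-16') as f:
    pool = ThreadPool(10)
    for _ in pool.imap_unordered(upload_batch, batches(csv.reader(f))):
        pass
    pool.close()
    pool.join()

insert_many() cuts the per-row round trips, and imap_unordered() avoids materializing all results at once; for a file this size you may still want to bound how many batches are in flight at a time.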


Tags: csv, from, import, re, txt, as, reader, file