I have been reading in files as follows:
file = open(fullpath, "r")
allrecords = file.read()
delimited = allrecords.split('\n')
for record in delimited[1:]:
    record_split = record.split(',')
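and
[...]
But when I process these files in a multiprocessing thread, I get a MemoryError. The text files I am reading need to be split on '\n'. What is the best way to read such a file in line by line?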
Here is the multiprocessing code:
pool = Pool()
fixed_args = (targetdirectorytxt, value_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PPD_star, list(varg), chunksize=1)
while not op_list.ready():
    print("Number of files left to process: {}".format(op_list._number_left))
    time.sleep(60)
op_list = op_list.get()
pool.close()
pool.join()
Here is the error log:
Exception in thread Thread-3:
Traceback (most recent call last):
File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Python27\lib\multiprocessing\pool.py", line 380, in _handle_results
task = get()
MemoryError
I am trying to install pathos as Mike suggested, but I am running into problems. Here is my install command:
pip install https://github.com/uqfoundation/pathos/zipball/master --allow-external pathos --pre
But here is the error message I receive:
Downloading/unpacking https://github.com/uqfoundation/pathos/zipball/master
Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip-1e4saj-b
uild\setup.py) egg_info for package from https://github.com/uqfoundation/pathos/
zipball/master
Downloading/unpacking ppft>=1.6.4.5 (from pathos==0.2a1.dev0)
Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip_build_jp
tyuser\ppft\setup.py) egg_info for package ppft
warning: no files found matching 'python-restlib.spec'
Requirement already satisfied (use --upgrade to upgrade): dill>=0.2.2 in c:\pyth
on27\lib\site-packages\dill-0.2.2-py2.7.egg (from pathos==0.2a1.dev0)
Requirement already satisfied (use --upgrade to upgrade): pox>=0.2.1 in c:\pytho
n27\lib\site-packages\pox-0.2.1-py2.7.egg (from pathos==0.2a1.dev0)
Downloading/unpacking pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
Could not find any downloads that satisfy the requirement pyre==0.8.2.0-pathos
(from pathos==0.2a1.dev0)
Some externally hosted files were ignored (use --allow-external pyre to allow)
.
Cleaning up...
No distributions at all found for pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
Storing debug log for failure in C:\Users\xxx\pip\pip.log
I am installing on Windows 7 64-bit. In the end I managed to install it with easy_install.
But now I get a failure because I cannot open that many files:
Finished reading in Exposures...
Reading Samples from: C:\XXX\XXX\XXX\
Traceback (most recent call last):
File "events.py", line 568, in <module>
mdrcv_dict = ReadDamages(damage_dir, value_dict)
File "events.py", line 185, in ReadDamages
res = thpool.amap(mppool.map, [rstrip]*len(readinfiles), files)
File "C:\Python27\lib\site-packages\pathos-0.2a1.dev0-py2.7.egg\pathos\multipr
ocessing.py", line 230, in amap
return _pool.map_async(star(f), zip(*args)) # chunksize
File "events.py", line 184, in <genexpr>
files = (open(name, 'r') for name in readinfiles[0:])
IOError: [Errno 24] Too many open files: 'C:\\xx.csv'
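I am currently bundling my dictionary with the other arguments and mapping them onto a function, which is then passed to a pool. Here is an example of how I do it at the moment. What would be the smart way to do this with pathos?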
def PP_star(args_flat):
    return PP(*args_flat)
def PP(pathfilename, txtdatapath, my_dict):
    return com_dict
fixed_args = (targetdirectorytxt, my_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PP_star, list(varg), chunksize=1)
How do I do the same thing using pathos.multiprocessing?
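Just iterate over the lines instead of reading the whole file. Like this:
Try this:
Of course, you could also append the lines to a list or do something else with them instead of printing them.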
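A minimal sketch of the line-by-line approach described above, written for Python 3. The helper name `iter_records`, the `skip_header` flag and the demo file contents are assumptions made for illustration, not taken from the question. Iterating over the file object reads one line at a time, so only the current line is held in memory.

```python
import os
import tempfile


def iter_records(path, skip_header=True):
    """Yield each line of `path` as a list of comma-separated fields."""
    with open(path, "r") as handle:
        if skip_header:
            # Drop the first line, like delimited[1:] in the question.
            next(handle, None)
        for line in handle:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines, e.g. one left by a trailing newline
                yield line.split(",")


# Tiny demonstration on a throwaway file (the contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,value\n1,a\n2,b\n")
    demo_path = tmp.name

records = list(iter_records(demo_path))
os.remove(demo_path)
print(records)
```

Collecting everything with `list(...)` is only done here to show the result; in the real job it is probably better to consume the generator record by record so the whole file never sits in memory.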
Suppose we have file1.txt:
file2.txt:
and so on, through file5.txt:
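I'd suggest using a hierarchical parallel map to read the files quickly. A fork of multiprocessing (called pathos.multiprocessing) can do this.
However, if you want to check how many files are left to finish, you may want to use an "iterated" map (imap) instead of an "asynchronous" map (amap). See this post for details: Python multiprocessing - tracking the process of pool.map operation
Get pathos here: https://github.com/uqfoundation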
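As a rough illustration of the "iterated" map idea, here is a sketch that uses only the Python 3 standard library rather than pathos, so it is a stand-in for the approach and not the pathos API. It uses `multiprocessing.pool.ThreadPool` to stay self-contained; `multiprocessing.Pool` offers the same `imap_unordered` method. The names `count_records` and `process_all` and the demo files are hypothetical. Each worker receives a file name and opens the file itself, so at most `workers` files are open at once, and `functools.partial` carries the fixed argument.

```python
import os
import tempfile
from functools import partial
from multiprocessing.pool import ThreadPool


def count_records(path, delimiter):
    """Open one file by name inside the worker and stream through it."""
    count = 0
    with open(path, "r") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\r\n").split(delimiter)
            if fields != [""]:  # ignore blank lines
                count += 1
    return path, count


def process_all(paths, delimiter=",", workers=4):
    """Map count_records over the paths, reporting progress as results arrive."""
    worker = partial(count_records, delimiter=delimiter)
    results = {}
    pool = ThreadPool(workers)
    try:
        finished = pool.imap_unordered(worker, paths)
        for done, (path, count) in enumerate(finished, 1):
            results[path] = count
            print("Number of files left to process: {}".format(len(paths) - done))
    finally:
        pool.close()
        pool.join()
    return results


# Demonstration on three made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    demo_paths = []
    for i in range(3):
        name = os.path.join(folder, "file{}.txt".format(i + 1))
        with open(name, "w") as out:
            out.write("header\n" + "x,y\n" * (i + 1))
        demo_paths.append(name)
    demo_counts = process_all(demo_paths)

totals = sorted(demo_counts.values())
print(totals)
```

Because results are consumed one at a time as they complete, progress can be reported without polling a private attribute such as `_number_left`.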