Efficiently reading files in Python that need to be split on '\n'

Published 2024-06-07 07:58:36


I have been reading in files like this:

file = open(fullpath, "r")
allrecords = file.read()            # reads the entire file into memory at once
delimited = allrecords.split('\n')
for record in delimited[1:]:        # skip the header line
    record_split = record.split(',')
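(For comparison, the same loop can be restructured to stream the file one line at a time, so only a single line is ever held in memory. This is a minimal sketch, not the original code; `read_records` is an illustrative name, and the `next(f, None)` mirrors the `delimited[1:]` header skip.)

```python
def read_records(fullpath):
    # illustrative helper: stream the file one line at a time
    records = []
    with open(fullpath, "r") as f:
        next(f, None)                # skip the header line, like delimited[1:]
        for record in f:
            records.append(record.rstrip('\n').split(','))
    return records
```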

But when I process these files in a multiprocessing pool, I get a MemoryError. What is the best way to read a file in line by line, when the text file I am reading needs to be split on '\n'?

Here is the multiprocessing code:

pool = Pool()
fixed_args = (targetdirectorytxt, value_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PPD_star, list(varg), chunksize=1)
while not op_list.ready():
    print("Number of files left to process: {}".format(op_list._number_left))
    time.sleep(60)
op_list = op_list.get()
pool.close()
pool.join()

Here is the error log:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 380, in _handle_results
    task = get()
MemoryError

I am trying to install pathos as Mike suggests, but I have run into some problems. Here is my install command:

pip install https://github.com/uqfoundation/pathos/zipball/master --allow-external pathos --pre

But here is the error message I receive:

Downloading/unpacking https://github.com/uqfoundation/pathos/zipball/master
  Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip-1e4saj-build\setup.py) egg_info for package from https://github.com/uqfoundation/pathos/zipball/master

Downloading/unpacking ppft>=1.6.4.5 (from pathos==0.2a1.dev0)
  Running setup.py (path:c:\users\xxx\appdata\local\temp\2\pip_build_jptyuser\ppft\setup.py) egg_info for package ppft

    warning: no files found matching 'python-restlib.spec'
Requirement already satisfied (use --upgrade to upgrade): dill>=0.2.2 in c:\python27\lib\site-packages\dill-0.2.2-py2.7.egg (from pathos==0.2a1.dev0)
Requirement already satisfied (use --upgrade to upgrade): pox>=0.2.1 in c:\python27\lib\site-packages\pox-0.2.1-py2.7.egg (from pathos==0.2a1.dev0)
Downloading/unpacking pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
  Could not find any downloads that satisfy the requirement pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)
  Some externally hosted files were ignored (use --allow-external pyre to allow).
Cleaning up...
No distributions at all found for pyre==0.8.2.0-pathos (from pathos==0.2a1.dev0)

Storing debug log for failure in C:\Users\xxx\pip\pip.log

I am installing on Windows 7 64-bit. In the end I managed to install it with easy_install.

But now I have a failure, because I cannot open that many files:

Finished reading in Exposures...
Reading Samples from:  C:\XXX\XXX\XXX\
Traceback (most recent call last):
  File "events.py", line 568, in <module>
    mdrcv_dict = ReadDamages(damage_dir, value_dict)
  File "events.py", line 185, in ReadDamages
    res = thpool.amap(mppool.map, [rstrip]*len(readinfiles), files)
  File "C:\Python27\lib\site-packages\pathos-0.2a1.dev0-py2.7.egg\pathos\multiprocessing.py", line 230, in amap
    return _pool.map_async(star(f), zip(*args)) # chunksize
  File "events.py", line 184, in <genexpr>
    files = (open(name, 'r') for name in readinfiles[0:])
IOError: [Errno 24] Too many open files: 'C:\\xx.csv'

I am opening my files and mapping the arguments onto a function, which I then pass into a pool. Here is an example of how I currently do it; what is the smart way to do this with pathos?

def PP_star(args_flat):
    return PP(*args_flat)

def PP(pathfilename, txtdatapath, my_dict):
    # ... process the file and build com_dict ...
    return com_dict

fixed_args = (targetdirectorytxt, my_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PP_star, list(varg), chunksize=1)

How can I perform the same functionality with pathos.multiprocessing?


3 Answers

Just iterate over the lines instead of reading in the whole file, like this:

with open(os.path.join(txtdatapath, pathfilename), "r") as data:
    for dataline in data:
        split_line = dataline.split(',')
        if len(split_line) > 1:
            pass  # process the fields here

Try this:

for line in open('file.txt'):
    print line.rstrip()

Of course, instead of printing the lines you can also append them to a list or do some other processing on them.
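For example, collecting the stripped lines into a list can be done with a comprehension (a small sketch; the demo file and its contents are made up for illustration):

```python
# create a small demo file, then collect every stripped line into a list
with open('demo.txt', 'w') as f:
    f.write('hello35\n1234123\n')

with open('demo.txt') as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)  # ['hello35', '1234123']
```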

Suppose we have file1.txt:

hello35
1234123
1234123
hello32
2492wow
1234125
1251234
1234123
1234123
2342bye
1234125
1251234
1234123
1234123
1234125
1251234
1234123

file2.txt:

1234125
1251234
1234123
hello35
2492wow
1234125
1251234
1234123
1234123
hello32
1234125
1251234
1234123
1234123
1234123
1234123
2342bye

and so on, through file5.txt:

1234123
1234123
1234125
1251234
1234123
1234123
1234123
1234125
1251234
1234125
1251234
1234123
1234123
hello35
hello32
2492wow
2342bye

I suggest using a hierarchical parallel map to read the files quickly. A fork of multiprocessing (called pathos.multiprocessing) can do this.

>>> import pathos
>>> thpool = pathos.multiprocessing.ThreadingPool()
>>> mppool = pathos.multiprocessing.ProcessingPool()
>>> 
>>> def rstrip(line):
...     return line.rstrip()
... 
>>> # get your list of files
>>> fnames = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt']
>>> # open the files
>>> files = (open(name, 'r') for name in fnames)
>>> # read each file in asynchronous parallel
>>> # while reading and stripping each line in parallel
>>> res = thpool.amap(mppool.map, [rstrip]*len(fnames), files)
>>> # get the result when it's done
>>> res.ready()
True
>>> data = res.get()
>>> # if 'files' were a list of open files rather than a generator,
>>> # close each one by uncommenting the next line
>>> # files = [f.close() for f in files]
>>> data[0]
['hello35', '1234123', '1234123', 'hello32', '2492wow', '1234125', '1251234', '1234123', '1234123', '2342bye', '1234125', '1251234', '1234123', '1234123', '1234125', '1251234', '1234123']
>>> data[1]
['1234125', '1251234', '1234123', 'hello35', '2492wow', '1234125', '1251234', '1234123', '1234123', 'hello32', '1234125', '1251234', '1234123', '1234123', '1234123', '1234123', '2342bye']
>>> data[-1]
['1234123', '1234123', '1234125', '1251234', '1234123', '1234123', '1234123', '1234125', '1251234', '1234125', '1251234', '1234123', '1234123', 'hello35', 'hello32', '2492wow', '2342bye']

However, if you want to check how many files are left to finish, you may want to use an "iterated" map (imap) instead of the "asynchronous" map (amap). See this post for details: Python multiprocessing - tracking the process of pool.map operation
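The same pattern can also be approximated with only the standard library (a sketch, not the pathos API; the demo file names and contents are made up): `multiprocessing.pool.ThreadPool.imap` yields results in order as each file finishes, so you can count progress as you go.

```python
from multiprocessing.pool import ThreadPool

# create two small sample files for the demo
for i, text in enumerate(["hello35\n1234123\n", "1234125\n1251234\n"], 1):
    with open('demo%d.txt' % i, 'w') as f:
        f.write(text)

def read_stripped(name):
    # read one file line by line, stripping each line as we go
    with open(name) as f:
        return [line.rstrip('\n') for line in f]

fnames = ['demo1.txt', 'demo2.txt']
pool = ThreadPool(2)
data = []
for done, lines in enumerate(pool.imap(read_stripped, fnames), 1):
    data.append(lines)        # 'done' files have finished so far
pool.close()
pool.join()
```

Unlike `amap`, each iteration of the `imap` loop gives you a natural place to report how many files remain.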

Get pathos here: https://github.com/uqfoundation
