Python multiprocessing: struct.error

Posted 2024-05-23 13:29:12


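I'm iterating over a set of large files and using multiprocessing to process and write them. I build an iterable from a DataFrame and pass it to multiprocessing's map function. This works fine for smaller files, but when I hit larger files (~10 GB) I get this error: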

python struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Code:

[code block not preserved in the source]

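Based on this answer, I think the problem is that the data I pass to map is too large. So I first tried splitting the DataFrame into 1.5 GB chunks and passing each chunk to map independently, but I still get the same error.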

Full traceback:

Traceback (most recent call last):
  File "_FNMA_LLP_dataprep_final.py", line 51, in <module>
    write_files()
  File "_FNMA_LLP_dataprep_final.py", line 29, in write_files
    '.txt')
  File "/DATAPREP/appl/FNMA_LLP/code/FNMA_LLP_functions.py", line 116, in write_dynamic_columns_fannie
    pool1.map(write_in_parallel, first)
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/opt/Python364/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/opt/Python364/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/Python364/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
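
The last frame of the traceback shows `struct.pack("!i", n)` packing the length of the pickled task into a signed 32-bit header. As a minimal sketch of why a large payload fails at that step, the snippet below repeats the same call in isolation; the helper name `fits_in_header` and the constant `INT32_MAX` are made up for illustration and are not part of multiprocessing.

```python
import struct

# Largest value a signed 32-bit integer can hold (2147483647).
INT32_MAX = 2**31 - 1


def fits_in_header(n):
    """Return True if n can be packed with the "!i" format used in the traceback."""
    try:
        struct.pack("!i", n)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(fits_in_header(INT32_MAX))      # a payload of just under 2 GiB
    print(fits_in_header(INT32_MAX + 1))  # one byte more
    print(fits_in_header(10 * 1024**3))   # roughly the 10 GB case from the question
```

Any object whose pickled form is longer than `INT32_MAX` bytes hits this limit in the Python 3.6 code path shown in the traceback, whatever the object contains.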

1 answer
User
#1 · Posted 2024-05-23 13:29:12

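The answer you mention makes one more point: the data should be loaded by the child function. In your case that is write_in_parallel. I suggest changing your child function as follows: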

import pandas as pd

def write_in_parallel(path):
    """Load the data inside the worker; assumes it is stored in a CSV file at `path`."""
    # Only the path string is sent to the worker, not the data itself.
    data = pd.read_csv(path)
    ...

Then your pool code should look like this:

[code block not preserved in the source]
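
A minimal sketch of pool code in that spirit, under several assumptions: the inputs are separate CSV files, the standard-library `csv` module stands in for `pd.read_csv` so the example runs without pandas, and the names `process_files`, `make_sample_files` and the `.out` suffix are hypothetical. The point it illustrates is that `pool.map` receives short path strings, so each pickled task is tiny.

```python
import csv
import multiprocessing
import os
import tempfile


def write_in_parallel(path):
    """Worker: load one input file by path, then write its output.

    Only the path string is pickled and sent to the worker process.
    """
    with open(path, newline="") as src:
        rows = list(csv.reader(src))  # stand-in for pd.read_csv(path)
    out_path = path + ".out"  # hypothetical output naming
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return out_path, len(rows)


def process_files(paths, workers=2):
    """Map the worker over file paths rather than over loaded data."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(write_in_parallel, paths)


def make_sample_files(directory, count=3):
    """Create small CSV files so the sketch can be run end to end."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"part_{i}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows([["id", "value"], [i, i * 10]])
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for out_path, n_rows in process_files(make_sample_files(tmp)):
            print(out_path, n_rows)
```

The worker also returns only a path and a row count. Results travel back to the parent through the same kind of pipe, so returning a large DataFrame from the worker could run into the same size limit.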

I hope this helps.
