Unable to run an mpi4py program on multiple machines

Posted 2024-05-29 11:12:05


I have a very simple MPI Python program:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank
name = MPI.Get_processor_name()

print('name: ', name, ' rank: ', rank)

# mpi4py finalizes MPI automatically at interpreter exit,
# so no explicit MPI.Finalize() call is needed here.

I installed nfs-kernel-server on the host machine and nfs-common on the client machine, following the instructions on this page (here).

Now I run the Python MPI program with the following command:

mpirun --hostfile myhostfile.txt -np 8 python hello.py

When I do this, I get the following error:

[mahmoud-desktop:05540] *** Process received signal ***
[mahmoud-desktop:05540] Signal: Segmentation fault (11)
[mahmoud-desktop:05540] Signal code:  (128)
[mahmoud-desktop:05540] Failing at address: (nil)
[mahmoud-desktop:05540] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f5a89bf0890]
[mahmoud-desktop:05540] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)[0x7f5a8988498d]
[mahmoud-desktop:05540] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_argv_free+0x29)[0x7f5a89e4b519]
[mahmoud-desktop:05540] [ 3] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(+0x283cb)[0x7f5a8a0d73cb]
[mahmoud-desktop:05540] [ 4] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_util_add_hostfile_nodes+0xc1)[0x7f5a8a0d83f1]
[mahmoud-desktop:05540] [ 5] /usr/lib/x86_64-linux-gnu/libopen-rte.so.20(orte_ras_base_allocate+0xd3d)[0x7f5a8a1097fd]
[mahmoud-desktop:05540] [ 6] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_libevent2022_event_base_loop+0xdc9)[0x7f5a89e63209]
[mahmoud-desktop:05540] [ 7] mpirun(+0x74a3)[0x55a0de7394a3]
[mahmoud-desktop:05540] [ 8] mpirun(+0x5aea)[0x55a0de737aea]
[mahmoud-desktop:05540] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f5a8980eb97]
[mahmoud-desktop:05540] [10] mpirun(+0x59ea)[0x55a0de7379ea]
[mahmoud-desktop:05540] *** End of error message ***

Segmentation fault (core dumped)

Question: how can I fix this error?
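
One observation from the backtrace: the crash happens inside mpirun itself, in orte_util_add_hostfile_nodes, before hello.py is ever started, so the contents of myhostfile.txt are a reasonable first thing to check. The sketch below is only a rough sanity check, not Open MPI's real parser. It assumes the common hostfile layout of one host per line, optionally followed by slots=N or max_slots=N, with # starting a comment. The function name check_hostfile_lines and both regular expressions are hypothetical and deliberately strict.

```python
import re

# Hypothetical helper, not part of Open MPI or mpi4py.
# Assumed layout: "<host> [slots=N] [max_slots=N]", '#' starts a comment.
HOST_RE = re.compile(r"^[A-Za-z0-9_.@-]+$")
OPTION_RE = re.compile(r"^(slots|max_slots|max-slots)=\d+$")


def check_hostfile_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        host, *options = line.split()
        if not HOST_RE.match(host):
            problems.append((number, f"unexpected host name: {host!r}"))
        for option in options:
            if not OPTION_RE.match(option):
                problems.append((number, f"unexpected option: {option!r}"))
    return problems


# Illustrative in-memory hostfile; the host names are made up.
sample = [
    "# hosts taking part in the run",
    "master slots=4",
    "node1 slot=4",  # typo: "slot" instead of "slots"
]
for number, message in check_hostfile_lines(sample):
    print(f"line {number}: {message}")
```

To check the real file, the same function can be given an open file object, for example check_hostfile_lines(open("myhostfile.txt")). A clean result does not prove the hostfile is the cause or rule it out; it only catches obvious formatting slips.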

