MPI error - Communication failed between nodes #53

Open
SurelyD opened this issue Jul 15, 2019 · 2 comments

SurelyD commented Jul 15, 2019

Hi,

I tried to run GPUSPH version 5 on the cluster using two nodes, with two GPUs on each node, using the following line:
mpirun -np 4 -npernode 2 ./GPUSPH --device 0,1 # each device of Mahuika has two GPUs

However, the simulation ran on only one node and I got the following error messages:
[vgpuwbg001][[30539,1],0][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vgpuwbg001][[30539,1],1][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vgpuwbg002][[30539,1],3][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vgpuwbg002][[30539,1],2][connect/btl_openib_connect_udcm.c:1236:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
FATAL: cannot handle 1436584140 > 1073741823 cells
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.

Does anyone know how to fix it?

Thanks.

Oblomov commented Jul 16, 2019

Hello @SurelyD, it seems that you have at least two overlapping issues in your case.

There is something wrong with your MPI setup, but without additional details I can't say anything more specific. A quick search shows that MPI errors similar to yours are common with some OpenMPI versions and Mellanox cards, and they can be solved by configuring MPI appropriately (see e.g. this issue). You may also want to try another MPI implementation, such as MVAPICH. If possible, I would also recommend testing your MPI setup independently of GPUSPH (I believe the CUDA samples include some MPI examples these days).
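For reference, a minimal MPI sanity check independent of GPUSPH could look something like the sketch below (standard MPI C API only; the file name and the ring exchange are just illustrative). Compile it with mpicc and launch it with the same mpirun line you use for GPUSPH, to check that all four ranks start on the expected hosts and can talk over the interconnect:

/* mpi_check.c -- each rank reports its host and exchanges a token with its neighbours */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d running on %s\n", rank, size, host);

    /* simple ring exchange to exercise inter-node communication */
    int send_token = rank, recv_token = -1;
    MPI_Sendrecv(&send_token, 1, MPI_INT, (rank + 1) % size, 0,
                 &recv_token, 1, MPI_INT, (rank + size - 1) % size, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d received token %d\n", rank, recv_token);

    MPI_Finalize();
    return 0;
}

If this already fails with the same udcm_rc_qp_to_rtr error, the problem is purely in the MPI/InfiniBand configuration and has nothing to do with GPUSPH.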

Aside from the MPI issue, you also seem to have one related specifically to your test case. This message:

FATAL: cannot handle 1436584140 > 1073741823 cells

indicates that you are running an extremely large simulation, with a domain that spans over a billion cells (cells are used for particle sorting, fast neighbor search, domain splitting in multi-device runs, and to preserve uniform accuracy throughout the domain). Since GPUSPH stores the cell index as a 32-bit unsigned integer where the two highest bits are reserved for multi-GPU/multi-node usage, there is a limit of 2^30 (around 10^9) cells in the domain. There are a few tricks that can be used to stay within this limit, depending on your use case (e.g. rotate the domain and the gravity if you have a long slope). If you can share more details we can look for a solution.
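To make the arithmetic explicit, here is a small sketch of where that limit comes from, assuming only what is stated above (a 32-bit unsigned cell index with the two most significant bits reserved); the constant names are illustrative, not GPUSPH's own:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 32-bit cell index with the 2 highest bits reserved for
     * multi-GPU/multi-node bookkeeping: 30 bits are left for the cell id */
    const uint32_t max_cells = UINT32_MAX >> 2;     /* 2^30 - 1 = 1073741823 */
    const uint64_t requested = 1436584140ULL;       /* from the error message */

    printf("max cells: %u\n", (unsigned)max_cells);
    printf("requested: %llu (%s)\n", (unsigned long long)requested,
           requested > max_cells ? "too large" : "ok");
    return 0;
}

In other words, this domain needs roughly 1.34 times more cells than the index can address, so the cell count has to come down (e.g. by rotating the domain as suggested above), regardless of how many GPUs the run is distributed over.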

Narcolessico commented Jul 16, 2019

Please also note that, although GPUSPH supports it, until some time ago multi-node multi-GPU was not supported by several MPI libraries (at least MVAPICH), which were unable to handle multiple device contexts per process. Things might have changed, but it might be worth trying 2 processes on each node (4 GPUSPH processes on 2 nodes, each using a different device) to rule out one of the issues.

EDIT: now that I look more closely at the command you used, I'm afraid you are running 2 processes on each node, with each process attempting to use both GPUs. You should pass a single value to --device and probably add --num-hosts.
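For illustration of the idea only (this is the generic one-process-per-GPU pattern in MPI+CUDA codes, not GPUSPH's own device handling, and the names are hypothetical), binding each process on a node to a different GPU usually means deriving a per-node local rank and selecting the device from it:

/* illustrative MPI + CUDA device selection: each process on a node picks a different GPU */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* local rank = rank within the node, via a shared-memory sub-communicator */
    MPI_Comm local;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &local);
    int local_rank;
    MPI_Comm_rank(local, &local_rank);

    int ndevices = 0;
    cudaGetDeviceCount(&ndevices);
    int dev = (ndevices > 0) ? local_rank % ndevices : -1;  /* local rank 0 -> GPU 0, 1 -> GPU 1 */
    if (dev >= 0)
        cudaSetDevice(dev);
    printf("local rank %d bound to device %d of %d\n", local_rank, dev, ndevices);

    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}

With GPUSPH itself, device assignment is handled through --device (and --num-hosts) as described above; the snippet is only meant to show the general pattern behind "one process per GPU".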
