Marc2_UserGuide/2_UsingParaStationMPI

ParaStation MPI

To use ParaStation MPI, the appropriate module must be loaded:

  • parastation/mpi2-gcc-5.0.27-1: uses the GCC compiler and libraries
  • parastation/mpi2-intel-5.0.27-1: uses the Intel compiler and libraries
  • parastation/mpi2-pgi-5.0.27-1: uses the PGI compiler and libraries

See the module environment for details. To compile applications, use mpicc, mpif77, or mpif90, which are available via $PATH. The required compiler may be selected by loading the appropriate module.
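
For example, the GCC variant could be loaded and a simple MPI program compiled as follows (a minimal sketch; the file names hello.c and hello are only placeholders):

# module load parastation/mpi2-gcc-5.0.27-1
# mpicc -o hello hello.c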

To run an application, use mpiexec, which is also available via $PATH, or use the absolute path /opt/parastation/bin/mpiexec. Information on the nodes to be used is typically provided by the batch queuing system and is picked up automatically by the ParaStation runtime environment.

See the batch queuing system documentation for an example of how to use mpiexec. See the mpiexec man page and the ParaStation User's Guide for more information.
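
Outside a batch job, a minimal interactive invocation might look like the following sketch (./hello and the number of processes are placeholders; -np is assumed here as the usual option for the process count):

# mpiexec -np 8 ./hello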

Environment Variables for use with QLogic IB

To enable the full performance of the installed QLogic HCAs, ParaStation uses the PSM layer provided by the QLogic OFED stack. In order to do so, the shared libraries libpsm_infinipath (and libinfinipath) are dynamically linked into the ParaStation MPI communication layer at run time.

Several ParaStation environment variables control this behavior; see ps_environment for more details:

  • PSP_PSM: this variable defines the priority of the PSM library. Setting it to 2 or higher raises it above the default priority of 1 and therefore favors this communication channel over all others. Setting it to 1, or leaving it unset, gives the default behavior: PSM is used for inter-node communication and shared memory for intra-node communication. Setting it to 0 prevents the use of PSM.

By default, PSP_PSM is set to 2 and exported to all processes in order to avoid timing issues that have been observed with multi-node jobs using both PSM and shared memory. Jobs running within a single node may set the variable to 0 to re-enable shared memory communication, as sketched below.
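
A minimal sketch for such a single-node job (the mpiexec arguments are placeholders; since PSP_PSM carries the PSP_ prefix, it does not need to be added to PSI_EXPORTS, see the note below):

# export PSP_PSM=0
# mpiexec -np 8 ./hello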

Some variables for use with PSM:

  • PSM_SHAREDCONTEXTS_MAX: see documentation; a usage sketch follows after this list. From Andrew Russel (Intel):

"PSM_SHAREDCONTEXTS_MAX can be helpful. It defines how many hardware contexts the job will consume. You have 16 hardware contexts. The free contexts can be shared 1, 2, 3 or 4 ways. For PSM_SHAREDCONTEXTS_MAX to be useful, you have to set it on the PREVIOUS job. Eg: if I start 16 processes on a 64-core node, that job will use all 16 contexts, so you can't start any more jobs on that node. If I start 16 processes on a 64-core node with PSM_SHAREDCONTEXTS_MAX=8, that job will use 8 contexts, leaving 8 free. You could start another 32 processes (8 contexts shared 4 ways) on that node."

  • PSM_RANKS_PER_CONTEXT: (undocumented) From Andrew Russel (Intel):

"If you always set "$PSM_RANKS_PER_CONTEXT=4", then it will always do 4-way sharing, and so you should not grab all the contexts when partially filling a node. (This removes the need to do the logic I went through to calculate the correct value of PSM_SHAREDCONTEXTS_MAX, further down in this thread). Note that this only works well when your ppn is divisible by 4."

  • MPI_LOCALRANKID: (undocumented) required for PSM_SHAREDCONTEXTS_MAX or PSM_RANKS_PER_CONTEXT to work. Automatically set by mpiexec.
  • MPI_LOCALNRANKS: (undocumented) required for PSM_SHAREDCONTEXTS_MAX or PSM_RANKS_PER_CONTEXT to work. Automatically set by mpiexec.
  • PSM_SHAREDCONTEXTS: see documentation. Enable/disable context sharing. Enabled by default.
  • PSM_DEVICES: see documentation. Communication paths used by PSM.
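
Building on the PSM_SHAREDCONTEXTS_MAX description above, a partially filled node could be started so that half of the 16 hardware contexts remain free (a sketch only; the process count is a placeholder, and the variable has to be exported as explained in the note below):

# export PSM_SHAREDCONTEXTS_MAX=8
# export PSI_EXPORTS="$PSI_EXPORTS,PSM_SHAREDCONTEXTS_MAX"
# mpiexec -np 16 ./hello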

Some of these PSM-specific variables are described in UserGuide_IB_OFED_HostSoftware153_IB0054606-01B.pdf.

Note: variables other than PSP_* or PSI_* have to be exported to the processes, either by using mpiexec --env … or by adding their names to PSI_EXPORTS, e.g.:

# export PSM_RANKS_PER_CONTEXT=4
# export PSI_EXPORTS="$PSI_EXPORTS,PSM_RANKS_PER_CONTEXT"
# mpiexec ...

See ps_environment and mpiexec for more details.