MPI

The Message Passing Interface is a communication protocol for programming parallel computers. Both point-to-point and collective communication are supported. Tolosa heavily relies on MPI to build its parallel environment. Several structures were created to make the communications’ implementation easier.

Collective communications

Fortran structure

The mpi_pool structure defines a pool of MPI processes and basic operations on those processes. This structure enables collective communications inside a communicator.

TYPE mpi_pool

  integer(ip)              :: comm                ! Communicators Pool (mpi_comm_world for whole)
  integer(ip)              :: size                ! Number of ranks in Pool
  integer(ip)              :: rank                ! Rank Number from 0 to size-1
  integer(ip)              :: err                 ! Usefull to test an error at output of an MPI routine
  integer(ip), allocatable :: rankorder(:)
  logical                  :: is_init = .false.

CONTAINS

  procedure, pass( self ) :: init      => mpi_init_
  procedure, pass( self ) :: final     => mpi_final_
  procedure, pass( self ) :: waitall   => mpi_wait_all
  procedure, pass( self ) :: barrier   => mpi_wait_all
  procedure, pass( self ) :: sum       => mpi_sum_
  procedure, pass( self ) :: prod      => mpi_prod_
  procedure, pass( self ) :: min       => mpi_min_
  procedure, pass( self ) :: max       => mpi_max_
  procedure, pass( self ) :: bcast     => mpi_bcast_

  procedure, pass( self ) :: mpi_allgather_i , mpi_allgather_r

  generic :: allgather => mpi_allgather_i , mpi_allgather_r

END TYPE mpi_pool

To initialize a pool of MPI processes, one has to call the mpi_init_ subroutine.

All the collective communications and collective operations are facilitated. The MPI operations enabling data communications (MPI_BCAST(), MPI_SCATTER(), MPI_GATHER(), MPI_ALLGATHER(), MPI_ALLTOALL()), data operations and communications (MPI_REDUCE(), MPI_ALLREDUCE()), and global synchronization (MPI_BARRIER()) are easily implemented in the following subroutines :

MODULE SUBROUTINE mpi_wait_all( self )

  class(mpi_pool), intent(inout) :: self

END SUBROUTINE mpi_wait_all

MODULE SUBROUTINE mpi_sum_( self , val )

  class(mpi_pool), intent(inout) :: self
  class(*)       , intent(inout) :: val

END SUBROUTINE mpi_sum_

MODULE SUBROUTINE mpi_prod_( self , val )

  class(mpi_pool), intent(inout) :: self
  class(*)       , intent(inout) :: val

END SUBROUTINE mpi_prod_

MODULE SUBROUTINE mpi_max_( self , val , which )

  class(mpi_pool), intent(inout)           :: self
  class(*)       , intent(inout)           :: val
  integer(ip)    , intent(  out), optional :: which

END SUBROUTINE mpi_max_

MODULE SUBROUTINE mpi_min_( self , val , which )

  class(mpi_pool), intent(inout)           :: self
  class(*)       , intent(inout)           :: val
  integer(ip)    , intent(  out), optional :: which

END SUBROUTINE mpi_min_

MODULE SUBROUTINE mpi_bcast_( self , val , proc )

  class(mpi_pool), intent(inout) :: self
  class(*)       , intent(inout) :: val
  integer(ip)    , intent(in   ) :: proc

END SUBROUTINE mpi_bcast_

MODULE SUBROUTINE mpi_allgather_i( self , val , allgather )

  class(mpi_pool), intent(inout) :: self
  integer(ip)    , intent(in   ) :: val
  integer(ip)    , intent(inout) :: allgather( np )

END SUBROUTINE mpi_allgather_i

MODULE SUBROUTINE mpi_allgather_r( self , val , allgather )

  class(mpi_pool), intent(inout) :: self
  real(rp)       , intent(in   ) :: val
  real(rp)       , intent(inout) :: allgather( np )

END SUBROUTINE mpi_allgather_r

When the pool of MPI processes is no longer needed, one has to finalize MPI by calling the mpi_final_ subroutine.

Usage

To be able to use this structure, one has to initialize a the MPI environment with a new mpi_pool object first.

type(mpi_pool) :: mpi_world

[...]

call mpi_world%init

When the MPI pool is initialized, one can call specific operations. For example, one can gather all the processes’ rank in one table by writting :

integer              :: rank
integer, allocatable :: gath_rank(:)

type(mpi_pool) :: mpi_world

[...]

rank = mpi_world%rank

allocate( gath_rank( mpi_world%size ) )

call mpi_world%allgather( rank , gath_rank )

[...]

When the collective communications and operations are no longer needed, one can finalize MPI :

type(mpi_pool) :: mpi_world

[...]

call mpi_world%final

Point-to-point communications

Fortran structure

The mpi_mess structure represents a process with its rank, and other parameters used for the point-to-point communication. The mpi_com structure defines a point-to-point communication between two processes s and r. The mpi_com%pool points to a pool of MPI processes, previously defined as mpi_world ; this will enable one to identify the current process in the pool.

TYPE mpi_mess

  integer(ip)              :: rank   = 0_ip
  integer(ip)              :: count  = 1_ip
  integer(ip)              :: istart = 1_ip
  integer(ip)              :: tag    = 0_ip
  integer(ip)              :: typ_i
  integer(ip)              :: typ_r
  integer                  :: req
  integer(ip)              :: err
  integer(ip), allocatable :: d(:)
  integer(ip), allocatable :: l(:)
  integer(ip)              :: n
  integer(ip)              :: typ

END TYPE mpi_mess

TYPE mpi_com

  type(mpi_pool), pointer :: pool => mpi_world
  type(mpi_mess)          :: s
  type(mpi_mess)          :: r
  integer(ip)             :: err
  logical                 :: twoside = .false.
  logical                 :: is_init = .false.

  CONTAINS

  procedure :: equal_mpi_com

  procedure, pass( self ) :: reset             => reset_mpi_com
  procedure, pass( self ) :: init              => initialize_mpi_com
  procedure, pass( self ) :: replace           => replace_mpi_com
  procedure, pass( self ) :: wait_send         => mpi_com_wait_send
  procedure, pass( self ) :: wait_recv         => mpi_com_wait_recv
  procedure, pass( self ) :: wait              => mpi_com_wait
  procedure, pass( self ) :: regen_typ_to_send => mpi_com_regen_typ_to_send
  procedure, pass( self ) :: regen_typ_to_recv => mpi_com_regen_typ_to_recv
  procedure, pass( self ) :: regen_typ         => mpi_com_regen_typ

  procedure, pass( self ) :: mpi_com_send_scalar_i              , &
                             mpi_com_send_scalar_r              , &
                             mpi_com_recv_scalar_i              , &
                             mpi_com_recv_scalar_r              , &
                             mpi_com_send_array_i               , &
                             mpi_com_recv_array_i               , &
                             mpi_com_send_array_r               , &
                             mpi_com_recv_array_r               , &
                             mpi_com_send_recv_scalar_i         , &
                             mpi_com_send_recv_scalar_r         , &
                             mpi_com_send_recv_array_i          , &
                             mpi_com_send_recv_array_r          , &
                             mpi_com_gen_typ_to_send_indexed    , &
                             mpi_com_gen_typ_to_recv_indexed    , &
                             mpi_com_gen_typ_to_send_contiguous , &
                             mpi_com_gen_typ_to_recv_contiguous

  generic :: assignment(=)   => equal_mpi_com
  generic :: send            => mpi_com_send_scalar_i , &
                                mpi_com_send_scalar_r , &
                                mpi_com_send_array_i  , &
                                mpi_com_send_array_r
  generic :: recv            => mpi_com_recv_scalar_i , &
                                mpi_com_recv_scalar_r , &
                                mpi_com_recv_array_i  , &
                                mpi_com_recv_array_r
  generic :: send_recv       => mpi_com_send_recv_scalar_i , &
                                mpi_com_send_recv_scalar_r , &
                                mpi_com_send_recv_array_i  , &
                                mpi_com_send_recv_array_r
  generic :: gen_typ_to_send => mpi_com_gen_typ_to_send_indexed , &
                                mpi_com_gen_typ_to_send_contiguous
  generic :: gen_typ_to_recv => mpi_com_gen_typ_to_recv_indexed , &
                                mpi_com_gen_typ_to_recv_contiguous

END TYPE mpi_com

When initializing the point-to-point communicator by calling the initialize_mpi_com subroutine, one has to define the ranks of the processes sending and receiving the data. If the data to be communicated is an array, and one wishes to send and receive only a portion of the array, one can enter the send_istart, recv_istart, send_count, recv_count values. If both processes need to send and receive data, one has to define twoside = .true. . A pool of MPI processes will also be defined ; if the pool is not specified at the initialization, a pool of all MPI processes will be defined.

MODULE SUBROUTINE initialize_mpi_com( self , send_rank   , recv_rank   , &
                                             send_istart , recv_istart , &
                                             send_count  , recv_count  , twoside , pool )

  class(mpi_com), intent(inout)                   :: self
  integer(ip)   , intent(in   )                   :: send_rank
  integer(ip)   , intent(in   )                   :: recv_rank
  integer(ip)   , intent(in   ), optional         :: send_istart
  integer(ip)   , intent(in   ), optional         :: recv_istart
  integer(ip)   , intent(in   ), optional         :: send_count
  integer(ip)   , intent(in   ), optional         :: recv_count
  logical       , intent(in   ), optional         :: twoside
  type(mpi_pool), intent(in   ), optional, target :: pool

END SUBROUTINE initialize_mpi_com

MODULE SUBROUTINE reset_mpi_com( self )

  class(mpi_com), intent(inout) :: self

END SUBROUTINE reset_mpi_com

MODULE SUBROUTINE equal_mpi_com( this , from )

  class(mpi_com), intent(inout) :: this
  type(mpi_com) , intent(in   ) :: from

END SUBROUTINE equal_mpi_com

MODULE SUBROUTINE replace_mpi_com( self , send_rank   , recv_rank   , &
                                    send_istart , recv_istart , &
                                    send_count  , recv_count  , twoside , pool )

  class(mpi_com), intent(inout)                   :: self
  integer(ip)   , intent(in   ), optional         :: send_rank
  integer(ip)   , intent(in   ), optional         :: recv_rank
  integer(ip)   , intent(in   ), optional         :: send_istart
  integer(ip)   , intent(in   ), optional         :: recv_istart
  integer(ip)   , intent(in   ), optional         :: send_count
  integer(ip)   , intent(in   ), optional         :: recv_count
  logical       , intent(in   ), optional         :: twoside
  type(mpi_pool), intent(in   ), optional, target :: pool

END SUBROUTINE replace_mpi_com

When the communication is initialized, all point-to-point communications are facilitated. One can run blocking and non blocking communications. Non blocking communications are send and recv. Blocking communications are enabled by calling send_recv. One can also wait for the end of a non blocking communication by calling wait, wait_send, wait_recv.

MODULE SUBROUTINE mpi_com_wait_send( self )

  class(mpi_com), intent(inout) :: self

END SUBROUTINE mpi_com_wait_send

MODULE SUBROUTINE mpi_com_wait_recv( self )

  class(mpi_com), intent(inout) :: self

END SUBROUTINE mpi_com_wait_recv

MODULE SUBROUTINE mpi_com_wait( self )

  class(mpi_com), intent(inout) :: self

END SUBROUTINE mpi_com_wait

The following subroutines define the exchange of an integer. The subroutines defining the exchange of a real are equivalent.

MODULE SUBROUTINE mpi_com_send_scalar_i( self , to_send , send_rank , withinfo )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(in   )           :: to_send
  integer(ip)   , intent(in   ), optional :: send_rank
  logical       , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_com_send_scalar_i

MODULE SUBROUTINE mpi_com_recv_scalar_i( self , to_recv , recv_rank , withinfo )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(inout)           :: to_recv
  integer(ip)   , intent(in   ), optional :: recv_rank
  logical       , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_com_recv_scalar_i

MODULE SUBROUTINE mpi_com_send_array_i( self , to_send , send_rank , withinfo )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(in   )           :: to_send(:)
  integer(ip)   , intent(in   ), optional :: send_rank
  logical       , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_com_send_array_i

MODULE SUBROUTINE mpi_com_recv_array_i( self , to_recv , recv_rank , withinfo )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(inout)           :: to_recv(:)
  integer(ip)   , intent(in   ), optional :: recv_rank
  logical       , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_com_recv_array_i

The following subroutines define the exchange of an integer. The subroutines defining the exchange of a real are equivalent.

MODULE SUBROUTINE mpi_com_send_recv_scalar_i( self , to_send , send_rank   , &
                                                     to_recv , recv_rank , withinfo )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(inout)           :: to_send
  integer(ip)   , intent(inout), optional :: to_recv
  integer(ip)   , intent(in   ), optional :: send_rank
  integer(ip)   , intent(in   ), optional :: recv_rank
  logical       , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_com_send_recv_scalar_i

MODULE SUBROUTINE mpi_com_send_recv_array_i( self , to_send , send_rank   , &
                                                    to_recv , recv_rank , withinfo , withtmp )

  class(mpi_com), intent(inout)           :: self
  integer(ip)   , intent(inout)           :: to_send(:)
  integer(ip)   , intent(inout), optional :: to_recv(:)
  integer(ip)   , intent(in   ), optional :: send_rank
  integer(ip)   , intent(in   ), optional :: recv_rank
  logical       , intent(in   ), optional :: withinfo
  logical       , intent(in   ), optional :: withtmp

END SUBROUTINE mpi_com_send_recv_array_i

Usage

To initialize a point-to-point communication between two processes ranked 0 and 1, one can write :

type(mpi_com) :: mpi

call mpi%init( send_rank = 0 , recv_rank = 1 )

One can choose to have a non-blocking communication to send an array by simply writting :

type(mpi_com)     :: mpi
real, allocatable :: to_send(:)
real, allocatable :: to_recv(:)

[...]

call mpi%send( to_send(:) )
call mpi%recv( to_recv(:) )

[...]

One can also have a blocking communication. Here, the process 0 is sending a scalar to the process 1 :

type(mpi_com) :: mpi
integer       :: a

[...]

call mpi%send_recv( to_send = a )

If both processes are sending and receiving data to and from the other process, one should specify that the communication is on both sides, and send and receive data by writting :

type(mpi_com) :: mpi
integer       :: a_send, a_recv

[...]

call mpi%replace( twoside = .true. )

call mpi%send_recv( to_send = a_send , &
                    to_recv = a_recv )

Graph communications

Fortran structure

The mpi_grph structure defines point-to-point communications in a graph of MPI processes.

MPI graphs

This image represents two graphs of MPI processes. The first graph contains the processes ranked 0, 1, 2, and 3. The second graph contains the processes ranked 0, 3, 4, 5. The mpi_grph structure is describing an MPI graph :

TYPE mpi_grph

  type(mpi_pool), pointer :: pool => mpi_world

  integer(ip) :: ne      ! Number of Edges (Two Sided Connected Ranks)
  integer(ip) :: ne_send ! Number of Edges (One Sided Connected Ranks)
  integer(ip) :: ne_recv ! Number of Edges (One Sided Connected Ranks)

  type(mpi_com), allocatable :: edge(:) ! Edges Array

  integer(ip) :: ireq

  logical :: is_init = .false.

CONTAINS

  procedure :: equal_mpi_graph

  procedure, pass( self ) :: init      => initialize_mpi_graph
  procedure, pass( self ) :: wait_send => mpi_graph_wait_send
  procedure, pass( self ) :: wait_recv => mpi_graph_wait_recv
  procedure, pass( self ) :: wait      => mpi_graph_wait
  procedure, pass( self ) :: regen_typ => mpi_graph_regen_typ

  procedure, pass( self ) :: mpi_graph_send_scal_i       , &
                             mpi_graph_send_scal_r       , &
                             mpi_graph_recv_scal_i       , &
                             mpi_graph_recv_scal_r       , &
                             mpi_graph_send_array_i      , &
                             mpi_graph_send_array_r      , &
                             mpi_graph_recv_array_i      , &
                             mpi_graph_recv_array_r      , &
                             mpi_graph_send_recv_scal_i  , &
                             mpi_graph_send_recv_scal_r  , &
                             mpi_graph_send_recv_array_i , &
                             mpi_graph_send_recv_array_r

  generic :: assignment(=) => equal_mpi_graph
  generic :: send          => mpi_graph_send_scal_i  , &
                              mpi_graph_send_scal_r  , &
                              mpi_graph_send_array_i , &
                              mpi_graph_send_array_r
  generic :: recv          => mpi_graph_recv_scal_i  , &
                              mpi_graph_recv_scal_r  , &
                              mpi_graph_recv_array_i , &
                              mpi_graph_recv_array_r
  generic :: send_recv     => mpi_graph_send_recv_scal_i  , &
                              mpi_graph_send_recv_scal_r  , &
                              mpi_graph_send_recv_array_i , &
                              mpi_graph_send_recv_array_r

END TYPE mpi_grph

Graph example In this graph :

the number of two sided connected processes is ne = 3
the number of one sided connected processes sending information is ne_send = 2
the number of one sided connected processes receiving information is ne_recv = 2

Process rank	Receives from	Sends to
0	2 3	1 3
1	0 2	2
2	1 3	1 0 3
3	2 0	2 0

To use an MPI graph to enable communications between the processes, one should initialize the graph by calling the initialize_mpi_graph subroutine. One has to specify the number of two-sided connecting edges ne, and possibly a pool of processes to be targetted.

MODULE SUBROUTINE initialize_mpi_graph( self , ne , pool )

  class(mpi_grph), intent(inout)                   :: self
  integer(ip)    , intent(in   )                   :: ne
  type(mpi_pool) , intent(in   ), optional, target :: pool

END SUBROUTINE initialize_mpi_graph

MPI graphs only enable non-blocking communications. One can communicate by using the send, recv, send_recv routines, and wait for the end of sending, receiving or both communications by calling respectively the wait_send, wait_recv and wait routines.

MODULE SUBROUTINE mpi_graph_wait_send( self )

  class(mpi_grph), intent(in) :: self

END SUBROUTINE mpi_graph_wait_send

MODULE SUBROUTINE mpi_graph_wait_recv( self )

  class(mpi_grph), intent(in) :: self

END SUBROUTINE mpi_graph_wait_recv

MODULE SUBROUTINE mpi_graph_wait( self )

  class(mpi_grph), intent(in) :: self

END SUBROUTINE mpi_graph_wait

The following subroutines define the communications of an integer, or an array of integers. The subroutines defining the communications of a real (or array of reals) are equivalent.

MODULE SUBROUTINE mpi_graph_send_scal_i( self , to_send , withinfo )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(in   )           :: to_send
  logical        , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_graph_send_scal_i

MODULE SUBROUTINE mpi_graph_recv_scal_i( self , to_recv , withinfo )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(inout)           :: to_recv
  logical        , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_graph_recv_scal_i

MODULE SUBROUTINE mpi_graph_send_array_i( self , to_send , withinfo )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(in   )           :: to_send(:)
  logical        , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_graph_send_array_i

MODULE SUBROUTINE mpi_graph_recv_array_i( self , to_recv , withinfo )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(inout)           :: to_recv(:)
  logical        , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_graph_recv_array_i

The following subroutines define the communications of an integer, or an array of integers. The subroutines defining the communications of a real (or array of reals) are equivalent.

MODULE SUBROUTINE mpi_graph_send_recv_scal_i( self , to_send , to_recv , withinfo )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(inout)           :: to_send
  integer(ip)    , intent(inout), optional :: to_recv
  logical        , intent(in   ), optional :: withinfo

END SUBROUTINE mpi_graph_send_recv_scal_i

MODULE SUBROUTINE mpi_graph_send_recv_array_i( self , to_send , to_recv , withinfo , withtmp )

  class(mpi_grph), intent(inout)           :: self
  integer(ip)    , intent(inout)           :: to_send(:)
  integer(ip)    , intent(inout), optional :: to_recv(:)
  logical        , intent(in   ), optional :: withinfo
  logical        , intent(in   ), optional :: withtmp

END SUBROUTINE mpi_graph_send_recv_array_i

Usage

Graph example To initialize this MPI graph, one can write :

type(mpi_pool) :: mpi_world
type(mpi_grph) :: mpi_graph

[...]

call mpi_graph%init( ne = 5 , pool = mpi_world )

Then, one can define the communicating processes contained in this graph. The edge variable is an array of point-to-point communicators (mpi_com).

call mpi_graph%edge( 0 )%init( send_rank = 0 , recv_rank = 1 , twoside = T )
call mpi_graph%edge( 1 )%init( send_rank = 1 , recv_rank = 2 , twoside = T )
call mpi_graph%edge( 2 )%init( send_rank = 0 , recv_rank = 2 , twoside = T )
call mpi_graph%edge( 3 )%init( send_rank = 2 , recv_rank = 3 , twoside = T )
call mpi_graph%edge( 4 )%init( send_rank = 3 , recv_rank = 4 , twoside = T )

The following example shows the communication of global indexes of the mesh cells. Since the mesh is partitioned between each process, the mesh cells’ indexes are obviously different between processes.

type(msh) :: mesh

[...]

call mpi_graph%send_recv( mesh%cell(:)%glob  , withtmp = T )

[...]

This mpi_grph structure is fully used to handle communications between mesh partitions. See Mesh to further understand the use of this structure.