# MPI: a messaging API
OpenMP-level parallelism is possible when using a single computer with a shared memory space. If we want to solve really big problems, we need to move beyond this to distributed memory: multiple computers connected by some form of network. The dominant parallel programming paradigm for this setting is message passing.

Whereas in the OpenMP/shared-memory world we can pass information between threads by writing to some shared memory, in distributed-memory settings we need to explicitly package up and send data between the different computers involved in the computation.
In the high performance computing world, the dominant library for message passing is MPI, the Message Passing Interface.
Let’s look at a simple “Hello, World” MPI program, and contrast it with the equivalent OpenMP program. First, the MPI version:
```c
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    int rank, size, len;
    MPI_Comm comm;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(NULL, NULL);
    comm = MPI_COMM_WORLD;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    MPI_Get_processor_name(name, &len);

    printf("Hello, World! I am rank %d of %d. Running on node %s\n",
           rank, size, name);

    MPI_Finalize();
    return 0;
}
```
And the equivalent OpenMP program:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int nthread = omp_get_max_threads();
    int thread;

    #pragma omp parallel private(thread) shared(nthread)
    {
        thread = omp_get_thread_num();
        printf("Hello, World! I am thread %d of %d\n", thread, nthread);
    }
    return 0;
}
```
Our MPI programs always start by initialising MPI with MPI_Init, and they must finish by shutting down with MPI_Finalize. In between, it seems like there is no explicit parallelism written anywhere.
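For example, if we compile the MPI version to an executable called hello (the name is just for illustration) and launch it on four processes with something like mpirun -n 4 ./hello, we get one line of output per process. The order of the lines and the node names will vary from run to run, but the output might look something like this:

```
Hello, World! I am rank 1 of 4. Running on node node01
Hello, World! I am rank 0 of 4. Running on node node01
Hello, World! I am rank 3 of 4. Running on node node01
Hello, World! I am rank 2 of 4. Running on node node01
```

Even though the source contains no explicit parallel constructs, every one of the four processes runs the whole program.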
## Concepts
The message passing model is based on the notion of processes (rather than the threads in OpenMP). Processes are very similar to threads, but do not share an address space (and therefore do not share memory).
Like in OpenMP programming, we achieve parallelism by having many processes cooperate to solve the same task. We must come up with some way of dividing the data and computation between the processes.
Since processes do not share memory, they must instead send messages to communicate information. This is implemented in MPI through library calls that we can make from our sequential programming language. This is in contrast to OpenMP which defines pragma-based extensions to the language.
The core difficulty in writing message-passing programs is the conceptual model. This is a very different model to that required for sequential programs. Becoming comfortable with this model is key to understanding MPI programs. A key idea, which is different from the examples we’ve seen with OpenMP, is that there is much more focus on the decomposition of the data and work. That is, we must think about how we divide the data (and hence work) in our parallel program. I will endeavour to emphasise this through the examples and exposition when we encounter MPI functions.
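To make the decomposition idea concrete, here is a minimal sketch; the problem (summing the integers from 0 to N-1) and all the variable names are purely illustrative. Each process uses its rank and the communicator size to work out which contiguous slice of the global index range it owns, and then works only on that slice.

```c
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    int rank, size;
    const int N = 100;    /* illustrative global problem size */

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Divide the index range [0, N) into contiguous chunks, one per
       rank; the first N % size ranks get one extra element so that
       every index is owned by exactly one rank. */
    int chunk = N / size;
    int remainder = N % size;
    int start = rank * chunk + (rank < remainder ? rank : remainder);
    int end = start + chunk + (rank < remainder ? 1 : 0);

    double local_sum = 0;
    for (int i = start; i < end; i++) {
        local_sum += i;   /* each rank only works on its own slice */
    }
    printf("Rank %d of %d owns [%d, %d) and computed partial sum %g\n",
           rank, size, start, end, local_sum);

    MPI_Finalize();
    return 0;
}
```

Combining the partial sums into a single result would require communication between the processes, which is exactly the message-passing machinery the rest of these notes introduce.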
Although at first MPI parallelism seems more complicated than OpenMP (we can’t just annotate a few loops with a #pragma), it is, in my experience, a much more powerful programming model, and better suited to the implementation of reusable software.
## Single program, multiple data (SPMD)
Most MPI programs are written using the single-program, multiple data (SPMD) paradigm. All processes are launched and run their own copy of the same program. You saw this with the Hello World example.
Although each process is running the same program, they each have a separate copy of data (there is no sharing like in OpenMP).
To make this useful, each process has a unique identifier, its rank. We can then write code that sends different ranks down different paths through the control flow.
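For instance, here is a minimal sketch of rank-dependent control flow; the branch bodies are just illustrative placeholders for whatever coordination or worker code a real program would run.

```c
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    int rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Only rank 0 takes this branch, e.g. reading input or
           coordinating the other processes. */
        printf("Rank 0: doing the coordination work\n");
    } else {
        /* Every other rank takes this branch. */
        printf("Rank %d: doing a worker's share of the computation\n", rank);
    }

    MPI_Finalize();
    return 0;
}
```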
The way to think about this is as if we had written a separate, specialised copy of the program for each process. These copies then execute at the same time and can pass messages to each other.
Suppose we have a function
```c
void print_hello(MPI_Comm comm)
{
    int rank;
    int size;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("Hello, I am rank %d of %d\n", rank, size);
}
```
Then if we execute it with two processes we have:

Process 0

```c
void print_hello(MPI_Comm comm)
{
    int rank;
    int size;

    rank = 0;
    size = 2;
    printf("Hello, I am rank %d of %d\n", rank, size);
}
```

Process 1

```c
void print_hello(MPI_Comm comm)
{
    int rank;
    int size;

    rank = 1;
    size = 2;
    printf("Hello, I am rank %d of %d\n", rank, size);
}
```
Of course, on its own, this is not that useful. The real power of MPI comes from the ability to send messages between the processes, and this is facilitated by communicators.
## Communicators
The powerful abstraction that MPI is built around is a notion of a communicator. This logically groups some set of processes in the MPI program. All communication happens via communicators. That is, when sending and receiving messages we do so by providing a communicator and a source/destination to be interpreted with reference to that communicator.
When MPI launches a program, it pre-initialises two communicators:

- MPI_COMM_WORLD: a communicator consisting of all the processes in the current run.
- MPI_COMM_SELF: a communicator consisting of each process individually.
The figure below shows an example of eight processes and draws the world and self communicators.
A key thing to note is that the processes are the same in the left and right figures. It is just their identifier that changes depending on which communicator we view them through.
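To make this concrete, the following sketch (not the course snippet referenced in the exercise below, just the same idea) queries the rank and size of each process through both pre-initialised communicators.

```c
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    int wrank, wsize, srank, ssize;

    MPI_Init(NULL, NULL);

    /* Identity as seen through the world communicator: one rank per
       process, size equal to the total number of processes. */
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* Identity as seen through the self communicator: every process
       is rank 0 in a communicator of size 1. */
    MPI_Comm_rank(MPI_COMM_SELF, &srank);
    MPI_Comm_size(MPI_COMM_SELF, &ssize);

    printf("World: rank %d of %d; Self: rank %d of %d\n",
           wrank, wsize, srank, ssize);

    MPI_Finalize();
    return 0;
}
```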
**Exercise**

This concept is illustrated by mpi-snippets/comm-world-self.c. Compile it with

```sh
$ mpicc -o comm-world-self comm-world-self.c
```

**Hint**

Don’t forget that in addition to loading the normal compiler modules you also need to load the intelmpi/intel/2018.2 module.

And run with

```sh
$ mpirun -n 4 ./comm-world-self
```

Do you understand the output?
An important thing about communicators is that they are always explicit when we send messages: to send a message, we need a communicator. So communicators, and the group of processes they represent, are at the core of MPI programming. This is in contrast to OpenMP, where we generally don’t think about which threads are involved in a parallel region.
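As a small taste of what is to come, here is a hedged sketch of a single point-to-point message. Note how the communicator appears explicitly in both the send and the receive, and how the destination and source ranks are interpreted relative to that communicator. The tag value and payload are arbitrary, and the sketch assumes it is run with at least two processes.

```c
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    int rank;
    MPI_Comm comm;

    MPI_Init(NULL, NULL);
    comm = MPI_COMM_WORLD;          /* the communicator is always explicit */
    MPI_Comm_rank(comm, &rank);

    if (rank == 0) {
        int value = 42;             /* arbitrary payload */
        /* Send one int to rank 1 of this communicator, with tag 0. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, comm);
    } else if (rank == 1) {
        int value;
        /* Receive one int from rank 0 of the same communicator. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, comm, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```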
We’ll revisit these ideas as we learn more MPI functions. Next, we’ll look at point-to-point messaging.