Parallelisation of a simple loop #
As usual, we’ll be running these exercises on Hamilton or COSMA, so remind yourself of how to log in and transfer code if you need to.
Obtaining the code #
The code for this exercise is in the `code/add_numbers` subdirectory of the repository.
We’ll be working in the `openmp` subdirectory.
Work from the repository `main` branch again and create a new branch for this exercise.
Parallelising the loop #
Exercise
Compile and run the code with OpenMP enabled.
Try running with different numbers of threads. Does the runtime change? One way to check how many threads you are actually getting is sketched just after this exercise.
You should use a reasonably large value for `N`.
Check the `add_numbers` routine in `add_numbers.c`. Annotate it with appropriate OpenMP pragmas to parallelise the loop.
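For the thread-count experiments above, it can be useful to confirm how many threads the OpenMP runtime is actually giving you. A minimal standalone check (a sketch, not part of the exercise code) might look like:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Inside a parallel region, omp_get_num_threads() returns the size
     * of the current thread team (e.g. as set by OMP_NUM_THREADS). */
    #pragma omp parallel
    {
        #pragma omp single
        printf("Running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}
```

Compile it with OpenMP enabled (for example `-fopenmp` with GCC) and run it with different values of `OMP_NUM_THREADS`.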
Question
Does the code now have different runtimes when using different numbers of threads?
Solution
This code can be parallelised using a simple parallel for.
In `add_numbers.c` we annotate the for loop with

```c
#pragma omp parallel for default(none) shared(n_numbers, numbers) reduction(+:result) schedule(static)
```
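In context, the annotated loop might look something like the sketch below; the exact signature and types in `add_numbers.c` may differ.

```c
/* Sketch of the annotated routine; the real signature and types in
 * add_numbers.c may differ. */
double add_numbers(int n_numbers, const double *numbers)
{
    double result = 0.0;

    /* Each thread accumulates a private partial sum; the reduction
     * clause combines the partial sums into result after the loop. */
    #pragma omp parallel for default(none) shared(n_numbers, numbers) \
        reduction(+:result) schedule(static)
    for (int i = 0; i < n_numbers; i++) {
        result += numbers[i];
    }

    return result;
}
```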
If I do this, I see that the code now takes less time with more threads.
Different schedules #
Experiment with different loop schedules. Which work best? Which work worst?
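Within the `add_numbers` sketch from the previous section, varying the schedule just means changing the `schedule` clause on the loop; for example (the chunk size of 64 is an arbitrary illustrative value):

```c
    /* Same reduction loop, but with a dynamic schedule: threads grab
     * chunks of 64 iterations as they finish their previous chunk.
     * Other clauses to try: schedule(static, 64), schedule(guided),
     * schedule(runtime). The chunk size 64 is arbitrary. */
    #pragma omp parallel for default(none) shared(n_numbers, numbers) \
        reduction(+:result) schedule(dynamic, 64)
    for (int i = 0; i < n_numbers; i++) {
        result += numbers[i];
    }
```

`schedule(runtime)` is particularly convenient for this kind of experiment, since the schedule is then read from the `OMP_SCHEDULE` environment variable and can be changed without recompiling.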
Exercise
Produce a strong scaling plot for the computation as a function of the number of threads using the different schedules you investigated.
Don’t forget to do this on a compute node (submit a job script with `sbatch`) to avoid timing variability. What do you observe?
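As a reminder, in a strong scaling study the problem size stays fixed while the number of threads grows, so the natural quantities to plot are the runtime, or the speedup S(p) = T(1)/T(p), against the number of threads p.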
Solution
This is what I get for some different schedules when computing on a vector of one million numbers. Note that I did not run multiple times, so there may be some variability in these timings.