Parallelisation of a simple loop #
As usual, we’ll be running these exercises on Hamilton or COSMA, so remind yourself of how to log in and transfer code if you need to.
Obtaining the code #
The code for this exercise is in the `code/add_numbers` subdirectory of the repository.
We’ll be working in the `openmp` subdirectory.
Work from the repository `main` branch again and create a new branch for this exercise.
Parallelising the loop #
Exercise
Compile and run the code with OpenMP enabled.
Try running with different numbers of threads. Does the runtime change? One way to check how many threads you are actually getting is sketched just after this exercise.
You should use a reasonably large value for `N`.
Check the `add_numbers` routine in `add_numbers.c`. Annotate it with appropriate OpenMP pragmas to parallelise the loop.
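For the thread-count experiments above, it can be useful to confirm how many threads the OpenMP runtime is actually giving you. A minimal standalone check (a sketch, not part of the exercise code) might look like:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Inside a parallel region, omp_get_num_threads() returns the size
     * of the current thread team (e.g. as set by OMP_NUM_THREADS). */
    #pragma omp parallel
    {
        #pragma omp single
        printf("Running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}
```

Compile it with OpenMP enabled (for example `-fopenmp` with GCC) and run it with different values of `OMP_NUM_THREADS`.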
Question
Does the code now have different runtimes when using different numbers of threads?
Solution
This code can be parallelised using a simple parallel for.
In `add_numbers.c` we annotate the for loop with

```c
#pragma omp parallel for default(none) shared(n_numbers, numbers) reduction(+:result) schedule(static)
```
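In context, the annotated loop might look something like the sketch below; the exact signature and types in `add_numbers.c` may differ.

```c
/* Sketch of the annotated routine; the real signature and types in
 * add_numbers.c may differ. */
double add_numbers(int n_numbers, const double *numbers)
{
    double result = 0.0;

    /* Each thread accumulates a private partial sum; the reduction
     * clause combines the partial sums into result after the loop. */
    #pragma omp parallel for default(none) shared(n_numbers, numbers) \
        reduction(+:result) schedule(static)
    for (int i = 0; i < n_numbers; i++) {
        result += numbers[i];
    }

    return result;
}
```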
If I do this, I see that the code now takes less time with more threads.
Different schedules #
Experiment with different loop schedules. Which work best? Which work worst?
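Within the `add_numbers` sketch from the previous section, varying the schedule just means changing the `schedule` clause on the loop; for example (the chunk size of 64 is an arbitrary illustrative value):

```c
    /* Same reduction loop, but with a dynamic schedule: threads grab
     * chunks of 64 iterations as they finish their previous chunk.
     * Other clauses to try: schedule(static, 64), schedule(guided),
     * schedule(runtime). The chunk size 64 is arbitrary. */
    #pragma omp parallel for default(none) shared(n_numbers, numbers) \
        reduction(+:result) schedule(dynamic, 64)
    for (int i = 0; i < n_numbers; i++) {
        result += numbers[i];
    }
```

`schedule(runtime)` is particularly convenient for this kind of experiment, since the schedule is then read from the `OMP_SCHEDULE` environment variable and can be changed without recompiling.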
Exercise
Produce a strong scaling plot for the computation as a function of the number of threads using the different schedules you investigated.
Don’t forget to do this on a compute node (submit a job script with `sbatch`) to avoid timing variability. What do you observe?
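As a reminder, in a strong scaling study the problem size stays fixed while the number of threads grows, so the natural quantities to plot are the runtime, or the speedup S(p) = T(1)/T(p), against the number of threads p.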
Solution
This is what I get for some different schedules when computing on a vector of one million numbers. Note that I did not run multiple times, so there may be some variability in these timings.