Lab 8: Parallelism II: OpenMP, Open MPI
Deadline: Monday, April 17, 11:59:59 PM PT
Setup
You must complete this lab on the hive machines (not your local machine). See Lab 0 if you need to set up the hive machines again.
In your `labs` directory on the hive machine, pull any changes you may have made in past labs. Then, still in your `labs` directory, pull the files for this lab.
If you run into any `git` errors, please check out the common errors page.
OpenMP Directives
Usage for OpenMP directives can be found on the OpenMP summary card.
For
In the last lab, you manually assigned iterations within a `for` loop to individual threads. OpenMP can take care of this automatically with the `for` directive! You can either add a `#pragma omp for` to an existing `for` loop within a `#pragma omp parallel` region, or use `#pragma omp parallel for` by itself.
Critical
The code in a critical section can only be executed by a single thread at any given time. Thus, a critical section naturally prevents multiple threads from reading and writing the same data at once, a problem that would otherwise lead to data races. OpenMP provides the `critical` directive to let you perform computations within a critical section.
Reduction
Critical sections may be slow, so OpenMP has a built-in way to reduce this problem through the use of the `reduction` keyword. The reduction keyword increases the parallelizability of your code by automatically minimizing the amount of work that must happen inside a critical section. To use this keyword, you must specify the variable that should be protected and the operation being performed on that variable.
Exercise 1: OpenMP Dot Product
At first glance, implementing a dot product might not seem too different from `v_add` from lab 7, since we now just perform elementwise multiplication instead of addition. The challenge is how to add the products into one variable (aka reduction) to get our final answer. If multiple threads try to add their results to the same variable at the same time, it will lead to a data race and an incorrect result.
Try running the tests now; only the naive solution should pass, since the other optimizations are not yet implemented.
- Implement `dotp_critical` using OpenMP and a critical section.
  - Notice how the performance gets much worse as the number of threads goes up? By putting all of the reduction work in a critical section, we have flattened the parallelism and made it so only one thread can do useful work at a time (not exactly the idea behind thread-level parallelism).
  - This contention is problematic: each thread is constantly fighting for the critical section, and only one is making any progress at any given time. As the number of threads goes up, so does the contention, and the performance pays the price.
- Implement `dotp_reduction` using OpenMP and a reduction statement.
  - Hint: the variable that should only be accessed by one thread at a time is `global_sum`, and the operation being performed on it is addition (`+`).
- Implement `dotp_manual_reduction` using OpenMP and the idea of reduction, but do not use the `reduction` keyword.
  - Hint: create variables for each thread and only add to the final sum when absolutely necessary.
Open MPI
The Open MPI project provides a way of writing programs which can be run on multiple processes. We can use its C libraries by calling their functions. Then, when we run the program, Open MPI will create a bunch of processes and run a copy of the code on each process. Here is a list of the most important functions for this class:
- `int MPI_Init(int *argc, char ***argv)` should be called at the start of the program, passing in the addresses of `argc` and `argv`.
- `int MPI_Finalize()` should be called at the end of the program.
- `int MPI_Comm_size(MPI_Comm comm, int *size)` gets the total number of processes running the program and puts it in `size`.
- `int MPI_Comm_rank(MPI_Comm comm, int *rank)` gets the ID of the current process (0 to total number of processes - 1) and puts it in `rank`.
- `int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)` (Man page)
  - `buf` is where the message to send is stored
  - `count` is the number of elements within the message to send
  - `datatype` is the datatype of each element within the message
  - `dest` is the process ID of the recipient of the message
  - `tag` is the tag of the message
    - For the scope of this lab, feel free to use `0`.
  - `comm` is the communication context of the message
    - For the scope of this class, use `MPI_COMM_WORLD`.
- `int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)` (Man page)
  - `buf` is used to store the received message(s)
  - `count` is the maximum number of elements to receive
  - `datatype` is the datatype of each element within the received message
  - `source` is the process ID the message should come from
    - If you want to receive a message from any source, set the source to `MPI_ANY_SOURCE`.
    - The source of the message can be found in the `MPI_SOURCE` field of the outputted status struct.
  - `tag` is the tag that the message must be sent with
    - If you want to receive a message with any tag, set the tag to `MPI_ANY_TAG`.
  - `comm` is the communication context that the message must be sent within
    - For the scope of this class, use `MPI_COMM_WORLD`.
  - `status` will store a status struct with additional information
    - If you don't need the information in the status struct (e.g. because you already know the source of the message), set the status address to `MPI_STATUS_IGNORE`.
For Open MPI, make sure you use `mpicc` (instead of `gcc`) to compile and `mpirun` to run your programs.
Exercise 2: Hello Open MPI
- In `ex2.c`, use the above functions to write a program that prints `Hello world` on multiple processes. For this exercise, since there's only one type of message, feel free to use `0` as the `tag` and `MPI_COMM_WORLD` as `comm`.
- Compile the program with `mpicc ex2.c -o ex2`.
- Run the program with `mpirun ./ex2 [num tasks]`, where `[num tasks]` is the number of tasks to run.
  - You can set the number of processes to use with `mpirun -np [num processes] ./ex2 [num tasks]`.
  - If you choose to run with more processes than physical cores (4 for the hive machines), you must pass the `--oversubscribe` flag (e.g. `mpirun --oversubscribe -np 100 ./ex2 100`).
- Play around with different numbers of processes and tasks and see how the output changes!
Feel free to refer to discussion 10 for an example on writing Open MPI code! The code structure will be extremely similar.
Submission
Save, commit, and push your work, then submit to the Lab 8 assignment on Gradescope.