**MOESI Cache Coherence**

<table>
<thead>
<tr>
<th>State</th>
<th>Cache up to date?</th>
<th>Memory up to date?</th>
<th>Others have copy?</th>
<th>Can respond to other's reads?</th>
<th>Can write without changing state?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modified</td>
<td>YES</td>
<td>NO</td>
<td>NO</td>
<td>YES, REQUIRED</td>
<td>YES</td>
</tr>
<tr>
<td>Owned</td>
<td>YES</td>
<td>MAYBE</td>
<td>MAYBE</td>
<td>YES, REQUIRED</td>
<td>YES</td>
</tr>
<tr>
<td>Exclusive</td>
<td>YES</td>
<td>YES</td>
<td>NO</td>
<td>YES, OPTIONAL</td>
<td>NO</td>
</tr>
<tr>
<td>Shared</td>
<td>YES</td>
<td>MAYBE</td>
<td>MAYBE</td>
<td>NO</td>
<td>NO</td>
</tr>
<tr>
<td>Invalid</td>
<td>NO</td>
<td>MAYBE</td>
<td>MAYBE</td>
<td>NO</td>
<td>NO</td>
</tr>
</tbody>
</table>

With the MOESI concurrency protocol implemented, accesses to cache accesses appear *serializable*. This means that the result of the parallel cache accesses appear the same as if there were done in serial from one processor in *some* ordering.

1. Consider the following access pattern on a two-processor system with a direct-mapped, write-back cache with one cache block and a two cache block memory. Assume the MOESI protocol is used, with write-back caches, write-allocate, and invalidation of other caches on write (instead of updating the value in the other caches).

<table>
<thead>
<tr>
<th>Time</th>
<th>After Operation</th>
<th>P1 cache state</th>
<th>P2 cache state</th>
<th>Memory @ 0 up to date?</th>
<th>Memory @ 1 up to date?</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>P1: read block 1</td>
<td>Exclusive (1)</td>
<td>Invalid</td>
<td>YES</td>
<td>YES</td>
</tr>
<tr>
<td>1</td>
<td>P2: read block 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>P1: write block 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>P2: write block 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>P1: read block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>P2: read block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>P1: write block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>7</td>
<td>P2: read block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>P2: write block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>9</td>
<td>P1: read block 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

2. Consider if we run the following two loops in parallel (as two threads on two processors).

   ```java
   for(int i = 0; i < N; i+=2)  array[i] += 1;
   for(int j = 1; j < N; j+=2) array[j] += 2;
   ```

Would we expect more, less, or the same number of cache misses than if we were to run this serially (assume each processor has its own cache and all data is invalid to start with)?
Concurrency

1. Consider the following function:

   ```c
   void transferFunds(struct account *from,
                      struct account *to,
                      long cents) {
      from->cents -= cents;
      to->cents += cents;
   }
   ```

   a) What are some data races that could occur if this function called simultaneously from two (or more) threads on the same account? (Hint: if the problem isn’t obvious, translate the function into MIPS first.)

   b) How could you fix or avoid these races? Can you do this without hardware support?

2. A reader-writer lock is a lock which can either be obtained exclusively by one thread (a “write lock”) or shared by an arbitrary number of threads (who share a “read lock”). Consider implementing a reader-writer lock by choosing the following values for the lock (which will be one MIPS word):

   - 0: unlocked
   - positive number: read-locked; lock value is number of readers
   - -1: write-locked

   Write MIPS assembly implementations of write_lock, write_unlock, read_lock, and read_unlock. When the lock cannot be obtained, have your functions loop until it becomes free.
Combinational Logic

Convert to Boolean expressions (use truth tables):
NAND:
XOR:
XNOR:

How many different two-input logic gates are possible? How many n-input logic gates?
Hint: Think about the truth table.

Create an inverter (NOT gate) using only NAND gates. Hint: look at the truth tables for each.

Boolean Logic Tricks

- $1 + A = 1$
- $0B = 0$
- $A + \bar{A} = 1$
- $B\bar{B} = 0$
- $A + AB = A$
- $A + \bar{A}B = A + B$
- $(A + B)(A + C) = A + BC$

DeMorgan's Law:
\[
\overline{AB} = \overline{A} + \overline{B}
\]
\[
\overline{A + B} = \overline{A} \overline{B}
\]

Minimize the following Boolean expressions:

Standard: $(A + B)(A + \overline{B})C$

Grouping & Extra terms: $\overline{ABC} + \overline{ABC} + AB\overline{C} + A\overline{BC} + ABC + A\overline{BC}$

DeMorgan's: $\overline{A(\overline{BC} + BC)}$