# CS61C Summer 2015 Lab 7 - CPU Project Prep

## Setup

Copy the starter lab files as usual, from the directory: ~cs61c/labs/07

## Exercises

### Exercise 1: Fibonacci Numbers

So far we have built circuits that are either 1) purely combinational and require no clock or 2) have memory but run infinitely. In this exercise, we want to build a circuit that will compute the Nth Fibonacci number. As a quick review the Fibonacci numbers are defined by F0 = 0, F1 = 1, and Fi = Fi-1 + Fi-2.

1. Start with just the infinite Fibonacci computation. Hopefully this should be pretty intuitive based on previous exercises that you have done before. How many registers do you need? What arithmetic blocks do you need? Where should the output be attached? Make sure you figure out a way to properly initialize the registers for the computation to work.
2. Your sub-circuit will compute up to the 15th Fibonacci number and we will assume our input N > 0 is an unsigned number. How many input bits do you need and how many output bits do you need to represent the 15th Fibonacci number?
3. Now to prevent the circuit from computing the Fibonacci numbers to infinity, we will make use of the Enable bit (lower left) on registers. When unset or pulled high, the registers will continue to load their inputs on the rising edge of the CLK. If Enabled is pulled low, then the registers will NOT load their inputs. We need to create a signal that will properly go low when we want the circuit to stop computing Fibonacci numbers.
4. Create an additional part of the circuit that halts the original circuit after N computations. For this exercise you may find the following blocks useful: Counter (Memory), Comparator (Arithmetic), and Probe (Wiring). Make sure that it properly stops on the Nth Fibonacci number (watch for off-by-one errors) and that it actually stays there (run it for at least 20 clock cycles).
Your output should be in binary, but you can use a probe to display the value in decimal. Make sure you check the attributes of any counter and comparator you use.

Extension: With proper register initialization, you can get this circuit fib15 to work properly for N > 0. But register initialization is annoying and must be repeated every time you reset your circuit. With a clever addition, you can get this fib15 to work for N >= 0 without the need for register initialization (Hint: it involves a MUX).

#### Checkoff

• Show your TA your fib15 circuit and verify that for N = 15 after some time it will return Out = 610 indefinitely.

## Exercise 2: Pipelining

This next exercise will get you some hands-on practice with pipelining. Assume that on power-on, registers initially contain zeros.

Consider the following 2-input circuit. Its output is computed by multiplying the inputs and adding it to the current value. Say the propagation delay of a adder block is 50ns, the propagation delay of a multiplication block is 55 ns, and the clk-to-q delay of a register is 5ns. Calculate the maximum clock rate at which this circuit can operate. Assume that the register setup time is negligible, and that both inputs come from clocked registers that receive their data from an outside source.

#### Checkoff

• Show your TA the calculations you performed to find the maximum clock rate (non-pipelined).

We want to improve the throughput of this circuit, and let it operate at a higher clock rate. To do so, we will divide up the multiplication and addition into two different pipeline stages; in the first pipeline stage, we will perform the multiplication of the two inputs. In the second pipeline stage, we will add the product to the state.

Our definition of "correctness" will be simple: we will consider the sequence of outputs from this circuit "correct" iff it corresponds to the sequence of outputs the non-pipelined version would emit, potentially with some leading zeros. For example, if for some sequence of inputs the non-pipelined version emits [3,5,1,2,4, ...], a correct circuit might emit the sequence of outputs [0,3,5,1,2,4, ...] for that same sequence of inputs.

For your convenience and to help standardize check-offs, we are providing a starting point in the files pipeline.circ and ROMdata (use the links or pull from ~cs61c/labs/07/). In pipeline.circ, the sub-circuit Non-pipelined is set up exactly as the figure above. The main circuit is set up to produce the output sequence [3,5,1,2,4,-1,0,0,...] on the non-pipelined version of this circuit. It is also a handy example of how to use memory from a file. The ROM block should be initialized to the proper data, but if it is zero-ed out, right-click it and choose "Load image..." and select ROMdata.

Note that we need a register to hold the intermediate value of the computation between pipeline stages. This is a general theme with pipelines.

1. Complete the sub-circuit Pipelined. You will need to add a register to separate the multiplication and addition into 2 pipeline stages.
2. Calculate the maximum clock rate for the pipelined version of the circuit.
3. When we talked about pipelining in lecture, we discussed that if a computation depends on the output of a previous computation, it's difficult to pipeline them and we often need to insert a pipeline "bubble" (or several) to ensure that the output of the first computation is ready to be an input to the second. Explain why inserting such "bubbles" is unnecessary for this particular circuit.

#### Checkoff

• Show your TA the completed, pipelined circuit.
• Show your TA the calculations you performed to find the maximum clock rate (pipelined).
• Explain to your TA why bubbles are unnecessary in this circuit.

## Exercise 3: Multiply

You decide that you want to implement multiply using repeated addition (ex: you want to multiply 5*3, so you could perform 5*3 = 5 + 5 + 5). Since we will be using the output of a cycle in the next cycle (i.e. use some sort of feedback) in this repeated-adder-multiplier, we will need to use a register and will only perform one addition per cycle. Since this multiply circuit will be within our ALU we need to output not only the product of the two numbers, but a signal specifying whether or not the CPU needs to stall while we finish computing the product of the two numbers.

Open MultRepeatedAdder.circ and take a look at the mult circuit. The inputs A and B are expected to be multiplied via repeated addition and their product should be placed in the "Product" output pin. Before the product at the "Product" output pin displays the correct output, the pin "Stall" needs to be 1 so the ALU knows to stall its pipeline.

The approach we are going to take is the following:

To multiply A*B we will repeatedly add A, B # of times, (Ex: 3*4 = 3 + 3 + 3 + 3, we added +3 repeatedly four times)

When the RESET input is 1, we are going to setup our registers to begin performing the multiplication (this means zeroing counters, capturing the values of A and B into registers, etc.)

We will need a counter to know the number of times we have added A. Once this counter == B we are done. This counter will be implemented in the "CountUpTo" sub-circuit. This circuit has the number/limit that we want to count up to as an input and outputs a signal specifying whether or not we have reached that limit yet (note: the counter outputs just a boolean value, not the count itself).

In the mult circuit we will need to:

• Store the values of A and B into registers when the RESET bit is 1 (hint: think about the enable bit of a register). We store the input values because we will need to use those values across multiple cycles and we want to make sure those values don't change during that time.
• We need an adder with A and the current sum from a register as inputs. This register should be connected to the output value (Make sure the register resets at the "reset" signal and holds its value if we are not stalling).
• We will need a means of using the counter subcircuit we made to create a control signal that will determine if we should still be adding.
• Set the output pins "Product" and "Stall Pipeline" to their correct values. Note that you don't need to implement any efficiency tricks; it should always take B cycles for the value to be computed.

To correctly use this circuit to multiply you must set inputs A and B and have the clock ticking in the main circuit. Then poke the reset pin to 1 and leave it for a clock cycle (this should cause the counter to reset and inputs A and B to be saved into registers). Then poke the reset pin back to 0 and watch the output after each cycle be incremented by A.

TIPS: Take time to understand how a register's RESET and ENABLE pins function. When RESET is 1, the register will reset to 0. While the ENABLE input bit is explictly 0, the register will not change it's stored value even if the clock rises.