CS61cl Lab 21 - CPU Design: Sequencer and Control

Quiz: adding an instruction

The MIPS CPU in P&H Figure 5.17 implements eight instructions. It is obviously incomplete. However, almost all the missing instructions can make use of the existing structure. Last time we saw that to implement and instruction, we first identify the functional units—register file, ALU, PC, or memory—needed by the instruction. There must be data paths along which information can access these functional units. We then choose existing control signals, perhaps with extra gates, or add new control signals that route information appropriately. (The meaning of the control signals produced by the Control and the ALU control logic is given here.)

The table below gives the values of the control signals for the instructions that we saw last time.

instoperation Register TransferALUSrcALU opMemReadMemToRegMemWriteRegWriteRegDst
lwload word R[rt]=Mem[ R[rs]+SX(im16) ]; PC = PC+4 0
00
11010
swstore word Mem[ R[rs]+SX(im16) ] = R[rt]; PC = PC+4 1 000 x1
0x
beqbranch equal PC = R[rs] == R[rt] ? PC+4 + SX(im16) : PC+4 0 010 x00x
addadd
R[rd] = R[rs] + R[rt]; PC = PC+4 0 10
0 0 01


1
subsubtract R[rd] = R[rs] - R[rt]; PC = PC+4
andand R[rd] = R[rs] & R[rt]; PC = PC+4
oror R[rd] = R[rs] | R[rt]; PC = PC+4
sltset less than R[rd] = R[rs] < R[rt]; PC = PC+4
addi
add immediate












Complete the final row.

Instruction Sequencer

PC holds the address of the currently executing instruction.  This address is provided to the instruction memory to fetch that instruction.

    instruction = instMem[ PC ];

This portion takes place for all instructions.  The instruction is then decoded into control signals, as above, and executed by perfoming the associated register transfers on the data path.  In addition, each instruction must perform the register transfer to update the PC.  For sequential instructions, this is

  PC = PC + 4

Self-Test: Requirements of the Instruction Set

Take a look at the MIPS instruction set described in the book and on the green card.  What are all the updates to the PC that it requires?

Branch:             PC = PC + 4 or  PC = PC + 4 + BranchAddr
Jump:                PC = JumpAddr
Jump Register:   PC = R[rs]
Jump And Link: PC = JumpAddr

Take a look at Figure 5.17.  What are all the register transfers on the PC that the sequencer portion of the data path can perform? 

    PC = PC + 4 , or
    PC = PC + 4 + signExtend(Inst[15-0]) << 2

The one that takes place is determined by the select on the MUX in the upper write corner.  The later is selected if the instruction is
a branch and the zero output of the ALU is 1.  On this datapath the BEQ instruction is implemented by causing the ALU to perform
R[rs] - R[rt] and allowing the Zero condition to determine if the Branch is taken ( PC = PC + 4 + signExtend(Inst[15-0]) << 2). 

Brainstorm: Other Branches

Can you implement the BNE instruction on this datapath as is? If so, how? If not, what modification would you make so that you could implement that instruction?

Implementing j (Jump)

We move to the j (Jump) instruction. Adding it to the instruction set requires a different approach, since it uses neither registers nor the ALU nor memory access. Its sole effect on the CPU's state elements is to change the PC. (Recall that the new PC address is the concatenation of the top four bits of the current PC+4, bits 25-0 of the instruction, and 00.

Figure 5.24 shows what's needed:

The new wires and logic appear in solid black in Figure 5.24.

(Note: My copy of the book has a bug in the added portion of the diagram. Can you identify it.)

We can add the additional colums to our table to contain the control points for the sequencer.  Note that all the sequential instruction need to set both of these control
points to 0 so that the MUXes in the sequencer give us PC = PC + 4. 

instoperation Register TransferALUSrcALU opMemReadMemToRegMemWriteRegWriteRegDst
Br
J
lwload word R[rt]=Mem[ R[rs]+SX(im16) ]; PC = PC+4 0
00
11010
0
0
swstore word Mem[ R[rs]+SX(im16) ] = R[rt]; PC = PC+4 1 000 x1
0x
0
0
beqbranch equal PC = R[rs] == R[rt] ? PC+4 + SX(im16) : PC+4 0 010 x00x
1
0
addadd
R[rd] = R[rs] + R[rt]; PC = PC+4 0 10
0 0 01


1
0
0
subsubtract R[rd] = R[rs] - R[rt]; PC = PC+4
andand R[rd] = R[rs] & R[rt]; PC = PC+4
oror R[rd] = R[rs] | R[rt]; PC = PC+4
sltset less than R[rd] = R[rs] < R[rt]; PC = PC+4
addi
add immediate
R[rt] = R[rs] + SX(im16)
1
00
0
0
0
1
0
0
0
j
jump
PC = JumpAddr
x
x
x
x
0
0
x
0
1


Brainstorm: Other Jumps

Does this datapath have the capability to support Jump-and-Link?  Jump-Register?  If so how? If not, what would you need to add to be able to do that?

Implementing Control

Our tables provide a complete specification for the control logic.  We simply treat it as a truth table with the opcode (bits 31-26 of the instruction) as the input and the control points as the output.  For our subset of the MIPS this is as shown below.  Recall, that for R type instructions the opcode is 000000 and the actual function being performed by the ALU is determined by the FUNCT field.  The flow of information through the datapath is the same for all the R-type instructions - only the operation within the ALU changes.
inst Register TransferInst[31-26]
ALUSrcALU opMemReadMemToRegMemWriteRegWriteRegDst
Br
J
lw R[rt]=Mem[ R[rs]+SX(im16) ]; PC = PC+4 10 0011
0
00
11010
0
0
sw Mem[ R[rs]+SX(im16) ] = R[rt]; PC = PC+410 1011
1 000 x1
0x
0
0
beq PC = R[rs] == R[rt] ? PC+4 + SX(im16) : PC+4 00 0100
0 010 x00x
1
0
add R[rd] = R[rs] + R[rt]; PC = PC+400 0000
0 10
0 0 01


1
0
0
sub R[rd] = R[rs] - R[rt]; PC = PC+4
and R[rd] = R[rs] & R[rt]; PC = PC+4
or R[rd] = R[rs] | R[rt]; PC = PC+4
slt R[rd] = R[rs] < R[rt]; PC = PC+4
addi
R[rt] = R[rs] + SX(im16)
00 1000
1
00
0
0
0
1
0
0
0
j
PC = JumpAddr
00 0010
x
x
x
x
0
0
x
0
1

For this subset of the instruction set, give a boolean expression for ALUsrc, RegWrite, and J.

Critical Path

Recall, the cycle time of a digital design must be longer than the sum of the propagation delays along  the longest path from the output of a state element to the input of a state element.  Assuming that a memory access is 10 units, a register file access is 2, and ALU operation is 2, and all of the other MUXes, adders and control blobs are 1, which instruction has the longest critical path?  How long is that?

Multicycle Datapath

So far we have built a datapath that implements the entire  instruction cycle (instruction fetch, decode, operand fetch, execute, memory, result store) as one huge blob of combinational logic.  Everytime the clock ticks, PC gets updated, and register may get updated or memory may get updated.  What is the cycle time for such a design?  Huge!  We can reduce the cycle time by adding additional registers, and then each instruction will take multiple clock cycles.  For example, we can separate instruction fetch from instruction execution by placcing a register, the Instruction Register (IR) on the output of the instruction memory.  This means all instructions are implemented by two consecutive regsiter transfers:
Where in Figure 5.24 would you inster the IR register?  Right  on the outputs of the Instruction Memory.  But notice, that this creates a little bit of a problem.  The instruction executes in two clocks (fetch, execute) but the sequencer is updating PC every clock.  We need to place another register, nPC, on the output of the PC+4 adder.  We take turns updating one and then the other.  So our instruction execution is like this.
The controller would need to keep one bit of state indicating the stage of processing: instruction fetch or execute. We'll get more experience with this in our CAL16 implementation in Prohject 4.

Homework

Copy ~cs61cl/code/CAL16-template project to a filed called CAL16-name.circ

It already include built-in libraries

  - Memory
  - arithmetic
  - plexers
  - input/output
Build up some of the basic tools that you will need for your data path. You are given subcircuit templates for each of these. Do not change the orientation or relative placement of the input and output pins. Otherwise it will modify the symbol that appears in the main schematic.

We have provided you an implementation of the following.

Reg16 - 16-bit register with async clr and selective load.
  - Inputs: 
      D[15:0]
      ld
      clr
  - Outputs
	Q[15:0]
  - Internally clocked
Complete the following using the reg16 component.
RegFile - 16x16-bit register file, 2 read ports and 1 write port.  R0 always
reads as zero.  
  - Inputs
      D[15:0]
      Asel[4:0]
      Bsel[4:0]
      Csel[4:0]
      clr
  - Outputs
      A[15:0]
      B[15:0]
  - Internally clocked
Register 0 always reads a zero. If no reg write is desired, Dsel should be set to 0.

Complete the design of these two subcircuits and verify their correctness

Drop this in your main circuit as a starting point for your design and build around it. Make sure the component has the pin labels. Add text in the middle of the component so that it is clear what it is.

ALU Design

The next step is to desgin the ALU. The specification of the ALU is derived from the instruction set, but is also fairly general purpose
  - Inputs
      A[15:0]
      B[15:0]
      op[2:0]
      Cin
  - Outputs
      R[15:0]
      Cout

The functional behavior is given by the following table

op      operation
0 0 0	and	R = A & B
0 0 1	or      R = A | B
0 1 0	xnor    R = ~(A ^ B)
0 1 1	add	Cout:R = A + B + cin
1 0 0	passB   R = B
Drop your ALU into main. You can connect input and output pins to verify that it works. You can wire it to your ALU, but you will have to pull them apart again later.