Project 3: CS61CPU

Overview

Part A Deadline: Thursday, March 11, 2021

Part B Deadline: Friday, April 2, 2021

You’re probably curious about that “Sea Pea You” thing in your computer (if you’re not, let’s pretend you are for a second). How exactly does a CPU get electricity to run those sw ra, 40(sp) instructions you’ve been writing? It’s time to uncover another piece of that black box: welcome to Project 3!

In Part A (Tasks 1-3), you’ll be wiring up the ALU and RegFile for a basic RISC-V CPU, as well as implementing the CPU datapath for executing addi instructions. In Part B (Tasks 4-5), you’ll use these components (and others) to wire up a working CPU that runs actual RISC-V instructions!


Tips and Guidelines

  • You are only allowed to use Logisim’s built-in components from the following libraries for all parts of this project: Wiring (except Transistor, Transmission Gate, POR, Pull Resistor, Power, Ground, Do not connect), Gates (except PLA), Plexers, Arithmetic (except Divider), Memory (except RAM, Random Generator).
  • Save frequently and commit frequently! Try to save your circuits every 5 minutes or so, and commit every time you produce a new feature, even if it is small.
  • .circ files use the XML format, which makes it hard for Git to automerge. We recommend working on a single computer at a time; if you use multiple computers, make sure that you have pushed and pulled your code before switching devices.
  • Do not move or edit the locked input/output pins in your circuits, since this could knock the pins out of alignment (in other words, autograder tests will fail)! Check the harness circuits, as well as cpu.circ, to make sure your circuits fit in the testing harnesses. You can add pins in subcircuits you create, but avoid making new pins in circuits with the provided, locked pins.
  • You may make new subcircuits, but they must be located in the .circ files given in the cpu/ folder in the starter code. You may not make new .circ files; the autograder will fail you if you do this!
  • You must use unique names for each subcircuit across all .circ files. Failing to do this will result in reduced autograder points. You may not change the names of any circuits provided in the starter files.
  • Sanity tests for most project tasks are included with the project starter code. More information is available under the Testing section of each task.
  • Your submission must be reasonably efficient: It should not take a very long time to run a single test.
  • We recommend completing Lab 05 before starting on Part A of the project and Lab 06 before starting on Part B of this project. Both labs cover many Logisim basics that will be useful for the respective parts of the project.
  • A Logisim library reference can be found by going to Help -> Library Reference within Logisim.

Some common sources of Logisim errors, for your debugging convenience:


Part A: Getting Started

Please visit https://galloc.cs61c.org/ and get your proj3 repository. Then, clone your repository locally and add the starter remote:

$ git clone YOUR_REPO_URL
$ cd YOUR_REPO_NAME
$ git remote add starter https://github.com/61c-teach/sp21-proj3-starter.git
$ git pull starter master

If we make changes to the starter code, you can update your repository with git pull starter master.

This project primarily uses Logisim; refer to Lab 5 if you haven’t set it up or need a refresher.


Task 1: Arithmetic Logic Unit (ALU)

Your first task is to create an ALU that supports all the operations needed by the instructions in our ISA (which is described in further detail in the next section). Please note that we treat overflow as RISC-V does with unsigned instructions, meaning that we ignore overflow.

We have provided a skeleton of an ALU for you in alu.circ (in the cpu folder). It has three inputs:

Input Name Bit Width Description
A 32 Data to use for Input A in the ALU operation
B 32 Data to use for Input B in the ALU operation
ALUSel 4 Selects which operation the ALU should perform (see the list of operations with corresponding switch values below)

… and one output:

Output Name Bit Width Description
Result 32 Result of the ALU operation

Below is the list of ALU operations for you to implement, along with their associated ALUSel values. All of them are required. You are allowed and encouraged to use built-in Logisim components to implement the arithmetic operations.

ALUSel Value Instruction
0 add: Result = A + B
1 sll: Result = A << B
2 slt: Result = (A < B (signed)) ? 1 : 0
3 Unused
4 xor: Result = A ^ B
5 srl: Result = (unsigned) A >> B
6 or: Result = A | B
7 and: Result = A & B
8 mul: Result = (signed) (A * B)[31:0]
9 mulh: Result = (signed) (A * B)[63:32]
10 Unused
11 mulhu: Result = (A * B)[63:32]
12 sub: Result = A - B
13 sra: Result = (signed) A >> B
14 Unused
15 bsel: Result = B

The ALU tests for Part A only use ALUSel values with defined instructions, so your design doesn’t need to worry about the unused values.
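
To make the table above concrete, here is a minimal software sketch of the same operations in Python. It is not the reference solution, just a model you could compare against when reading test output. The names `alu` and `to_signed` are made up for this example, and using only the low 5 bits of B as the shift amount is an assumption that the table does not state.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x, bits=32):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def alu(a, b, alusel):
    """Return the 32-bit Result for 32-bit inputs a and b and a 4-bit ALUSel."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # assumption: only the low 5 bits of B select the shift amount
    sa, sb = to_signed(a), to_signed(b)
    ops = {
        0: lambda: a + b,            # add (overflow is ignored by the final mask)
        1: lambda: a << shamt,       # sll
        2: lambda: int(sa < sb),     # slt (signed compare)
        4: lambda: a ^ b,            # xor
        5: lambda: a >> shamt,       # srl (zero-fill)
        6: lambda: a | b,            # or
        7: lambda: a & b,            # and
        8: lambda: sa * sb,          # mul (low 32 bits)
        9: lambda: (sa * sb) >> 32,  # mulh (high 32 bits, signed product)
        11: lambda: (a * b) >> 32,   # mulhu (high 32 bits, unsigned product)
        12: lambda: a - b,           # sub
        13: lambda: sa >> shamt,     # sra (sign-fill)
        15: lambda: b,               # bsel
    }
    if alusel not in ops:
        raise ValueError(f"ALUSel {alusel} is unused")
    return ops[alusel]() & MASK32
```

For example, `alu(0x80000000, 4, 5)` models srl and `alu(0x80000000, 4, 13)` models sra on the same inputs, which is a quick way to see how the two right shifts differ.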

You can make any modifications to alu.circ you want, but the behavior must match the specification above. If you create additional subcircuits for your ALU, they must also be in alu.circ (you may not make new .circ files). Additionally, your ALU must be able to fit in the provided harness alu-harness.circ. This means that you should take care not to edit the provided input/output pins or add new ones. To verify that changes you made didn’t break anything, you can open alu-harness.circ and ensure there are no errors and that the circuit functions well.

Tips

  • add is already made for you; feel free to use a similar structure when implementing your other operations.
  • If you want to know more details about each component, go to Help -> Library Reference for more information on the component and its inputs and outputs.
  • You might find bit splitters or extenders useful when implementing shift operations.
  • When using a component, it might help to check its Library Reference page. You can also hover your cursor over an input/output on a component to get slightly more information about that input/output. For example, the Multiplier component has a Carry Out output, which might be useful for multiply operations.
  • Use tunnels! They will make your wiring cleaner and easier to follow, and will reduce your chances of encountering crossed wires or unexpected errors.
  • A multiplexer (MUX) might be useful when deciding between operation outputs. In other words, consider simply processing the input for all operations, and then outputting the one of your choice.

Info: Testing

We’ve provided some sanity tests for each task, located in subdirectories under tests/. For example, the ALU tests are in tests/part-a/alu/. When tests are run, the outputs from your circuits are saved in a student-output/ subdirectory.

For example, to run the ALU tests:

$ python3 test.py tests/part-a/alu/

You can also specify a single test circuit, or a grandparent/great-grandparent directory:

$ python3 test.py tests/part-a/alu/alu-add.circ
$ python3 test.py tests/part-a/
$ python3 test.py tests/

After the tests finish running, your ALU circuit’s outputs will be saved under tests/part-a/alu/student-output/ with a -student.out suffix (e.g. alu-add-student.out). The corresponding reference outputs can be found at tests/part-a/alu/reference-output/ with a -ref.out suffix (e.g. alu-add-ref.out).

We’ve also provided format-output.py, which accepts a path to an output file and displays the output in a more readable format (left-aligned hexadecimal numbers). For example, to get the reference output of the alu-add sanity test in readable format, you would do:

$ python3 tools/format-output.py tests/part-a/alu/reference-output/alu-add-ref.out

If you want to see the difference between your output and the reference solution, put the readable outputs into temporary files and diff them. For example, for the alu-add test, you would do:

$ python3 tools/format-output.py tests/part-a/alu/reference-output/alu-add-ref.out > reference.out
$ python3 tools/format-output.py tests/part-a/alu/student-output/alu-add-student.out > student.out
$ diff reference.out student.out

Note: if the lines are wrapping, try resizing your terminal window (or try a slightly smaller font). Or see the following note.

Experimental note: each output file is technically a valid CSV file, so you can also import the output in a spreadsheet app if you really want to crunch the numbers (or you really hate tables in terminal). If the app requires a .csv extension, you can cp tests/part-a/alu/student-output/alu-add-student.out student.csv and import the resulting .csv file.
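
Because the output files are valid CSV, you could also compare them with a few lines of Python instead of a spreadsheet. The sketch below is only an illustration: `read_rows` and `first_mismatch` are hypothetical helper names, not part of the starter code, and the code makes no assumption about which columns the files contain.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of rows, stripping whitespace around each cell."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]


def first_mismatch(ref_text, student_text):
    """Return (row_index, ref_row, student_row) for the first differing row, or None."""
    ref_rows = read_rows(ref_text)
    student_rows = read_rows(student_text)
    for i in range(max(len(ref_rows), len(student_rows))):
        # A missing row (one file is shorter) is reported as None.
        ref_row = ref_rows[i] if i < len(ref_rows) else None
        student_row = student_rows[i] if i < len(student_rows) else None
        if ref_row != student_row:
            return i, ref_row, student_row
    return None
```

To use it, read both files with `open(path).read()` and pass the two strings in, for instance the alu-add reference and student outputs named above.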

Inspecting Tests

Similar to how you can step through your C code in GDB, you can also step through the test circuits in Logisim! For this example we’ll be using the alu-add test.

Open tests/part-a/alu/alu-add.circ in Logisim. There are, among other things, some ROMs feeding into the Input_A, Input_B, and ALUSel tunnels, which then feed into your ALU near the upper right. Every clock cycle, the adder on the top left increments by one, which advances the output from the ROMs by one entry and feeds a new set of inputs to your ALU. If you tick the circuit a couple times (Simulate -> Tick Full Cycle or the corresponding keyboard shortcut), you can see the test circuit advance through each set of inputs and your ALU’s corresponding outputs. If you want to start over, use Simulate -> Reset Simulator (or the keyboard shortcut).

Now, let’s see what your ALU is actually doing with the inputs. Right-click your ALU, and select View alu. Your ALU circuit will appear, with the input values for the current test cycle already on the ALU input pins. With this, you can see exactly what your ALU is doing in every line from the output files! The Poke Tool will be very useful here.

Note: edits to the test circuit, including the ALU we just inspected, will not be saved. Avoid making edits in the test circuit, as they may be lost!


Task 2: Register File (RegFile)

As you learned in class, the RISC-V architecture has 32 registers. For this project, we will implement all of them. To aid in debugging and testing, we have written the RegFile to expose the 8 registers specified below. Please make sure that the values of these registers are attached to the proper outputs.

Your RegFile should be able to write to or read from these registers specified in a given RISC-V instruction without affecting any other registers. There is one notable exception: your RegFile should NOT write to x0, even if an instruction tries. Remember that the zero register should ALWAYS have the value 0x0. You should NOT gate the clock at any point in your RegFile: the clock signal should ALWAYS connect directly to the clock input of the registers without passing through ANY combinational logic.

The exposed registers and their corresponding numbers are listed below.

Register Number Register Name
x1 ra
x2 sp
x5 t0
x6 t1
x7 t2
x8 s0
x9 s1
x10 a0

You are provided with the skeleton of a register file in regfile.circ. The register file circuit has six inputs:

Input Name Bit Width Description
Clock 1 Input providing the clock. This signal can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).
RegWEn 1 Determines whether data is written to the register file on the next rising edge of the clock.
rs1 (Source Register 1) 5 Determines which register’s value is sent to the Read_Data_1 output, see below.
rs2 (Source Register 2) 5 Determines which register’s value is sent to the Read_Data_2 output, see below.
rd (Destination Register) 5 Determines which register to write the value of Write Data to on the next rising edge of the clock, assuming that RegWEn is a 1.
wb (Write Data) 32 Determines what data to write to the register identified by the Destination Register input on the next rising edge of the clock, assuming that RegWEn is 1.

The register file also has the following outputs:

Output Name Bit Width Description
Read_Data_1 32 Driven with the value of the register identified by the Source Register 1 input.
Read_Data_2 32 Driven with the value of the register identified by the Source Register 2 input.
ra Value 32 Always driven with the value of ra (This is a DEBUG/TEST output.)
sp Value 32 Always driven with the value of sp (This is a DEBUG/TEST output.)
t0 Value 32 Always driven with the value of t0 (This is a DEBUG/TEST output.)
t1 Value 32 Always driven with the value of t1 (This is a DEBUG/TEST output.)
t2 Value 32 Always driven with the value of t2 (This is a DEBUG/TEST output.)
s0 Value 32 Always driven with the value of s0 (This is a DEBUG/TEST output.)
s1 Value 32 Always driven with the value of s1 (This is a DEBUG/TEST output.)
a0 Value 32 Always driven with the value of a0 (This is a DEBUG/TEST output.)

The test outputs at the top of your regfile circuit are present for testing and debugging purposes. If you were implementing a real register file, you would omit those outputs. In our case, be sure they are included correctly – if they are not, you will not pass tests.
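
The behavior described above can be summarized as a small software model. This is a sketch for reasoning about expected test output, not a description of how to wire the circuit. The class and method names are made up for this example.

```python
class RegFile:
    """Behavioral model: 32 registers of 32 bits each, with x0 fixed at zero."""

    # Register numbers of the eight debug/test outputs listed in the spec.
    DEBUG_REGS = {"ra": 1, "sp": 2, "t0": 5, "t1": 6, "t2": 7, "s0": 8, "s1": 9, "a0": 10}

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        """Combinational read: return (Read_Data_1, Read_Data_2)."""
        return self.regs[rs1 & 0x1F], self.regs[rs2 & 0x1F]

    def rising_edge(self, reg_wen, rd, wb):
        """Model one rising clock edge: write wb to rd only if RegWEn is 1 and rd is not x0."""
        rd &= 0x1F
        if reg_wen and rd != 0:
            self.regs[rd] = wb & 0xFFFFFFFF

    def debug_outputs(self):
        """Return the values that drive the eight debug/test outputs."""
        return {name: self.regs[num] for name, num in self.DEBUG_REGS.items()}
```

Note that reads take effect immediately while writes only happen in `rising_edge`, which mirrors the difference between the Read_Data outputs and the clocked Write Data input.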

You can make any modifications to regfile.circ you want, but the behavior must match the specification above. If you create additional subcircuits to use in your RegFile, they must also be in regfile.circ (you may not make new .circ files). Additionally, your RegFile must be able to fit in the provided harness regfile-harness.circ. This means that you should take care not to edit the provided input/output pins or add new ones. To verify that changes you made didn’t break anything, you can open regfile-harness.circ and ensure there are no errors and that the circuit functions well.

Tips

  • Take advantage of copy-paste! It might be a good idea to make one register completely and use it as a template for the others to avoid repetitive work.
  • MUXes will be helpful. DeMUXes may also be helpful.
  • We recommend that you not use the Enable input on your MUXes. In fact, you can turn that attribute off (Include Enable?). We also recommend that you disable the Three-state? attribute (if the plexer has it).
  • Open up the Library Reference page for the Register component, and check out all the input/output pins. The Enable pin may come in handy.
  • Think about what happens in the register file after a single instruction is executed. Which values change? Which values stay the same? Registers are clock-triggered – what does that mean?
  • What is the value of x0?

RegFile Testing

We’ve provided some RegFile sanity tests in the tests/part-a/regfile/ directory. You can run these with:

$ python3 test.py tests/part-a/regfile/

Refer to the Info: Testing section for more info on test outputs.


Task 3: The addi Instruction

As your final task for Part A, you’re going to implement a CPU that’s capable of executing one instruction: addi!

Note: we’ll be implementing other instructions in Part B. You’re welcome to implement other instructions at this time, but you’ll only be graded on whether or not addi executes correctly for Part A, so make sure that addi works!

Info: Memory

The Memory unit (located in mem.circ) is already fully implemented for you and attached to the outputs of your CPU in cpu-harness.circ! The addi instruction does NOT use Data Memory, so for Part A you can ignore the DMEM and leave its I/O pins undriven.

If you are interested, here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
WriteAddr Input 32 Address to read/write to in Memory
WriteData Input 32 Value to be written to Memory
Write_En Input 4 The write mask for instructions that write to Memory and zero otherwise
CLK Input 1 Driven by the clock input to the CPU
ReadData Output 32 Value of the data stored at the specified address
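
If the 4-bit write mask is unfamiliar, the following sketch shows one way such a mask could behave. It assumes that bit i of Write_En enables byte i of the word, with bit 0 controlling the least-significant byte. The table above does not spell out this mapping, so check mem.circ before relying on it. `masked_write` is a hypothetical helper name.

```python
def masked_write(old_word, write_data, write_en):
    """Return the new 32-bit word after applying a 4-bit byte write mask.

    Assumption: bit i of write_en enables byte i (bit 0 = least-significant byte).
    """
    new_word = old_word & 0xFFFFFFFF
    for byte in range(4):
        if (write_en >> byte) & 1:
            byte_mask = 0xFF << (8 * byte)
            # Replace only the enabled byte, keep the others unchanged.
            new_word = (new_word & ~byte_mask) | (write_data & byte_mask)
    return new_word & 0xFFFFFFFF
```

Under this assumption, a mask of 0b0000 leaves memory unchanged (as for instructions that do not write), and 0b1111 replaces the whole word.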

Info: Branch Comparator

The Branch Comparator unit (located in branch-comp.circ) provided in the skeleton is unimplemented, but the addi instruction does NOT use the Branch Comparator unit, so you don’t have to worry about it for Part A.

If you are interested, here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
rs1 Input 32 Value in the first register to be compared
rs2 Input 32 Value in the second register to be compared
BrUn Input 1 Equal to one when an unsigned comparison is wanted, or zero when a signed comparison is wanted
BrEq Output 1 Equal to one if the two values are equal
BrLt Output 1 Equal to one if the value in rs1 is less than the value in rs2
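
As a reference for the signal descriptions above, here is a minimal Python sketch of the comparison. The function names are made up for this example. The only behavior it models is what the table states: BrEq is an equality check, and BrLt is a less-than check that is unsigned when BrUn is 1 and signed otherwise.

```python
def to_signed32(x):
    """Interpret a 32-bit value as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(rs1, rs2, br_un):
    """Return (BrEq, BrLt) for two 32-bit register values."""
    rs1 &= 0xFFFFFFFF
    rs2 &= 0xFFFFFFFF
    br_eq = int(rs1 == rs2)
    if br_un:
        br_lt = int(rs1 < rs2)  # unsigned comparison
    else:
        br_lt = int(to_signed32(rs1) < to_signed32(rs2))  # signed comparison
    return br_eq, br_lt
```

The value 0xFFFFFFFF is a useful test input: it is -1 when signed but the largest possible value when unsigned, so BrLt flips depending on BrUn.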

Info: Immediate Generator

The Immediate Generator (“Imm Gen”) unit (located in imm-gen.circ) provided in the skeleton is unimplemented. The addi instruction requires an immediate generator, but for now you can hard-wire it to construct the immediate for the addi instruction, without worrying about other immediate types.

To edit this subcircuit, edit the imm-gen.circ file and not the imm_gen in cpu.circ. Note that if you modify this circuit, you will need to close and re-open cpu.circ to load the changes in your CPU.

Here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
inst Input 32 The instruction being executed
ImmSel Input 3 Value determining how to reconstruct the immediate
imm Output 32 Value of the immediate in the instruction

Info: Processor

We have provided a skeleton for your processor in cpu.circ. You will be using your own implementations of the ALU and RegFile as you construct your datapath. You are responsible for constructing the entire datapath from scratch. For Part A, your completed processor should support executing the addi instruction in a single cycle (i.e. no pipelining). In Part B, we’ll modify your CPU to use a 2-stage pipeline, with IF in the first stage and ID, EX, MEM, and WB in the second stage.

Your processor will sit in a processor harness cpu-harness.circ that contains the Memory unit. That processor harness then sits in a testing harness run.circ that provides the instructions to the processor. Your processor will output the address of an instruction, and accept the instruction at that address as an input. It will also output the data memory address and data memory write enable, and accept the data at that address as an input. Essentially, these two test harnesses are your data memory and instruction memory, respectively. We recommend that you take some time to inspect cpu-harness.circ and run.circ to see exactly what’s going on. cpu-harness.circ will be used in the tests provided to you for sanity checking, so make sure your CPU fits in the harness before testing and submitting your work! Your processor has 3 inputs that come from the harness:

Input Name Bit Width Description
READ_DATA 32 Driven with the data at the data memory address identified by the WRITE_ADDRESS (see below).
INSTRUCTION 32 Driven with the instruction at the instruction memory address identified by the PROGRAM_COUNTER (see below).
CLOCK 1 The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

Your processor must provide the following outputs to the first level harness:

Output Name Bit Width Description
ra 32 Driven with the contents of ra (FOR TESTING)
sp 32 Driven with the contents of sp (FOR TESTING)
t0 32 Driven with the contents of t0 (FOR TESTING)
t1 32 Driven with the contents of t1 (FOR TESTING)
t2 32 Driven with the contents of t2 (FOR TESTING)
s0 32 Driven with the contents of s0 (FOR TESTING)
s1 32 Driven with the contents of s1 (FOR TESTING)
a0 32 Driven with the contents of a0 (FOR TESTING)
tohost 32 Driven with the contents of CSR 0x51E (FOR TESTING, for Part A leave it as-is)
WRITE_ADDRESS 32 This output is used to select which address to read/write data from in data memory.
WRITE_DATA 32 This output is used to provide write data to data memory.
WRITE_ENABLE 4 This output is used to provide the write enable mask to data memory.
PROGRAM_COUNTER 32 This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

Just like with the ALU and RegFile, make sure that you do not edit the input/output pins or add new ones!


Info: Control Logic

The Control Logic unit (control-logic.circ) provided in the skeleton is unimplemented. Designing your control logic unit will probably be your biggest challenge in Part B. For Part A, you can put a constant for each control signal, because addi is the only instruction you’ll be implementing. As you implement addi, think about where you’ll need to make additions in order to support other instructions.

To edit this subcircuit, edit the control-logic.circ file and not the control_logic in cpu.circ. Note that if you modify this circuit, you will need to close and re-open cpu.circ to load the changes in your CPU.

You are welcome to add more input/output pins to the starter control logic as your design demands. You may also use as many or as few of the supplied ports as needed. However, please do not edit any of the existing pins during this process.


Single Stage CPU: A Guide

We know that trying to build a CPU with a blank slate might be intimidating, so we wrote the following guide to help you.

Recall the five stages of the CPU pipeline:

  1. Instruction Fetch (IF)
  2. Instruction Decode (ID)
  3. Execute (EX)
  4. Memory (MEM)
  5. Write Back (WB)

This guide will help you work through each of these stages for the addi instruction. Each section will contain questions for you to think through and pointers to important details, but it won’t tell you exactly how to implement the instruction.

Read and understand each question before going to the next one; the answer is given right after each question. During your implementation, feel free to place things in subcircuits as you see fit.

Stage 1: Instruction Fetch

The main thing we are concerned about in this stage is: how do we get the current instruction? From lecture, we know that instructions are stored in the instruction memory, and each of these instructions can be accessed through an address.

1. Which file in the project holds your instruction memory? How does it connect to your `cpu.circ` file?
   Answer: The instruction memory is the ROM module in `run.circ`. It provides an input into your CPU named `INSTRUCTION` and takes an output from your CPU, called `PROGRAM_COUNTER`.
2. In your CPU, how would changing the address you output as `PROGRAM_COUNTER` affect the instruction input?
   Answer: The instruction that `run.circ` outputs to your CPU should be the instruction at address `PROGRAM_COUNTER` in instruction memory.
3. How do you know what `PROGRAM_COUNTER` should be?
   Answer: `PROGRAM_COUNTER` is the address of the current instruction being executed, so it is saved in the PC register. For this project, your PC will start at 0, as that is the default value for a register.
4. For basic programs without any jumps or branches, how will the PC change from line to line?
   Answer: The PC must increment by 1 instruction in order to go to the next instruction, as the address held by the PC register represents what instruction to execute. This means that your PC will typically increase by 4 (assuming no branch or jump) line to line.


In cpu.circ, we have provided a simple PC register implementation - ignoring jumps and branches. You will implement branches and jumps in Part B of the project, but for now we are only concerned with being able to run addi instructions.

Remember that we will eventually implement a 2-stage pipelined processor, so the IF stage is separate from the remaining stages. What circuitry separates the different stages of a pipeline? Specifically, what circuitry separates IF from the next stage? Will you need to add anything?

Stage 2: Instruction Decode

Now that we have our instruction coming from the instruction input, we break it down in the Instruction Decode step according to the RISC-V instruction formats you have learned.

1. What type of instruction is addi? What are the different bit fields and which bits are needed for each?
   Answer: I type. The fields are:
     - `imm [31-20]`
     - `rs1 [19-15]`
     - `funct3 [14-12]`
     - `rd [11-7]`
     - `opcode [6-0]`
2. In Logisim, what tool would you use to split out different groups of bits?
   Answer: The Splitter!

3. Implement the instruction field decode stage using the instruction input. You should use tunnels to label and group the bits.

4. Now we need to get the data from the corresponding registers, using the register file. Which instruction fields should be connected to the register file? Which inputs of the register file should it connect to?
   Answer: Instruction field `rs1` will need to connect to read register 1.

5. Implement reading from the register file. You will have to bring in your RegFile from Part A. Remember to connect the clock!

6. What does the Immediate Generator need to do?
   Answer: For addi, the immediate generator takes in 12 bits from the instruction and produces a signed 32-bit immediate. You will need to implement this logic in the Immediate Generator subcircuit!
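
The field extraction and sign extension discussed in this stage can be sketched in Python as follows. This mirrors what the splitter and the Immediate Generator do for an I-type instruction; `decode_itype` is a hypothetical name used only for this example.

```python
def decode_itype(inst):
    """Split a 32-bit I-type instruction into its fields and sign-extend the immediate."""
    imm12 = (inst >> 20) & 0xFFF  # imm [31-20]
    # Sign-extend: if bit 11 of the immediate is set, the value is negative.
    imm = imm12 - 0x1000 if imm12 & 0x800 else imm12
    return {
        "opcode": inst & 0x7F,           # opcode [6-0]
        "rd": (inst >> 7) & 0x1F,        # rd [11-7]
        "funct3": (inst >> 12) & 0x7,    # funct3 [14-12]
        "rs1": (inst >> 15) & 0x1F,      # rs1 [19-15]
        "imm": imm & 0xFFFFFFFF,         # 32-bit sign-extended immediate
    }
```

For instance, `decode_itype(0xFFF00293)` corresponds to `addi t0, x0, -1`: the 12-bit immediate 0xFFF becomes the 32-bit value 0xFFFFFFFF.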

Stage 3: Execute

The Execute stage is where the computation of most instructions is performed. This is also where we will introduce the idea of using a Control Module.

1. For the addi instruction, what should be your inputs to the ALU?
   Answer: Read Data 1 (rs1) and the immediate produced by the Immediate Generator.
2. In the ALU, what is the purpose of ALUSel?
   Answer: It determines which operation the ALU will perform.
3. Although it is possible for now to just put a constant as the ALUSel, why would this be infeasible as you implement more instructions?
   Answer: With more instructions, the input to the ALU might need to change, so you will need to have some sort of circuit that changes ALUSel depending on the instruction being executed.

4. Bring in your ALU and connect the ALU inputs correctly. Do you need to connect the clock? Why or why not?

Stage 4: Memory

The memory stage is where the memory can be written to using store instructions and read from using load instructions. Because the addi instruction does not use memory, we will not spend too much time here.

At this point, we cannot connect most of the inputs, as we don’t know where they should come from.

Stage 5: Write Back

The write back stage is where the results of the operation are saved back to the registers.

1. Do `addi` instructions need to write back to a register?
   Answer: Yes. `addi` takes the output of an addition computation in the ALU and writes it back to the register file.

2. Let's create the write back phase so that it is able to write both ALU and MEM outputs to the Register File. Later, when you implement branching/jumping, you may need to add more to this mux. However, at the moment, we need to choose between the ALU and MEM outputs, as only one wire can end up being an input to the register file. Bring a wire from both the ALU and `READ_DATA`, and connect them to a MUX.

3. What should you use as the Select input to the MUX? What does the input depend on?
   Answer: This input should be able to choose between three MUX inputs: (1) ALU, (2) MEM [`READ_DATA`], and (3) PC + 4 (when will you use this?). The control signal that determines which of these inputs is written back is called WBSel. For now, there should only be one value that WBSel can take on -- whatever it should be for `addi`.
4. Now that we have the inputs to the MUX sorted out, we need to wire the output. Where should the output connect to?
   Answer: Because the output is the data that you want to write into the Register File, it should connect to the Write Data input on the Register File.

5. There are two more inputs on the Register File which are important for writing data: RegWEn and rd. One of these will come from the Instruction Decode stage and the other one will be a new control signal that you need to design for Part B. Please finish off the Writeback stage by connecting these inputs on the RegFile correctly.
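
Putting the five stages together, the sketch below simulates a single-cycle CPU that only understands addi. It is a software analogy for checking your understanding, not a wiring diagram. The helper names (`encode_addi`, `step`, `run`) and the four-instruction program in the usage note are made up for this example and are not taken from the provided tests.

```python
def encode_addi(rd, rs1, imm):
    """Encode `addi rd, rs1, imm` (opcode 0x13, funct3 0x0) as a 32-bit word."""
    return ((imm & 0xFFF) << 20) | ((rs1 & 0x1F) << 15) | ((rd & 0x1F) << 7) | 0x13


def step(pc, regs, imem):
    """Execute one addi instruction and return the next PC. regs is updated in place."""
    inst = imem[pc >> 2]                      # IF: PC is a byte address, imem is a list of words
    rd = (inst >> 7) & 0x1F                   # ID: split out the fields
    rs1 = (inst >> 15) & 0x1F
    imm12 = (inst >> 20) & 0xFFF
    imm = imm12 - 0x1000 if imm12 & 0x800 else imm12  # ID: sign-extend the immediate
    alu_out = (regs[rs1] + imm) & 0xFFFFFFFF  # EX: ALU performs add on rs1 data and imm
    # MEM: addi does not touch data memory
    if rd != 0:                               # WB: RegWEn is 1 for addi, but x0 stays zero
        regs[rd] = alu_out
    return pc + 4                             # next sequential instruction


def run(program):
    """Run every instruction in program once, starting from PC = 0 and all-zero registers."""
    regs = [0] * 32
    pc = 0
    for _ in program:
        pc = step(pc, regs, program)
    return pc, regs
```

As a usage example, running `[encode_addi(5, 0, 5), encode_addi(6, 5, -3), encode_addi(0, 0, 7), encode_addi(10, 6, 40)]` leaves t0 = 5, t1 = 2, a0 = 42, and x0 unchanged at 0.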


Info: CPU Testing

In the Info: Testing section, we got to know the general folder structure of the tests, and understand the commands involved in running tests and interpreting output. Now, let’s look deeper into the CPU tests that you’ll be working with for the remainder of the project.

Understanding CPU Tests

Each CPU test is a copy of the run.circ file included with the starter code that has instructions loaded into its IMEM. When you run Logisim from the command line, the clock ticks, the program counter is incremented, and the value of each output is printed to stdout.

Let’s take the single-cycle cpu-addi-basic sanity test as an example. It has 4 addi instructions (see tests/part-a/addi/cpu-addi-basic.s). Open tests/part-a/addi/cpu-addi-basic.circ in Logisim, and take a closer look at the various parts of the test file. At the top, you’ll see the place where your CPU is connected to the test outputs. With the starter code, you’ll see lots of UUUUs or XXXXs; when your CPU is working, this should not be the case. Your CPU takes in one input (INSTRUCTION), and along with the values in each of the registers, it has an additional output: PROGRAM_COUNTER, or the address of the instruction to be fetched from IMEM to be executed the next clock cycle.

As you can see, there are many specifically-positioned wires connected to specific input/output pins on your CPU. Make sure that you do not edit the provided input/output pins or add new ones, as this will change the shape of the CPU circuit, and as a result the connections in the test files may no longer work properly.

Below the CPU, you’ll see instruction memory. The hex for the addi instructions has been loaded into instruction memory. Instruction memory takes in one input (called PROGRAM_COUNTER) and outputs the instruction at that address. PROGRAM_COUNTER is a 32-bit value, but because Logisim caps the size of ROM units at 2^16 bytes, we have to use a splitter to get only 14 bits from PROGRAM_COUNTER (ignoring the bottommost two bits). Notice that PROGRAM_COUNTER is a byte address, not a word address.
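
The address mapping described above can be written as a one-line calculation. This sketch assumes the splitter keeps bits 15 down to 2 of PROGRAM_COUNTER, which matches "14 bits, ignoring the bottommost two"; confirm the exact bits in the test circuit. `rom_index` is a hypothetical name.

```python
def rom_index(program_counter):
    """Map a 32-bit byte address to a 14-bit word index into the instruction ROM."""
    # Drop the two lowest bits (byte offset within a word), then keep 14 bits.
    return (program_counter >> 2) & 0x3FFF
```

So a PC of 0 selects ROM entry 0, a PC of 4 selects entry 1, and a PC of 8 selects entry 2, which is why the PC advances by 4 per instruction.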

So what happens when the clock ticks? Each tick of the clock increments an input in the test file called Time_Step. The clock will continue to tick until Time_Step is equal to the halting constant for that test file (for this particular test file, the halting constant is 5). At that point, the Logisim command line will print the values in each of the outputs to stdout. Our tests will compare this output to the expected output; if your output is different, you will fail the test.

addi Tests (Part A)

We’ve provided some sanity tests for addi (Task 3) in the tests/part-a/addi/ directory. You can run these with:

$ python3 test.py tests/part-a/addi/

See above for more info on working with these tests.


Task 3.5: Part A README Update

Your last task for Part A is to fill in the README.md. Write down how you implemented your circuits and components for this part (including ALU and RegFile, since you used them for addi!), and explain the reasoning behind your design choices. There’s no specific format or length requirement here, so feel free to get creative!

Part A: Submission

At this point, if you’ve completed tasks 1-3.5, you’ve finished Part A of the project!

The autograder for Part A uses the same tests as the test files provided in the starter code. In other words, there are no hidden tests for Part A.

For addi tests, the autograder accepts either a single-cycle or a pipelined CPU. This is for cases where you start work in Part B, pipeline your CPU, and then realize you want to resubmit to Part A.

Double-check that you have not edited your input/output pins, and that your circuits fit in the provided testing harnesses. Make sure that you did not create any additional .circ files; the autograder will only be testing the circuit files you were allowed to edit for Part A (alu.circ, branch-comp.circ, control-logic.circ, cpu.circ, imm-gen.circ, and regfile.circ). Then, submit your repo to the Project 3A assignment on Gradescope.

The rest of this spec describes the tasks for Part B.


Part B: Getting Started

Please pull the latest updates from the starter code.

Task 4: More Instructions

In Task 3, you wired up a basic single-cycle CPU capable of executing addi instructions. Now, you’ll implement support for more instructions!

Your CPU will end up with many parts – try breaking it down into chunks, and start with the simple ones first. How you approach this task is entirely up to you, but one suggested starting order is: branch comparator and immediate generator (simpler than control logic/datapath), then I-type calculation instructions (since you’ve already implemented addi), then R-type calculation instructions (since there’s some overlap with I-type calculation instructions), and so on.

We strongly recommend that you work on Task 4 and Task 5 (Custom Tests) together, since incremental testing will help you catch bugs much faster. Write tests for an instruction (or group of instructions) you plan to implement, then implement the instruction(s) while using your tests as a reference. Once your tests pass, commit your changes so you can come back later if you find a regression, and move on to another instruction.

The Instruction Set Architecture (ISA)

We will be grading your CPU implementation on only the instructions listed below. Your CPU must support these instructions, but feel free to implement any additional instructions you want as long as they don’t affect your implementation of the required instructions.

Instruction             Type  Opcode  Funct3  Funct7/Immediate  Operation
add rd, rs1, rs2        R     0x33    0x0     0x00              R[rd] ← R[rs1] + R[rs2]
mul rd, rs1, rs2        R     0x33    0x0     0x01              R[rd] ← (R[rs1] * R[rs2])[31:0]
sub rd, rs1, rs2        R     0x33    0x0     0x20              R[rd] ← R[rs1] - R[rs2]
sll rd, rs1, rs2        R     0x33    0x1     0x00              R[rd] ← R[rs1] << R[rs2]
mulh rd, rs1, rs2       R     0x33    0x1     0x01              R[rd] ← (R[rs1] * R[rs2])[63:32]
mulhu rd, rs1, rs2      R     0x33    0x3     0x01              (unsigned) R[rd] ← (R[rs1] * R[rs2])[63:32]
slt rd, rs1, rs2        R     0x33    0x2     0x00              R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 (signed)
xor rd, rs1, rs2        R     0x33    0x4     0x00              R[rd] ← R[rs1] ^ R[rs2]
srl rd, rs1, rs2        R     0x33    0x5     0x00              (unsigned) R[rd] ← R[rs1] >> R[rs2]
sra rd, rs1, rs2        R     0x33    0x5     0x20              (signed) R[rd] ← R[rs1] >> R[rs2]
or rd, rs1, rs2         R     0x33    0x6     0x00              R[rd] ← R[rs1] | R[rs2]
and rd, rs1, rs2        R     0x33    0x7     0x00              R[rd] ← R[rs1] & R[rs2]
lb rd, offset(rs1)      I     0x03    0x0     -                 R[rd] ← SignExt(Mem(R[rs1] + offset, byte))
lh rd, offset(rs1)      I     0x03    0x1     -                 R[rd] ← SignExt(Mem(R[rs1] + offset, half))
lw rd, offset(rs1)      I     0x03    0x2     -                 R[rd] ← Mem(R[rs1] + offset, word)
addi rd, rs1, imm       I     0x13    0x0     -                 R[rd] ← R[rs1] + imm
slli rd, rs1, imm       I     0x13    0x1     0x00              R[rd] ← R[rs1] << imm
slti rd, rs1, imm       I     0x13    0x2     -                 R[rd] ← (R[rs1] < imm) ? 1 : 0
xori rd, rs1, imm       I     0x13    0x4     -                 R[rd] ← R[rs1] ^ imm
srli rd, rs1, imm       I     0x13    0x5     0x00              R[rd] ← R[rs1] >> imm
srai rd, rs1, imm       I     0x13    0x5     0x20              R[rd] ← R[rs1] >> imm
ori rd, rs1, imm        I     0x13    0x6     -                 R[rd] ← R[rs1] | imm
andi rd, rs1, imm       I     0x13    0x7     -                 R[rd] ← R[rs1] & imm
sb rs2, offset(rs1)     S     0x23    0x0     -                 Mem(R[rs1] + offset) ← R[rs2][7:0]
sh rs2, offset(rs1)     S     0x23    0x1     -                 Mem(R[rs1] + offset) ← R[rs2][15:0]
sw rs2, offset(rs1)     S     0x23    0x2     -                 Mem(R[rs1] + offset) ← R[rs2]
beq rs1, rs2, offset    SB    0x63    0x0     -                 if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b0}
bne rs1, rs2, offset    SB    0x63    0x1     -                 if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b0}
blt rs1, rs2, offset    SB    0x63    0x4     -                 if(R[rs1] < R[rs2] (signed)) PC ← PC + {offset, 1b0}
bge rs1, rs2, offset    SB    0x63    0x5     -                 if(R[rs1] >= R[rs2] (signed)) PC ← PC + {offset, 1b0}
bltu rs1, rs2, offset   SB    0x63    0x6     -                 if(R[rs1] < R[rs2] (unsigned)) PC ← PC + {offset, 1b0}
bgeu rs1, rs2, offset   SB    0x63    0x7     -                 if(R[rs1] >= R[rs2] (unsigned)) PC ← PC + {offset, 1b0}
auipc rd, offset        U     0x17    -       -                 R[rd] ← PC + {offset, 12b0}
lui rd, offset          U     0x37    -       -                 R[rd] ← {offset, 12b0}
jal rd, imm             UJ    0x6f    -       -                 R[rd] ← PC + 4; PC ← PC + {imm, 1b0}
jalr rd, rs1, imm       I     0x67    0x0     -                 R[rd] ← PC + 4; PC ← R[rs1] + {imm}
csrw rd, csr, rs1       I     0x73    0x1     -                 CSR[csr] ← R[rs1]
csrwi rd, csr, uimm     I     0x73    0x5     -                 CSR[csr] ← {uimm}
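
The three multiply rows differ only in which half of the 64-bit product they keep and whether the operands are treated as signed. Here is a minimal Python sketch of those semantics (the function names are hypothetical and this is a software model, not a description of how your ALU must be wired):

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a: int, b: int) -> int:
    """Bits [31:0] of the product (identical for signed and unsigned operands)."""
    return (a * b) & MASK32


def mulh(a: int, b: int) -> int:
    """Bits [63:32] of the signed x signed product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhu(a: int, b: int) -> int:
    """Bits [63:32] of the unsigned x unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32
```

Operands with the top bit set (for example 0xFFFFFFFF) are good unit test inputs, because mulh and mulhu disagree on them.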

Info: Branch Comparator

The Branch Comparator unit (located in branch-comp.circ) compares two values and outputs control signals that will be used to make branching decisions. You will need to implement logic for this circuit.

To edit this subcircuit, edit the branch-comp.circ file and not the branch_comp in cpu.circ. Note that if you modify this circuit, you will need to close and re-open cpu.circ to load the changes in your CPU.

Again, here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
rs1 Input 32 Value in the first register to be compared
rs2 Input 32 Value in the second register to be compared
BrUn Input 1 Equal to one when an unsigned comparison is wanted, or zero when a signed comparison is wanted
BrEq Output 1 Equal to one if the two values are equal
BrLt Output 1 Equal to one if the value in rs1 is less than the value in rs2
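
As a rough reference model of the table above (a Python sketch with hypothetical names, not the circuit itself), BrUn only changes how the two 32-bit patterns are interpreted before the less-than comparison:

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(rs1: int, rs2: int, br_un: int) -> tuple:
    """Return (BrEq, BrLt) for two 32-bit register values."""
    a, b = rs1 & MASK32, rs2 & MASK32
    if not br_un:
        # Signed comparison: reinterpret both bit patterns as signed values.
        a, b = to_signed(a), to_signed(b)
    return int(a == b), int(a < b)
```

For example, 0xFFFFFFFF is less than 1 when compared as signed values (it is -1), but not when compared as unsigned values.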

Info: Immediate Generator

The Immediate Generator (“Imm Gen”) unit (located in imm-gen.circ) extracts the appropriate immediate from I, S, B, U, and J type instructions. Remember that in RISC-V, all immediates that leave the immediate generator are 32 bits wide and sign-extended! See the table below for how each immediate should be formatted:

We don’t specify an encoding for ImmSel (i.e. come up with your own!), but make sure that your Immediate Generator and Control Logic use the same values for ImmSel.

To edit this subcircuit, edit the imm-gen.circ file and not the imm_gen in cpu.circ. Note that if you modify this circuit, you will need to close and open cpu.circ to load the changes in your CPU.

Again, here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
inst Input 32 The instruction being executed
ImmSel Input 3 Value determining how to reconstruct the immediate
imm Output 32 Value of the immediate in the instruction
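
If it helps to check your wiring against a software model, here is a Python sketch of the five standard RISC-V immediate layouts. The ImmSel values (0 through 4) are a made-up encoding for illustration only, since the spec leaves the encoding up to you, and the helper names are hypothetical:

```python
IMM_I, IMM_S, IMM_B, IMM_U, IMM_J = range(5)  # hypothetical ImmSel encoding


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] (inclusive), like a Logisim splitter."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Sign-extend a `width`-bit value to a 32-bit pattern."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def imm_gen(inst: int, imm_sel: int) -> int:
    if imm_sel == IMM_I:  # inst[31:20]
        return sign_extend(bits(inst, 31, 20), 12)
    if imm_sel == IMM_S:  # {inst[31:25], inst[11:7]}
        return sign_extend(bits(inst, 31, 25) << 5 | bits(inst, 11, 7), 12)
    if imm_sel == IMM_B:  # {inst[31], inst[7], inst[30:25], inst[11:8], 0}
        raw = (bits(inst, 31, 31) << 12 | bits(inst, 7, 7) << 11
               | bits(inst, 30, 25) << 5 | bits(inst, 11, 8) << 1)
        return sign_extend(raw, 13)
    if imm_sel == IMM_U:  # {inst[31:12], 12 zero bits}
        return bits(inst, 31, 12) << 12
    if imm_sel == IMM_J:  # {inst[31], inst[19:12], inst[20], inst[30:21], 0}
        raw = (bits(inst, 31, 31) << 20 | bits(inst, 19, 12) << 12
               | bits(inst, 20, 20) << 11 | bits(inst, 30, 21) << 1)
        return sign_extend(raw, 21)
    raise ValueError("unknown ImmSel value")
```

For instance, imm_gen(0xFFF00293, IMM_I) models addi t0, x0, -1 and yields the 32-bit pattern for -1.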

Info: Control Logic

The Control Logic unit (control-logic.circ) provided in the skeleton is based on the control logic unit in the 5-stage CPU used in lecture and discussion. Control signals play a very important part in this project, since your CPU relies on them to handle each instruction correctly. However, figuring out all of the control signals may seem intimidating. We suggest taking a look at the lecture slides and discussion worksheets to get started. Try walking through the datapath with different types of instructions; when you see a MUX or other component, think about what selector/enable value you will need for that instruction.

You are welcome to add more inputs or outputs to the existing starter control_logic circuit as your control logic demands. You may also use as many or as few of the supplied ports as needed. That being said, please do not edit, move, or remove any of the existing ports during this process.

There are two major approaches to implementing the control logic so that it can extract the opcode/funct3/funct7 from an instruction and set the control signals appropriately.

The recommended method is hard-wired control, as discussed in lecture, which is usually the preferred approach for RISC architectures like MIPS and RISC-V. Hard-wired control uses various gates and other components (remember, we’ve learned how components like MUXes can be built from AND/OR/NOT gates) to produce the appropriate control signals. An instruction decoder takes in an instruction and outputs all of the control signals for that instruction.

The other way to do it is to use ROM control. Every instruction implemented by a processor maps to an address in a Read-Only Memory (ROM) unit. At that address in the ROM is the control word for that instruction. An address decoder takes in an instruction and outputs the address of the control word for that instruction. This approach is common in CISC architectures like Intel’s x86-64, and, in real life, offers some flexibility because it can be re-programmed by changing the contents of the ROM.
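
To make the ROM idea concrete, here is a small Python sketch: the instruction's opcode/funct3/funct7 fields form a lookup key, and the entry found there plays the role of the control word. The table covers only four instructions, and the signal names and values in it (RegWEn, and strings for ALUSel) are placeholders for illustration, not the encoding your circuit has to use:

```python
def decode_fields(inst: int) -> dict:
    """Split a 32-bit instruction into the fields the control logic examines."""
    return {
        "opcode": inst & 0x7F,          # inst[6:0]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
    }


# Hypothetical control words, keyed by (opcode, funct3, funct7).
# A funct7 of None means "this instruction does not use funct7".
CONTROL_ROM = {
    (0x33, 0x0, 0x00): {"RegWEn": 1, "ALUSel": "add"},  # add
    (0x33, 0x0, 0x20): {"RegWEn": 1, "ALUSel": "sub"},  # sub
    (0x13, 0x0, None): {"RegWEn": 1, "ALUSel": "add"},  # addi
    (0x23, 0x2, None): {"RegWEn": 0, "ALUSel": "add"},  # sw
}


def control_word(inst: int) -> dict:
    """Look up the control word, ignoring funct7 where it is unused."""
    f = decode_fields(inst)
    key = (f["opcode"], f["funct3"], f["funct7"])
    if key in CONTROL_ROM:
        return CONTROL_ROM[key]
    return CONTROL_ROM[(f["opcode"], f["funct3"], None)]
```

A hard-wired design computes the same mapping with gates and MUXes instead of a stored table.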

To edit this subcircuit, edit the control-logic.circ file and not the control_logic in cpu.circ. Note that if you modify this circuit, you will need to close and open cpu.circ to load the changes in your CPU.

Tips

  • If you’re a spreadsheet kind of person, a spreadsheet might help you organize your control logic!
  • Hard-wired control: for signals like ALUSel, where you might want to output a certain number depending on multiple potential input signals, a Priority Encoder might be helpful!

Info: Memory

The Memory unit (located in mem.circ) is already fully implemented for you and attached to the outputs of your CPU in cpu-harness.circ! You must not add mem.circ into your CPU; doing so will cause the autograder to fail and you will not receive a score.

Due to Logisim limitations, only the lower 16 bits of the memory address are used, and the upper 16 bits are discarded. Therefore, memory (IMEM and DMEM) in this project has an effective address space of 2^16 byte addresses.

While the address you give to memory is a byte address, the memory unit returns an entire word. The memory unit ignores the bottom two bits of the address you provide to it, and treats its input as a word address rather than a byte address. For example, if you input the 32-bit address 0x0000_1007, it will be treated as the word address 0x0000_1004, and the output will be the 4 bytes at addresses 0x0000_1004, 0x0000_1005, 0x0000_1006, and 0x0000_1007.
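
The address handling described above can be modeled in a couple of lines of Python (a sketch with hypothetical function names, combining the 16-bit limit with the ignored bottom two bits):

```python
def word_address(addr: int) -> int:
    """Address the memory unit effectively uses: low 16 bits, bottom two cleared."""
    return addr & 0xFFFC


def bytes_in_word(addr: int) -> list:
    """Byte addresses covered by the word that `addr` falls in."""
    base = word_address(addr)
    return [base + i for i in range(4)]


print(hex(word_address(0x00001007)))
print([hex(a) for a in bytes_in_word(0x00001007)])
```

Because the memory always hands back the whole word, it is up to your datapath to pick out the right byte or halfword for lb and lh.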

Note that for the lh, lw, sh, sw instructions, the RISC-V ISA supports unaligned accesses, but implementing them is complicated. We’ll only be implementing aligned memory accesses in this project. This means that operations will only be defined when they do not exceed the boundaries of a contiguous word in memory. An example of such an operation is any lw or sw that operates on an address that is a multiple of 4. Since the address is a multiple of 4 and we load 4 bytes in a word, the total memory fetched does not exceed the boundaries of a contiguous word in memory. You must not implement unaligned accesses; you would likely need to use stalling, which would result in your output not matching our expected output (bad for your score).
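
Taking the definition above literally (an access is defined when it stays inside one 4-byte word), a quick Python check might look like this. The function name is hypothetical; you could use something similar to sanity-check the addresses in your own tests:

```python
def is_aligned_access(addr: int, size: int) -> bool:
    """True if `size` bytes starting at `addr` stay inside one 4-byte word."""
    return (addr % 4) + size <= 4


# lw/sw use size 4, lh/sh use size 2, lb/sb use size 1.
print(is_aligned_access(0x1004, 4))  # word access at a multiple of 4
print(is_aligned_access(0x1007, 2))  # halfword that would spill into the next word
```

Single-byte accesses always pass this check, since one byte can never cross a word boundary.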

Remember that the memory is also byte-level write-enabled. This means that the Write_En signal is 4 bits wide and acts as a write mask for the input data (i.e. each bit of the mask enables writing to the corresponding byte of the word). Some examples (W = byte will be overwritten, - = byte is unaffected):

Write_En  Byte 0  Byte 1  Byte 2  Byte 3 (most significant)
0b1000    -       -       -       W
0b0101    W       -       W       -
0b0000    -       -       -       -
0b1111    W       W       W       W

The ReadData port will always return the value in memory at the supplied address, regardless of Write_En.
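
Here is a minimal Python model of the write mask, assuming (as in the examples above) that bit i of Write_En controls byte i and that byte 0 is the least significant byte. The function name is hypothetical:

```python
def masked_write(old_word: int, write_data: int, write_en: int) -> int:
    """Return the word stored in memory after a write with a 4-bit byte mask."""
    result = old_word
    for byte in range(4):
        if (write_en >> byte) & 1:
            mask = 0xFF << (8 * byte)
            # Replace only this byte; every other byte keeps its old value.
            result = (result & ~mask) | (write_data & mask)
    return result & 0xFFFFFFFF


# Memory holds 0x11223344 and WriteData is 0xAABBCCDD.
for write_en in (0b1000, 0b0101, 0b0000, 0b1111):
    print(bin(write_en), hex(masked_write(0x11223344, 0xAABBCCDD, write_en)))
```

Note that each enabled byte of WriteData lands in the same byte position of the word, so data has to be placed in the right byte lane before it reaches memory.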

Again, here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
WriteAddr Input 32 Address to read/write to in Memory
WriteData Input 32 Value to be written to Memory
Write_En Input 4 The write mask for instructions that write to Memory and zero otherwise
CLK Input 1 Driven by the clock input to the CPU
ReadData Output 32 Value of the data stored at the specified address

Info: Control Status Registers (CSRs)

In order to run the testbenches that determine your project grades, there are a few more instructions that need to be added. A Control Status Register (CSR) holds additional information about the results of machine instructions, and it usually is stored independently of the register file and the memory. In your processor, you will be writing outputs to one of the CSRs that will be monitored by more complex testbenches.

Below are the 2 CSR instructions that you will need to implement. Note that while there are 2^12 possible CSR addresses, we only expect one of them to work (tohost = 0x51E). Writes to other CSR addresses should not affect the tohost CSR.

  1. csrw tohost, rs1 (short for csrrw x0, csr, rs1 where csr=tohost=0x51E)
  2. csrwi tohost, uimm (short for csrrwi x0, csr, uimm where csr=tohost=0x51E)

The instruction formats for these instructions are as follows:

Note that the immediate forms use a 5-bit zero-extended immediate (uimm) encoded in the rs1 field.
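
In the standard RISC-V encoding, these instructions carry the CSR address in bits 31:20, rs1 (or uimm) in bits 19:15, funct3 in bits 14:12, rd in bits 11:7, and the opcode in bits 6:0. Here is a Python sketch of pulling those fields apart and choosing the value to write (the function names and the `regs` list are hypothetical stand-ins for your datapath):

```python
TOHOST = 0x51E


def decode_csr(inst: int) -> dict:
    """Extract the fields of a csrw/csrwi instruction."""
    return {
        "csr": (inst >> 20) & 0xFFF,         # inst[31:20]
        "rs1_or_uimm": (inst >> 15) & 0x1F,  # inst[19:15]
        "funct3": (inst >> 12) & 0x7,        # inst[14:12]
        "opcode": inst & 0x7F,               # inst[6:0]
    }


def csr_write_value(inst: int, regs: list) -> int:
    """Value presented to the CSR unit: R[rs1] for csrw, zero-extended uimm for csrwi."""
    f = decode_csr(inst)
    if f["funct3"] == 0x5:  # csrwi: the rs1 field holds a 5-bit immediate
        return f["rs1_or_uimm"]
    return regs[f["rs1_or_uimm"]] & 0xFFFFFFFF  # csrw: read the register


regs = [0] * 32
regs[5] = 0xDEADBEEF  # pretend t0 (x5) holds this value
print(hex(csr_write_value(0x51E29073, regs)))  # csrw tohost, t0
print(hex(csr_write_value(0x51E0D073, regs)))  # csrwi tohost, 1
```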

The Control Status Registers unit (located in csr.circ) is already fully implemented for you! Please do not edit anything in the circuit, including input/output pins, or the autograder tests may fail.

If you want to learn more about CSR, you can refer to Chapter 9 of the RISC-V specification.

Here’s a quick summary of its inputs and outputs:

Signal Name Direction Bit Width Description
CSR_address Input 12 Input CSR register address
CSR_din Input 32 Value to write into specified CSR register
CSR_WE Input 1 Write enable
clk Input 1 Clock input
tohost Output 32 Output of the tohost register

Info: Processor

The main CPU circuit (located in cpu.circ) implements the main datapath and connects all the subcircuits (ALU, Branch Comparator, Control Logic, Control Status Registers, Immediate Generator, Memory, and RegFile) together. After finishing this task, your CPU should be making use of all these components.

As a refresher, here’s a quick summary of its inputs and outputs:

Input Name Bit Width Description
READ_DATA 32 Driven with the data at the data memory address identified by the WRITE_ADDRESS (see below).
INSTRUCTION 32 Driven with the instruction at the instruction memory address identified by the PROGRAM_COUNTER (see below).
CLOCK 1 The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

Output Name Bit Width Description
ra 32 Driven with the contents of ra (FOR TESTING)
sp 32 Driven with the contents of sp (FOR TESTING)
t0 32 Driven with the contents of t0 (FOR TESTING)
t1 32 Driven with the contents of t1 (FOR TESTING)
t2 32 Driven with the contents of t2 (FOR TESTING)
s0 32 Driven with the contents of s0 (FOR TESTING)
s1 32 Driven with the contents of s1 (FOR TESTING)
a0 32 Driven with the contents of a0 (FOR TESTING)
tohost 32 Driven with the contents of CSR 0x51E (FOR TESTING, for Part A leave it as-is)
WRITE_ADDRESS 32 This output is used to select which address to read/write data from in data memory.
WRITE_DATA 32 This output is used to provide write data to data memory.
WRITE_ENABLE 4 This output is used to provide the write enable mask to data memory.
PROGRAM_COUNTER 32 This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

We strongly recommend that you review the Single Cycle CPU: A Guide section from Part A.

Again, make sure that you do not edit the input/output pins or add new ones!


Notices

  • To prevent double jeopardy, the autograder will replace your ALU and RegFile with the staff ALU and RegFile. This means that if your ALU or RegFile from Part A have issues, they will not affect your autograder results in Part B. However, this also means that you must not depend on out-of-spec behavior from these circuits.

Single-Cycle CPU Testing

We’ve provided some basic sanity tests for your CPU in the tests/part-b/sanity/ directory. You can run these with:

$ python3 test.py tests/part-b/sanity/

Refer to the Info: CPU Testing section for more info on using these tests.

Note that these sanity tests are not comprehensive; they are intended to guide you in the early stages of testing until you start Task 5.


Task 5: Custom Tests

For Part B, we have provided a set of basic, visible unit tests in the autograder and starter code. These tests are meant to help reduce your stress by providing some guidance in the early stages of testing, but they are not comprehensive. You should still write rigorous tests for your designs, as passing the basic sanity tests does not guarantee that you will pass any of the hidden tests.

The autograder tests for Part B fall into 3 main categories: unit tests, integration tests, and edge case tests. We won’t be revealing all the autograder tests, but you should be able to re-create a very close approximation of them with the tools we provide in order to test your CPU.

Unit tests: a unit test exercises your datapath with a single instruction, to make sure that each individual instruction has been implemented and is working as expected. You should write a different unit test for every single instruction that you need to implement, and make sure that you test the spectrum of possibilities for that instruction thoroughly. For example, a unit test for slt should contain cases where rs1 < rs2, rs1 > rs2, and where rs1 == rs2.

Integration tests: After you’ve passed your unit tests, move on to tests that use multiple functions in combination. Try out various simple RISC-V programs that run a single function; your CPU should be able to handle them, if working properly. Feel free to try to use riscv-gcc to compile C programs to RISC-V, but be aware of the limited instruction set we’re working with (you don’t have any ecall instructions, for example). We’d recommend that you instead try to write simple functions on your own based on what you’ve seen in labs, discussions, projects, and exams.

Edge case tests: edge case tests try inputs that you normally wouldn’t expect, which may trigger bugs in certain situations. What edge cases should you look for? A small example/hint from us: our 2 main classes of edge cases come from memory operations and branch/jump operations (two of the tests are mem-various-offsets, which tests memory instructions with various offsets, and br-jump-limits, which tests limits of branch/jump instructions). Think about other ways these operations could trigger potential bugs.

Creating Custom Tests

We’ve included a script (tools/create-test.py) that uses Venus to help you generate test circuits from RISC-V assembly! The process for writing custom tests is as follows:

  1. Come up with a test, and write the RISC-V assembly instructions for that test, saving them in a file ending in .s in the tests/part-b/custom/inputs/ folder. The name of this file will be the name of your test. Repeat if you have more tests.
  • e.g. tests/part-b/custom/inputs/sll-slli.s and tests/part-b/custom/inputs/beq.s
  2. Generate the test circuits for your tests using create-test.py:
  $ python3 tools/create-test.py tests/part-b/custom/inputs/sll-slli.s tests/part-b/custom/inputs/beq.s

Reminder: if you want to regenerate everything, you can take advantage of Bash globs:

  $ python3 tools/create-test.py tests/part-b/custom/inputs/*.s

This should generate a couple new files to go with your assembly file:

  tests/part-b/custom/:
    - cpu-<TEST_NAME>.circ                                # The new circuit for your test
    - inputs/<TEST_NAME>.s                                # The test file you wrote (unchanged)
    - reference-outputs/cpu-<TEST_NAME>-pipelined-ref.out # The pipelined reference output for your test
    - reference-outputs/cpu-<TEST_NAME>-ref.out           # The single-cycle reference output for your test
  3. Now you can run the tests you just wrote!
  $ python3 test.py tests/part-b/custom/sll-slli.circ tests/part-b/custom/beq.circ

Reminder: you can run all tests in a directory:

  $ python3 test.py tests/part-b/custom/

By default, the number of cycles for a test will be just enough for all instructions in the test, as well as extra cycles for the register writeback and pipelining. If you wish to override this and simulate your code for a certain number of cycles, you can use the --cycles flag:

$ python3 tools/create-test.py --cycles <NUMBER_OF_CYCLES> <ASM_PATH>

Refer to the Info: CPU Testing section for more info on using these tests.


Test Coverage

Test coverage: a metric measuring how much of a given codebase is being tested by tests. For the purposes of this project, you will be graded on how much of the required ISA your tests cover.

The autograder for Part B will examine the coverage of tests located in the tests/part-b/custom/inputs/ folder. When you submit Part B to the autograder, the autograder will output a message about the percentage coverage of your tests against our staff suite of tests and notify you if any of your tests raised a syntax error.

Coverage Hints

  • If you make many short test files rather than one large one, it will be easier to figure out which test and which line causes your Syntax Error (and to figure out where your CPU is failing). We’d recommend that for unit testing, where you have one .s file testing each instruction.
  • Make sure you test every single instruction in the ISA, including the ones that are covered by the sanity tests; feel free to use the sanity tests as a model or even incorporate them as part of your test suite.
  • Make sure you check that all registers are working.
  • Make sure you don’t have any “dummy” tests; if a test doesn’t lead to a change in state or register value, it is not a meaningful test. For example, loading the value 0 from memory into a register that already has the value of 0 doesn’t change the value in the register, so you can’t determine if the load instruction actually worked or not by looking at the test output.
  • Make sure that you accumulate outputs into the special registers we have debug outputs for, as other registers cannot be directly checked (reminder: ra, sp, t0, t1, t2, s0, s1, and a0).
  • Consider if an instruction uses a signed or unsigned immediate. How would you test for implementations using the wrong sign?
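
To illustrate the last hint, here is a small Python model of slti (hypothetical names; the sign_extend=False path mimics a buggy Immediate Generator that zero-extends). It shows why a negative immediate makes a good test input: it is the case where a correct and an incorrect implementation disagree.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def slti_model(rs1_val: int, imm12: int, sign_extend: bool = True) -> int:
    """Model slti on a raw 12-bit immediate field."""
    imm = imm12 & 0xFFF
    if sign_extend and imm & 0x800:
        imm -= 0x1000  # treat the 12-bit field as a negative number
    return int(to_signed32(rs1_val) < imm)


# slti t0, x0, -1 encodes the immediate field as 0xFFF.
print(slti_model(0, 0xFFF))                     # correct: 0 < -1 is false
print(slti_model(0, 0xFFF, sign_extend=False))  # buggy: 0 < 4095 is true
```

A test that only uses small positive immediates would give the same answer in both cases and would never expose the bug.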

Notices

  • Most instructions use registers or memory. In order to test these instructions, we need to load different values into registers and memory, which we (currently) unavoidably have to do using addi, lui, and sw. This means that your CPU must handle addi and lui properly, or they may cause other instructions’ tests to fail. Additionally, failures in lw or sw may affect each other, since we (currently) cannot write to memory without sw or inspect a value in memory without lw. Make sure to test these instructions extensively!
  • Venus does not support CSRs, so you will not be able to generate custom tests for CSR-related instructions (e.g. csrw, csrwi). To compensate, the autograder has a visible robust CSR sanity check that will test all needed functionality of the CSR.
  • You can create custom tests with pseudoinstructions and run them locally, but the Test Coverage autograder does not currently support them, so pseudoinstructions won’t count toward your coverage statistics.
  • Avoid creating tests that use out-of-range memory addresses (2^16 and above) or memory addresses that aren’t valid for the load/store instruction (see the Memory section in Task 4). Venus supports these, but your CPU should not, so your CPU will not be able to pass these tests.

Task 6: Pipelining

So far, your CPU is capable of executing instructions in our ISA in a single cycle. Now, it’s time to implement pipelining in your CPU! For this project, you’ll need to implement a 2-stage pipeline, which is still conceptually similar to the 5-stage pipeline covered in lecture and discussion (review those if you need a refresher). The two stages you’ll implement are:

  1. Instruction Fetch: An instruction is fetched from the instruction memory.
  2. Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining four stages of a normal five-stage RISC-V pipeline (ID, EX, MEM and WB).

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency. However, we will be enforcing the two-stage pipeline design. Some things to consider:

  • Will the IF and EX stages have the same or different PC values?
  • Do you need to store the PC between the pipelining stages?
  • What hazards are present in this two-stage pipeline?

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won’t contain an instruction loaded from memory. How do we deal with this? Luckily, Logisim automatically sets registers to zero on reset, so the instruction register will start with a nop! If you wish, you can depend on this behavior of Logisim.

Control Hazards

Since your CPU will support branch and jump instructions, you’ll need to handle control hazards that occur when branching.

The instruction immediately after a branch or jump is not executed if a branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to “kill” the instruction that is being fetched if the instruction under execution is a jump or a taken branch.

Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. You can use 0x00000013, or addi x0, x0, 0, for this purpose; other nop instructions will work too. You should kill if a branch is taken (do not kill otherwise). Do kill on every type of jump.
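
The kill rule can be written down as a tiny Python model. The names are hypothetical, and the sketch deliberately says nothing about where the MUX sits relative to the instruction register, since that is for you to work out:

```python
NOP = 0x00000013  # addi x0, x0, 0


def should_kill(is_jump: bool, is_branch: bool, branch_taken: bool) -> bool:
    """Kill on every jump, and on a branch only if it is taken."""
    return is_jump or (is_branch and branch_taken)


def select_instruction(fetched: int, kill: bool) -> int:
    """Model the MUX: pass the fetched instruction through, or substitute a nop."""
    return NOP if kill else fetched


# A taken branch in Execute replaces the instruction fetched behind it.
kill = should_kill(is_jump=False, is_branch=True, branch_taken=True)
print(hex(select_instruction(0xFFF00293, kill)))
```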

Note: do not solve this issue by calculating branch offsets in the IF stage. While this may be a conceptually correct solution, we test your output against the reference every cycle, and the reference executes a nop in this situation, so your output would differ and you would fail our tests.

Some more things to consider:

  • To MUX a nop into the instruction stream, do you place it before or after the instruction register?
  • What address should be requested next while the EX stage executes a nop? Is this different than normal?

Pipelined CPU Testing

We’ve provided some basic sanity tests for your pipelined CPU in the tests/part-b/sanity/ directory (same tests as in Task 4). You can run these with:

$ python3 test.py --pipelined tests/part-b/sanity/

Note: since your CPU is pipelined at this point, you need to run the pipelined tests using the --pipelined (or -p) flag. If you run the single-cycle tests (i.e. not using the --pipelined flag) at this point (after pipelining your CPU), your CPU should now fail those tests! Think about why this happens…

Similarly, you can also run the pipelined version of your custom tests:

$ python3 test.py --pipelined tests/part-b/custom/

Note: because you’re implementing a 2-stage pipelined processor and the first instruction writes on the rising edge of the second clock cycle, the effects of your instructions will have a 2 instruction delay. For example, let’s look at the first instruction of tests/part-b/sanity/inputs/addi.s, addi t0, x0, -1. If you inspect the pipelined reference output (tests/part-b/sanity/reference-output/cpu-addi-pipelined-ref.out), you’ll see that t0 doesn’t show changes until the third cycle.

Refer to the Info: CPU Testing section for more info on using these tests. Keep in mind that you’re working with a pipelined circuit from this task onward.


Task 7: Part B README Update

Time to update your README.md! Once again, write down how you implemented your circuits and components for this part (including the various subcircuits you used), and explain the reasoning behind your design choices. In particular, we want to see:

  • How you designed your control logic
  • Advantages/Disadvantages of your design
  • Best/Worst bug or design challenge you encountered, and your solution to it

Your additions to the README should be at least 512 characters (although something more than the bare minimum would be nice), but other than that feel free to get creative!


Part B: Submission

At this point, if you’ve completed tasks 4-7, you’ve finished Part B of the project. Congratulations on your shiny new CPU!

Double-check and make sure that:

  • You have not moved any provided input/output pins, and that your circuits fit in the provided testing harnesses
  • You did not create any additional .circ files; the autograder will only be testing the circuit files you were allowed to edit for Part B (branch-comp.circ, control-logic.circ, cpu.circ, imm-gen.circ).
  • Your custom .s tests are located in tests/part-b/custom/inputs/, since the autograder will test those for coverage.
  • You have completed Task 7 (README update)

To prevent double jeopardy, the autograder will replace your ALU and RegFile with the staff ALU and RegFile. This means that if your ALU or RegFile from Part A have issues, they will not affect your autograder results in Part B. However, this also means that you must not depend on out-of-spec behavior from these circuits.

The autograder for Part B uses the sanity tests provided in the starter code, as well as hidden unit, integration, and edge case tests as specified in Task 5. Additionally, the autograder will check your custom tests for test coverage.

Note: If you fail on any of the provided autograder sanity tests, course staff will not help you debug your CPU unless you have recreated the failure in a custom test.


Grading

The grading breakdown for Project 3 is as follows:

  • Part A (20%)
    • ALU (7%)
    • RegFile (8%)
    • addi (5%)
  • Part B (80%)
    • Sanity and Visible (Basic) Unit Tests (20%)
    • Test Coverage (10%)
    • Hidden Unit, Integration, and Edge Case Tests (49%)
    • README (1%)