Part A Deadline: Thursday, March 11, 2021
Part B Deadline: Friday, April 2, 2021
You’re probably curious about that “Sea Pea You” thing in your computer (if you’re not, let’s pretend you are for a second). How exactly does a CPU get electricity to run those sw ra, 40(sp)
instructions you’ve been writing? It’s time to uncover another piece of that black box: welcome to Project 3!
In Part A (Tasks 1-3), you’ll be wiring up the ALU and RegFile for a basic RISC-V CPU, as well as implementing the CPU datapath for executing addi
instructions. In Part B (Tasks 4-5), you’ll use these components (and others) to wire up a working CPU that runs actual RISC-V instructions!
Wiring (except Transistor, Transmission Gate, POR, Pull Resistor, Power, Ground, Do not connect), Gates (except PLA), Plexers, Arithmetic (except Divider), Memory (except RAM, Random Generator).
.circ files use the XML format, which makes it hard for Git to automerge. We recommend working on a single computer at a time; if you use multiple computers, make sure that you have pushed and pulled your code before switching devices.
cpu.circ, to make sure your circuits fit in the testing harnesses. You can add pins in subcircuits you create, but avoid making new pins in circuits with the provided, locked pins.
.circ files given in the cpu/ folder in the starter code. You may not make new .circ files; the autograder will fail you if you do this!
.circ files. Failing to do this will result in reduced autograder points. You may not change the names of any circuits provided in the starter files.
Help -> Library Reference within Logisim.
Some common sources of Logisim errors, for your debugging convenience:
Please visit https://galloc.cs61c.org/ and get your proj3
repository. Then, clone your repository locally and add the starter remote:
$ git clone YOUR_REPO_URL
$ cd YOUR_REPO_NAME
$ git remote add starter https://github.com/61c-teach/sp21-proj3-starter.git
$ git pull starter master
If we make changes to the starter code, you can update your repository with git pull starter master
.
This project primarily uses Logisim; refer to Lab 5 if you haven’t set it up or need a refresher.
Your first task is to create an ALU that supports all the operations needed by the instructions in our ISA (which is described in further detail in the next section). Please note that we treat overflow as RISC-V does with unsigned instructions, meaning that we ignore overflow.
We have provided a skeleton of an ALU for you in alu.circ
(in the cpu
folder). It has three inputs:
Input Name | Bit Width | Description |
---|---|---|
A | 32 | Data to use for Input A in the ALU operation |
B | 32 | Data to use for Input B in the ALU operation |
ALUSel | 4 | Selects which operation the ALU should perform (see the list of operations with corresponding switch values below) |
… and one output:
Output Name | Bit Width | Description |
---|---|---|
Result | 32 | Result of the ALU operation |
Below is the list of ALU operations for you to implement, along with their associated ALUSel values. All of them are required. You are allowed and encouraged to use built-in Logisim components to implement the arithmetic operations.
ALUSel Value | Instruction |
---|---|
0 | add: Result = A + B |
1 | sll: Result = A << B |
2 | slt: Result = (A < B (signed)) ? 1 : 0 |
3 | Unused |
4 | xor: Result = A ^ B |
5 | srl: Result = (unsigned) A >> B |
6 | or: Result = A \| B |
7 | and: Result = A & B |
8 | mul: Result = (signed) (A * B)[31:0] |
9 | mulh: Result = (signed) (A * B)[63:32] |
10 | Unused |
11 | mulhu: Result = (A * B)[63:32] |
12 | sub: Result = A - B |
13 | sra: Result = (signed) A >> B |
14 | Unused |
15 | bsel: Result = B |
The ALU tests for Part A only use ALUSel values with defined instructions, so your design doesn’t need to worry about the unused values.
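If you want something to check expected values against while wiring, the table above can be modeled in a few lines of Python. This is only a sketch, not the reference solution: the function names are made up for illustration, and it assumes that shifts use only the low 5 bits of B (as RISC-V does for 32-bit shifts), which the table itself does not state.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, alu_sel):
    """Software model of the ALUSel table; returns a 32-bit unsigned pattern."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # assumption: only the low 5 bits of B select the shift amount
    if alu_sel == 0:      # add (overflow is ignored by the final mask)
        result = a + b
    elif alu_sel == 1:    # sll
        result = a << shamt
    elif alu_sel == 2:    # slt (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif alu_sel == 4:    # xor
        result = a ^ b
    elif alu_sel == 5:    # srl (logical shift, zero fill)
        result = a >> shamt
    elif alu_sel == 6:    # or
        result = a | b
    elif alu_sel == 7:    # and
        result = a & b
    elif alu_sel == 8:    # mul: low 32 bits of the product
        result = to_signed(a) * to_signed(b)
    elif alu_sel == 9:    # mulh: high 32 bits of the signed product
        result = (to_signed(a) * to_signed(b)) >> 32
    elif alu_sel == 11:   # mulhu: high 32 bits of the unsigned product
        result = (a * b) >> 32
    elif alu_sel == 12:   # sub
        result = a - b
    elif alu_sel == 13:   # sra (arithmetic shift, sign fill)
        result = to_signed(a) >> shamt
    elif alu_sel == 15:   # bsel
        result = b
    else:
        raise ValueError(f"ALUSel value {alu_sel} is unused")
    return result & MASK32
```

For example, `hex(alu(0x80000000, 4, 13))` and `hex(alu(0x80000000, 4, 5))` show the difference between sra and srl on the same inputs.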
You can make any modifications to alu.circ
you want, but the behavior must match the specification above. If you create additional subcircuits for your ALU, they must also be in alu.circ
(you may not make new .circ
files). Additionally, your ALU must be able to fit in the provided harness alu-harness.circ
. This means that you should take care not to edit the provided input/output pins or add new ones. To verify that changes you made didn’t break anything, you can open alu-harness.circ
and ensure there are no errors and that the circuit functions well.
add is already made for you; feel free to use a similar structure when implementing your other operations.
Help -> Library Reference for more information on the component and its inputs and outputs.
Multiplier component has a Carry Out output, which might be useful for multiply operations.
We’ve provided some sanity tests for each task, located in subdirectories under tests/. For example, the ALU tests are in tests/part-a/alu/. When tests are run, the outputs from your circuits are saved in a student-output/ subdirectory.
For example, to run the ALU tests:
$ python3 test.py tests/part-a/alu/
You can also specify a single test circuit, or a grandparent/great-grandparent directory:
$ python3 test.py tests/part-a/alu/alu-add.circ
$ python3 test.py tests/part-a/
$ python3 test.py tests/
After the tests finish running, your ALU circuit’s outputs will be saved under tests/part-a/alu/student-output/
with a -student.out
suffix (e.g. alu-add-student.out
). The corresponding reference outputs can be found at tests/part-a/alu/reference-output/
with a -ref.out
suffix (e.g. alu-add-ref.out
).
We’ve also provided format-output.py
, which accepts a path to an output file and displays the output in a more readable format (left-aligned hexadecimal numbers). For example, to get the reference output of the alu-add
sanity test in readable format, you would do:
$ python3 tools/format-output.py tests/part-a/alu/reference-output/alu-add-ref.out
If you want to see the difference between your output and the reference solution, put the readable outputs into temporary files and diff
them. For example, for the alu-add
test, you would do:
$ python3 tools/format-output.py tests/part-a/alu/reference-output/alu-add-ref.out > reference.out
$ python3 tools/format-output.py tests/part-a/alu/student-output/alu-add-student.out > student.out
$ diff reference.out student.out
Note: if the lines are wrapping, try resizing your terminal window (or try a slightly smaller font). Or see the following note.
Experimental note: each output file is technically a valid CSV file, so you can also import the output in a spreadsheet app if you really want to crunch the numbers (or you really hate tables in terminal). If the app requires a .csv
extension, you can cp tests/part-a/alu/student-output/alu-add-student.out student.csv
and import the resulting .csv
file.
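Since the output files are valid CSV, you could also compare them with a short script instead of diff. The sketch below is not part of the starter code; `diff_outputs` is a hypothetical helper, and it makes no assumption about what the columns mean, it only reports rows that differ.

```python
import csv


def diff_outputs(reference_path, student_path):
    """Return (row_number, reference_row, student_row) for every row that differs."""
    with open(reference_path, newline="") as ref_file:
        ref_rows = list(csv.reader(ref_file))
    with open(student_path, newline="") as stu_file:
        stu_rows = list(csv.reader(stu_file))

    mismatches = []
    for index in range(max(len(ref_rows), len(stu_rows))):
        # A missing row (one file is shorter) is reported as an empty list.
        ref_row = ref_rows[index] if index < len(ref_rows) else []
        stu_row = stu_rows[index] if index < len(stu_rows) else []
        if ref_row != stu_row:
            mismatches.append((index + 1, ref_row, stu_row))
    return mismatches
```

Calling `diff_outputs("tests/part-a/alu/reference-output/alu-add-ref.out", "tests/part-a/alu/student-output/alu-add-student.out")` returns an empty list when the two files match.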
Similar to how you can step through your C code in GDB, you can also step through the test circuits in Logisim! For this example we’ll be using the alu-add
test.
Open tests/part-a/alu/alu-add.circ in Logisim. There are, among other things, some ROMs feeding into the Input_A, Input_B, and ALUSel tunnels, which then feed into your ALU near the upper right. Every clock cycle, the adder on the top left increments by one, which advances the output from the ROMs by one entry and feeds a new set of inputs to your ALU. If you tick the circuit a couple times (Simulate -> Tick Full Cycle or the corresponding keyboard shortcut), you can see the test circuit advance through each set of inputs and your ALU’s corresponding outputs. If you want to start over, use Simulate -> Reset Simulator (or the keyboard shortcut).
Now, let’s see what your ALU is actually doing with the inputs. Right-click your ALU, and select View alu
. Your ALU circuit will appear, with the input values for the current test cycle already on the ALU input pins. With this, you can see exactly what your ALU is doing in every line from the output files! The Poke Tool
will be very useful here.
Note: edits to the test circuit, including the ALU we just inspected, will not be saved. Avoid making edits in the test circuit, as they may be lost!
As you learned in class, the RISC-V architecture has 32 registers. For this project, we will implement all of them. To aid in debugging and testing, we have written the RegFile to expose the 8 registers specified below. Please make sure that the values of these registers are attached to the proper outputs.
Your RegFile should be able to write to or read from these registers specified in a given RISC-V instruction without affecting any other registers. There is one notable exception: your RegFile should NOT write to x0, even if an instruction tries. Remember that the zero register should ALWAYS have the value 0x0. You should NOT gate the clock at any point in your RegFile: the clock signal should ALWAYS connect directly to the clock input of the registers without passing through ANY combinational logic.
The exposed registers and their corresponding numbers are listed below.
Register Number | Register Name |
---|---|
x1 | ra |
x2 | sp |
x5 | t0 |
x6 | t1 |
x7 | t2 |
x8 | s0 |
x9 | s1 |
x10 | a0 |
You are provided with the skeleton of a register file in regfile.circ
. The register file circuit has six inputs:
Input Name | Bit Width | Description |
---|---|---|
Clock | 1 | Input providing the clock. This signal can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.). |
RegWEn | 1 | Determines whether data is written to the register file on the next rising edge of the clock. |
rs1 (Source Register 1) | 5 | Determines which register’s value is sent to the Read_Data_1 output, see below. |
rs2 (Source Register 2) | 5 | Determines which register’s value is sent to the Read_Data_2 output, see below. |
rd (Destination Register) | 5 | Determines which register to write the value of Write Data to on the next rising edge of the clock, assuming that RegWEn is a 1. |
wb (Write Data) | 32 | Determines what data to write to the register identified by the Destination Register input on the next rising edge of the clock, assuming that RegWEn is 1. |
The register file also has the following outputs:
Output Name | Bit Width | Description |
---|---|---|
Read_Data_1 | 32 | Driven with the value of the register identified by the Source Register 1 input. |
Read_Data_2 | 32 | Driven with the value of the register identified by the Source Register 2 input. |
ra Value | 32 | Always driven with the value of ra (This is a DEBUG/TEST output.) |
sp Value | 32 | Always driven with the value of sp (This is a DEBUG/TEST output.) |
t0 Value | 32 | Always driven with the value of t0 (This is a DEBUG/TEST output.) |
t1 Value | 32 | Always driven with the value of t1 (This is a DEBUG/TEST output.) |
t2 Value | 32 | Always driven with the value of t2 (This is a DEBUG/TEST output.) |
s0 Value | 32 | Always driven with the value of s0 (This is a DEBUG/TEST output.) |
s1 Value | 32 | Always driven with the value of s1 (This is a DEBUG/TEST output.) |
a0 Value | 32 | Always driven with the value of a0 (This is a DEBUG/TEST output.) |
The test outputs at the top of your regfile
circuit are present for testing and debugging purposes. If you were implementing a real register file, you would omit those outputs. In our case, be sure they are included correctly – if they are not, you will not pass tests.
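The behavior described above (writes land on the rising clock edge only when RegWEn is 1, x0 is never written, and eight registers are always exposed) can be summarized in a small software model. This is a rough sketch for building intuition, not a description of how the circuit must be wired; the class and method names are invented for illustration.

```python
class RegFile:
    """Behavioral model of the 32-entry register file."""

    # Register numbers of the eight exposed debug outputs.
    EXPOSED = {"ra": 1, "sp": 2, "t0": 5, "t1": 6, "t2": 7, "s0": 8, "s1": 9, "a0": 10}

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        """Return (Read_Data_1, Read_Data_2) for the selected source registers."""
        return self.regs[rs1 & 0x1F], self.regs[rs2 & 0x1F]

    def rising_edge(self, reg_wen, rd, wb):
        """Model one rising clock edge: write wb to rd if RegWEn is 1 and rd is not x0."""
        rd &= 0x1F
        if reg_wen and rd != 0:
            self.regs[rd] = wb & 0xFFFFFFFF

    def debug_outputs(self):
        """Return the values driven on the eight DEBUG/TEST outputs."""
        return {name: self.regs[number] for name, number in self.EXPOSED.items()}
```

Note that the model never changes the clock itself; RegWEn only decides whether a write happens on the edge, which mirrors the rule against gating the clock.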
You can make any modifications to regfile.circ
you want, but the behavior must match the specification above. If you create additional subcircuits to use in your RegFile, they must also be in regfile.circ
(you may not make new .circ
files). Additionally, your RegFile must be able to fit in the provided harness regfile-harness.circ
. This means that you should take care not to edit the provided input/output pins or add new ones. To verify that changes you made didn’t break anything, you can open regfile-harness.circ
and ensure there are no errors and that the circuit functions well.
Enable input on your MUXes. In fact, you can turn that attribute off (Include Enable?). We also recommend that you disable the Three-state? attribute (if the plexer has it).
Register component, and check out all the input/output pins. The Enable pin may come in handy.
x0?
We’ve provided some RegFile sanity tests in the tests/part-a/regfile/ directory. You can run these with:
$ python3 test.py tests/part-a/regfile/
Refer to the Info: Testing section for more info on test outputs.
addi Instruction
As your final task for Part A, you’re going to implement a CPU that’s capable of executing one instruction: addi!
Note: we’ll be implementing other instructions in Part B. You’re welcome to implement other instructions at this time, but you’ll only be graded on whether or not addi
executes correctly for Part A, so make sure that addi
works!
The Memory unit (located in mem.circ
) is already fully implemented for you and attached to the outputs of your CPU in cpu-harness.circ
! The addi
instruction does NOT use Data Memory, so for Part A you can ignore the DMEM and leave its I/O pins undriven.
If you are interested, here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
WriteAddr | Input | 32 | Address to read/write to in Memory |
WriteData | Input | 32 | Value to be written to Memory |
Write_En | Input | 4 | The write mask for instructions that write to Memory and zero otherwise |
CLK | Input | 1 | Driven by the clock input to the CPU |
ReadData | Output | 32 | Value of the data stored at the specified address |
The Branch Comparator unit (located in branch-comp.circ
) provided in the skeleton is unimplemented, but the addi
instruction does NOT use the Branch Comparator unit, so you don’t have to worry about it for Part A.
If you are interested, here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
rs1 | Input | 32 | Value in the first register to be compared |
rs2 | Input | 32 | Value in the second register to be compared |
BrUn | Input | 1 | Equal to one when an unsigned comparison is wanted, or zero when a signed comparison is wanted |
BrEq | Output | 1 | Equal to one if the two values are equal |
BrLt | Output | 1 | Equal to one if the value in rs1 is less than the value in rs2 |
The Immediate Generator (“Imm Gen”) unit (located in imm-gen.circ
) provided in the skeleton is unimplemented. The addi
instruction requires an immediate generator, but for now you can hard-wire it to construct the immediate for the addi
instruction, without worrying about other immediate types.
To edit this subcircuit, edit the imm-gen.circ
file and not the imm_gen
in cpu.circ
. Note that if you modify this circuit, you will need to close and re-open cpu.circ
to load the changes in your CPU.
Here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
inst | Input | 32 | The instruction being executed |
ImmSel | Input | 3 | Value determining how to reconstruct the immediate |
imm | Output | 32 | Value of the immediate in the instruction |
We have provided a skeleton for your processor in cpu.circ
. You will be using your own implementations of the ALU and RegFile as you construct your datapath. You are responsible for constructing the entire datapath from scratch. For Part A, your completed processor should support executing the addi
instruction in a single cycle (i.e. no pipelining). In Part B, we’ll modify your CPU to use a 2-stage pipeline, with IF in the first stage and ID, EX, MEM, and WB in the second stage.
Your processor will sit in a processor harness cpu-harness.circ
that contains the Memory unit. That processor harness then sits in a testing harness run.circ
that provides the instructions to the processor. Your processor will output the address of an instruction, and accept the instruction at that address as an input. It will also output the data memory address and data memory write enable, and accept the data at that address as an input. Essentially, these two test harnesses are your data memory and instruction memory, respectively. We recommend that you take some time to inspect cpu-harness.circ
and run.circ
to see exactly what’s going on. cpu-harness.circ
will be used in the tests provided to you for sanity checking, so make sure your CPU fits in the harness before testing and submitting your work! Your processor has 3 inputs that come from the harness:
Input Name | Bit Width | Description |
---|---|---|
READ_DATA | 32 | Driven with the data at the data memory address identified by the WRITE_ADDRESS (see below). |
INSTRUCTION | 32 | Driven with the instruction at the instruction memory address identified by PROGRAM_COUNTER (see below). |
CLOCK | 1 | The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.). |
Your processor must provide the following outputs to the first level harness:
Output Name | Bit Width | Description |
---|---|---|
ra | 32 | Driven with the contents of ra (FOR TESTING) |
sp | 32 | Driven with the contents of sp (FOR TESTING) |
t0 | 32 | Driven with the contents of t0 (FOR TESTING) |
t1 | 32 | Driven with the contents of t1 (FOR TESTING) |
t2 | 32 | Driven with the contents of t2 (FOR TESTING) |
s0 | 32 | Driven with the contents of s0 (FOR TESTING) |
s1 | 32 | Driven with the contents of s1 (FOR TESTING) |
a0 | 32 | Driven with the contents of a0 (FOR TESTING) |
tohost | 32 | Driven with the contents of CSR 0x51E (FOR TESTING, for Part A leave it as-is) |
WRITE_ADDRESS | 32 | This output is used to select which address to read/write data from in data memory. |
WRITE_DATA | 32 | This output is used to provide write data to data memory. |
WRITE_ENABLE | 4 | This output is used to provide the write enable mask to data memory. |
PROGRAM_COUNTER | 32 | This output is used to select which instruction is presented to the processor on the INSTRUCTION input. |
Just like with the ALU and RegFile, make sure that you do not edit the input/output pins or add new ones!
The Control Logic unit (control-logic.circ
) provided in the skeleton is unimplemented. Designing your control logic unit will probably be your biggest challenge in Part B. For Part A, you can put a constant for each control signal, because addi
is the only instruction you’ll be implementing. As you implement addi
, think about where you’ll need to make additions in order to support other instructions.
To edit this subcircuit, edit the control-logic.circ
file and not the control_logic
in cpu.circ
. Note that if you modify this circuit, you will need to close and re-open cpu.circ
to load the changes in your CPU.
You are welcome to add more input/output pins to the starter control logic as your design demands. You may also use as many or as few of the supplied ports as needed. However, please do not edit any of the existing pins during this process.
We know that trying to build a CPU with a blank slate might be intimidating, so we wrote the following guide to help you.
Recall the five stages of the CPU pipeline: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Write Back (WB).
This guide will help you work through each of these stages for the addi
instruction. Each section will contain questions for you to think through and pointers to important details, but it won’t tell you exactly how to implement the instruction.
You may need to read and understand each question before going to the next one. During your implementation, feel free to place things in subcircuits as you see fit.
The main thing we are concerned about in this stage is: how do we get the current instruction? From lecture, we know that instructions are stored in the instruction memory, and each of these instructions can be accessed through an address.
In cpu.circ
, we have provided a simple PC register implementation - ignoring jumps and branches. You will implement branches and jumps in Part B of the project, but for now we are only concerned with being able to run addi
instructions.
Remember that we will eventually implement a 2-stage pipelined processor, so the IF stage is separate from the remaining stages. What circuitry separates the different stages of a pipeline? Specifically, what circuitry separates IF from the next stage? Will you need to add anything?
Now that we have our instruction coming from the instruction
input, we break it down in the Instruction Decode step according to the RISC-V instruction formats you have learned.
3. Implement the instruction field decode stage using the instruction input. You should use tunnels to label and group the bits.
5. Implement reading from the register file. You will have to bring in your RegFile from Part A. Remember to connect the clock!
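The field boundaries you will label with tunnels are the standard RISC-V ones. As a quick way to check them against a known instruction, here is a small sketch in Python; `decode_fields` is a made-up helper name, and not every field is meaningful for every instruction format (for addi, the rs2 and funct7 bits are part of the immediate).

```python
def decode_fields(inst):
    """Split a 32-bit RISC-V instruction into its fixed-position fields."""
    return {
        "opcode": inst & 0x7F,          # inst[6:0]
        "rd": (inst >> 7) & 0x1F,       # inst[11:7]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "rs1": (inst >> 15) & 0x1F,     # inst[19:15]
        "rs2": (inst >> 20) & 0x1F,     # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
    }
```

For instance, `decode_fields(0x00500293)` (the encoding of `addi t0, x0, 5`) gives opcode 0x13, rd 5, funct3 0, and rs1 0.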
The Execute stage is where the computation of most instructions is performed. This is also where we will introduce the idea of using a Control Module.
4. Bring in your ALU and connect the ALU inputs correctly. Do you need to connect the clock? Why or why not?
The memory stage is where the memory can be written to using store instructions and read from using load instructions. Because the addi
instruction does not use memory, we will not spend too much time here.
At this point, we cannot connect most of the inputs, as we don’t know where they should come from.
The write back stage is where the result of the operation is saved back to the registers.
2. Let's create the write back phase so that it is able to write both ALU and MEM outputs to the Register File. Later, when you implement branching/jumping, you may need to add more to this mux. However, at the moment, we need to choose between the ALU and MEM outputs, as only one wire can end up being an input to the register file. Bring a wire from both the ALU and `READ_DATA`, and connect them to a MUX.
5. There are two more inputs on the Register File which are important for writing data: RegWEn and rd. One of these will come from the Instruction Decode stage and the other one will be a new control signal that you need to design for Part B. Please finish off the Writeback stage by connecting these inputs on the RegFile correctly.
In the Info: Testing section, we got to know the general folder structure of the tests, and understand the commands involved in running tests and interpreting output. Now, let’s look deeper into the CPU tests that you’ll be working with for the remainder of the project.
Each CPU test is a copy of the run.circ
file included with the starter code that has instructions loaded into its IMEM. When you run Logisim from the command line, the clock ticks, the program counter is incremented, and the values of the outputs are printed to stdout
.
Let’s take the single-cycle cpu-addi-basic
sanity test as an example. It has 4 addi
instructions (see tests/part-a/addi/cpu-addi-basic.s
). Open tests/part-a/addi/cpu-addi-basic.circ
in Logisim, and take a closer look at the various parts of the test file. At the top, you’ll see the place where your CPU is connected to the test outputs. With the starter code, you’ll see lots of UUUU
s or XXXX
s; when your CPU is working, this should not be the case. Your CPU takes in one input (INSTRUCTION
), and along with the values in each of the registers, it has an additional output: PROGRAM_COUNTER
, or the address of the instruction to be fetched from IMEM to be executed the next clock cycle.
As you can see, there are many specifically-positioned wires connected to specific input/output pins on your CPU. Make sure that you do not edit the provided input/output pins or add new ones, as this will change the shape of the CPU circuit, and as a result the connections in the test files may no longer work properly.
Below the CPU, you’ll see instruction memory. The hex for the addi
instructions has been loaded into instruction memory. Instruction memory takes in one input (called PROGRAM_COUNTER
) and outputs the instruction at that address. PROGRAM_COUNTER
is a 32-bit value, but because Logisim caps the size of ROM units at 2^16 bytes, we have to use a splitter to get only 14 bits from PROGRAM_COUNTER
(ignoring the bottommost two bits). Notice that PROGRAM_COUNTER
is a byte address, not a word address.
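To make the byte-address versus word-address point concrete, the splitter's job can be written as a shift and a mask. This sketch assumes the 14 bits taken are PROGRAM_COUNTER[15:2], which is consistent with dropping the bottom two bits and a 2^16-byte ROM of 32-bit words; `imem_index` is an illustrative name, not something in the starter files.

```python
def imem_index(program_counter):
    """Map a byte address to the 14-bit ROM word index (assumed to be PC[15:2])."""
    return (program_counter >> 2) & 0x3FFF
```

Consecutive instructions sit 4 bytes apart, so PC values 0, 4, and 8 select ROM entries 0, 1, and 2.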
So what happens when the clock ticks? Each tick of the clock increments an input in the test file called Time_Step
. The clock will continue to tick until Time_Step
is equal to the halting constant for that test file (for this particular test file, the halting constant is 5). At that point, the Logisim command line will print the values in each of the outputs to stdout
. Our tests will compare this output to the expected; if your output is different, you will fail the test.
addi Tests (Part A)
We’ve provided some sanity tests for addi (Task 3) in the tests/part-a/addi/ directory. You can run these with:
$ python3 test.py tests/part-a/addi/
See above for more info on working with these tests.
Your last task for Part A is to fill in the README.md
. Write down how you implemented your circuits and components for this part (including ALU and RegFile, since you used them for addi
!), and explain the reasoning behind your design choices. There’s no specific format or length requirement here, so feel free to get creative!
At this point, if you’ve completed tasks 1-3.5, you’ve finished Part A of the project!
The autograder for Part A uses the same tests as the test files provided in the starter code. In other words, there are no hidden tests for Part A.
For addi
tests, the autograder accepts either a single-cycle or a pipelined CPU. This is for cases where you start work in Part B, pipeline your CPU, and then realize you want to resubmit to Part A.
Double-check that you have not edited your input/output pins, and that your circuits fit in the provided testing harnesses. Make sure that you did not create any additional .circ
files; the autograder will only be testing the circuit files you were allowed to edit for Part A (alu.circ
, branch-comp.circ
, control-logic.circ
, cpu.circ
, imm-gen.circ
, and regfile.circ
). Then, submit your repo to the Project 3A
assignment on Gradescope.
The rest of this spec describes the tasks for Part B.
Please pull the latest updates from the starter code.
In Task 3, you wired up a basic single-cycle CPU capable of executing addi
instructions. Now, you’ll implement support for more instructions!
Your CPU will end up with many parts – try breaking it down into chunks, and start with the simple ones first. How you approach this task is entirely up to you, but one suggested starting order is: branch comparator and immediate generator (simpler than control logic/datapath), then I-type calculation instructions (since you’ve already implemented addi), then R-type calculation instructions (since there’s some overlap with I-type calculation instructions), and so on.
We strongly recommend that you work on Task 4 and Task 5 (Custom Tests) together, since incremental testing will help you catch bugs much faster. Write tests for an instruction (or group of instructions) you plan to implement, then implement the instruction(s) while using your tests as a reference. Once your tests pass, commit your changes so you can come back later if you find a regression, and move on to another instruction.
We will be grading your CPU implementation on only the instructions listed below. Your CPU must support these instructions, but feel free to implement any additional instructions you want as long as they don’t affect your implementation of the required instructions.
Instruction | Type | Opcode | Funct3 | Funct7/Immediate | Operation |
---|---|---|---|---|---|
add rd, rs1, rs2 | R | 0x33 | 0x0 | 0x00 | R[rd] ← R[rs1] + R[rs2] |
mul rd, rs1, rs2 | R | 0x33 | 0x0 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[31:0] |
sub rd, rs1, rs2 | R | 0x33 | 0x0 | 0x20 | R[rd] ← R[rs1] - R[rs2] |
sll rd, rs1, rs2 | R | 0x33 | 0x1 | 0x00 | R[rd] ← R[rs1] << R[rs2] |
mulh rd, rs1, rs2 | R | 0x33 | 0x1 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[63:32] |
mulhu rd, rs1, rs2 | R | 0x33 | 0x3 | 0x01 | (unsigned) R[rd] ← (R[rs1] * R[rs2])[63:32] |
slt rd, rs1, rs2 | R | 0x33 | 0x2 | 0x00 | R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 (signed) |
xor rd, rs1, rs2 | R | 0x33 | 0x4 | 0x00 | R[rd] ← R[rs1] ^ R[rs2] |
srl rd, rs1, rs2 | R | 0x33 | 0x5 | 0x00 | (unsigned) R[rd] ← R[rs1] >> R[rs2] |
sra rd, rs1, rs2 | R | 0x33 | 0x5 | 0x20 | (signed) R[rd] ← R[rs1] >> R[rs2] |
or rd, rs1, rs2 | R | 0x33 | 0x6 | 0x00 | R[rd] ← R[rs1] \| R[rs2] |
and rd, rs1, rs2 | R | 0x33 | 0x7 | 0x00 | R[rd] ← R[rs1] & R[rs2] |
lb rd, offset(rs1) | I | 0x03 | 0x0 | | R[rd] ← SignExt(Mem(R[rs1] + offset, byte)) |
lh rd, offset(rs1) | I | 0x03 | 0x1 | | R[rd] ← SignExt(Mem(R[rs1] + offset, half)) |
lw rd, offset(rs1) | I | 0x03 | 0x2 | | R[rd] ← Mem(R[rs1] + offset, word) |
addi rd, rs1, imm | I | 0x13 | 0x0 | | R[rd] ← R[rs1] + imm |
slli rd, rs1, imm | I | 0x13 | 0x1 | 0x00 | R[rd] ← R[rs1] << imm |
slti rd, rs1, imm | I | 0x13 | 0x2 | | R[rd] ← (R[rs1] < imm) ? 1 : 0 |
xori rd, rs1, imm | I | 0x13 | 0x4 | | R[rd] ← R[rs1] ^ imm |
srli rd, rs1, imm | I | 0x13 | 0x5 | 0x00 | R[rd] ← R[rs1] >> imm |
srai rd, rs1, imm | I | 0x13 | 0x5 | 0x20 | R[rd] ← R[rs1] >> imm |
ori rd, rs1, imm | I | 0x13 | 0x6 | | R[rd] ← R[rs1] \| imm |
andi rd, rs1, imm | I | 0x13 | 0x7 | | R[rd] ← R[rs1] & imm |
sb rs2, offset(rs1) | S | 0x23 | 0x0 | | Mem(R[rs1] + offset) ← R[rs2][7:0] |
sh rs2, offset(rs1) | S | 0x23 | 0x1 | | Mem(R[rs1] + offset) ← R[rs2][15:0] |
sw rs2, offset(rs1) | S | 0x23 | 0x2 | | Mem(R[rs1] + offset) ← R[rs2] |
beq rs1, rs2, offset | SB | 0x63 | 0x0 | | if(R[rs1] == R[rs2]) PC ← PC + {offset, 1b0} |
bne rs1, rs2, offset | SB | 0x63 | 0x1 | | if(R[rs1] != R[rs2]) PC ← PC + {offset, 1b0} |
blt rs1, rs2, offset | SB | 0x63 | 0x4 | | if(R[rs1] < R[rs2] (signed)) PC ← PC + {offset, 1b0} |
bge rs1, rs2, offset | SB | 0x63 | 0x5 | | if(R[rs1] >= R[rs2] (signed)) PC ← PC + {offset, 1b0} |
bltu rs1, rs2, offset | SB | 0x63 | 0x6 | | if(R[rs1] < R[rs2] (unsigned)) PC ← PC + {offset, 1b0} |
bgeu rs1, rs2, offset | SB | 0x63 | 0x7 | | if(R[rs1] >= R[rs2] (unsigned)) PC ← PC + {offset, 1b0} |
auipc rd, offset | U | 0x17 | | | R[rd] ← PC + {offset, 12b0} |
lui rd, offset | U | 0x37 | | | R[rd] ← {offset, 12b0} |
jal rd, imm | UJ | 0x6f | | | R[rd] ← PC + 4; PC ← PC + {imm, 1b0} |
jalr rd, rs1, imm | I | 0x67 | 0x0 | | R[rd] ← PC + 4; PC ← R[rs1] + {imm} |
csrw rd, csr, rs1 | I | 0x73 | 0x1 | | CSR[csr] ← R[rs1] |
csrwi rd, csr, uimm | I | 0x73 | 0x5 | | CSR[csr] ← {uimm} |
The Branch Comparator unit (located in branch-comp.circ
) compares two values and outputs control signals that will be used to make branching decisions. You will need to implement logic for this circuit.
To edit this subcircuit, edit the branch-comp.circ
file and not the branch_comp
in cpu.circ
. Note that if you modify this circuit, you will need to close and re-open cpu.circ
to load the changes in your CPU.
Again, here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
rs1 | Input | 32 | Value in the first register to be compared |
rs2 | Input | 32 | Value in the second register to be compared |
BrUn | Input | 1 | Equal to one when an unsigned comparison is wanted, or zero when a signed comparison is wanted |
BrEq | Output | 1 | Equal to one if the two values are equal |
BrLt | Output | 1 | Equal to one if the value in rs1 is less than the value in rs2 |
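The table above translates almost directly into code. The following is a behavioral sketch only (names are illustrative), showing how BrUn changes the meaning of the less-than comparison while leaving equality untouched.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(rs1, rs2, br_un):
    """Return (BrEq, BrLt) for two 32-bit register values."""
    rs1 &= MASK32
    rs2 &= MASK32
    br_eq = 1 if rs1 == rs2 else 0
    if br_un:
        less_than = rs1 < rs2                        # unsigned comparison
    else:
        less_than = to_signed(rs1) < to_signed(rs2)  # signed comparison
    return br_eq, 1 if less_than else 0
```

With rs1 = 0xFFFFFFFF and rs2 = 1, a signed comparison sees -1 < 1 (BrLt = 1), while an unsigned comparison sees a very large value (BrLt = 0).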
The Immediate Generator (“Imm Gen”) unit (located in imm-gen.circ
) extracts the appropriate immediate from I
, S
, B
, U
, and J
type instructions. Remember that in RISC-V, all immediates that leave the immediate generator are 32 bits wide and sign-extended! See the table below for how each immediate should be formatted:
We don’t specify an encoding for ImmSel (i.e. come up with your own!), but make sure that your Immediate Generator and Control Logic use the same values for ImmSel.
To edit this subcircuit, edit the imm-gen.circ
file and not the imm_gen
in cpu.circ
. Note that if you modify this circuit, you will need to close and re-open cpu.circ
to load the changes in your CPU.
Again, here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
inst | Input | 32 | The instruction being executed |
ImmSel | Input | 3 | Value determining how to reconstruct the immediate |
imm | Output | 32 | Value of the immediate in the instruction |
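To sanity-check the immediates your circuit produces, a small Python model of the standard RISC-V immediate layouts can help. This is only a sketch: the format names passed as `fmt` stand in for ImmSel, whose encoding is yours to choose, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide value to a 32-bit pattern."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF

def imm_gen(inst, fmt):
    """Return the 32-bit immediate of a 32-bit instruction word."""
    if fmt == "I":   # inst[31:20]
        return sign_extend(inst >> 20, 12)
    if fmt == "S":   # inst[31:25] | inst[11:7]
        imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if fmt == "B":   # inst[31] | inst[7] | inst[30:25] | inst[11:8] | 0
        imm = ((((inst >> 31) & 1) << 12) | (((inst >> 7) & 1) << 11)
               | (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if fmt == "U":   # inst[31:12] followed by 12 zero bits
        return inst & 0xFFFFF000
    if fmt == "J":   # inst[31] | inst[19:12] | inst[20] | inst[30:21] | 0
        imm = ((((inst >> 31) & 1) << 20) | (((inst >> 12) & 0xFF) << 12)
               | (((inst >> 20) & 1) << 11) | (((inst >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    raise ValueError("unknown format: " + fmt)
```

For example, `imm_gen(0x02112423, "S")` decodes the offset of sw ra, 40(sp) and returns 40.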
The Control Logic unit (control-logic.circ) provided in the skeleton is based on the control logic unit in the 5-stage CPU used in lecture and discussion. Control signals play a very important part in this project, since they are how your CPU identifies each instruction and handles it correctly. However, figuring out all of the control signals may seem intimidating. We suggest taking a look at the lecture slides and discussion worksheets to get started. Try walking through the datapath with different types of instructions; when you see a MUX or other component, think about what selector/enable value you will need for that instruction.
You are welcome to add more inputs or outputs to the existing starter control_logic circuit as your control logic demands. You may also use as many or as few of the supplied ports as needed. That being said, please do not edit, move, or remove any of the existing ports during this process.
There are two major approaches to implementing the control logic so that it can extract the opcode/funct3/funct7 from an instruction and set the control signals appropriately.
The recommended method is hard-wired control, as discussed in lecture, which is usually the preferred approach for RISC architectures like MIPS and RISC-V. Hard-wired control uses various gates and other components (remember, we’ve learned how components like MUXes can be built from AND/OR/NOT gates) to produce the appropriate control signals. An instruction decoder takes in an instruction and outputs all of the control signals for that instruction.
The other way to do it is to use ROM control. Every instruction implemented by a processor maps to an address in a Read-Only Memory (ROM) unit. At that address in the ROM is the control word for that instruction. An address decoder takes in an instruction and outputs the address of the control word for that instruction. This approach is common in CISC architectures like Intel’s x86-64, and, in real life, offers some flexibility because it can be re-programmed by changing the contents of the ROM.
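To make the ROM idea concrete, here is a rough Python sketch of an address decoder feeding a lookup table of control words. Everything in it is an assumption for illustration: the ROM is indexed by opcode bits [6:2] only, and the signal names other than ImmSel are invented rather than taken from the starter circuit. A real design would also need funct3/funct7 to tell apart instructions that share an opcode.

```python
# Hypothetical control ROM, indexed by opcode bits [6:2].
# Signal names other than ImmSel are illustrative only.
CONTROL_ROM = {
    0b00100: {"RegWEn": 1, "ImmSel": "I", "MemWrite": 0},  # opcode 0x13 (e.g. addi)
    0b01000: {"RegWEn": 0, "ImmSel": "S", "MemWrite": 1},  # opcode 0x23 (e.g. sw)
    0b01101: {"RegWEn": 1, "ImmSel": "U", "MemWrite": 0},  # opcode 0x37 (lui)
}

def address_decoder(inst):
    """Map an instruction to a ROM address (here: opcode bits [6:2])."""
    return (inst >> 2) & 0x1F

def control_word(inst):
    """Look up the control word stored at the decoded ROM address."""
    return CONTROL_ROM[address_decoder(inst)]
```

Changing what an instruction does then means editing one table entry, which is the re-programmability mentioned above.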
To edit this subcircuit, edit the control-logic.circ file and not the control_logic in cpu.circ. Note that if you modify this circuit, you will need to close and re-open cpu.circ to load the changes in your CPU.
The Memory unit (located in mem.circ) is already fully implemented for you and attached to the outputs of your CPU in cpu-harness.circ! You must not add mem.circ into your CPU; doing so will cause the autograder to fail and you will not receive a score.
Due to Logisim limitations, only the lower 16 bits of the memory address are used, and the upper 16 bits are discarded. Therefore, memory (IMEM and DMEM) in this project has an effective address space of 2^16 byte addresses.
While the address you give to memory is a byte address, the memory unit returns an entire word. The memory unit ignores the bottom two bits of the address you provide to it, and treats its input as a word address rather than a byte address. For example, if you input the 32-bit address 0x0000_1007, it will be treated as the word address 0x0000_1004, and the output will be the 4 bytes at addresses 0x0000_1004, 0x0000_1005, 0x0000_1006, and 0x0000_1007.
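The two address rules above (keep only the lower 16 bits, ignore the bottom two bits) can be written out as a short Python sketch. The function names are hypothetical; the masks follow directly from the description.

```python
def effective_word_address(addr):
    """Address the memory unit actually uses for a 32-bit byte address."""
    addr &= 0xFFFF        # only the lower 16 bits are used
    return addr & ~0b11   # the bottom two bits are ignored

def bytes_returned(addr):
    """Byte addresses covered by the word that memory returns."""
    base = effective_word_address(addr)
    return [base + i for i in range(4)]
```

For instance, `effective_word_address(0x00001007)` gives 0x1004, matching the example in the text.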
Note that for the lh, lw, sh, sw instructions, the RISC-V ISA supports unaligned accesses, but implementing them is complicated. We’ll only be implementing aligned memory accesses in this project. This means that operations will only be defined when they do not exceed the boundaries of a contiguous word in memory. An example of such an operation is any lw or sw that operates on an address that is a multiple of 4. Since the address is a multiple of 4 and we load 4 bytes in a word, the total memory fetched does not exceed the boundaries of a contiguous word in memory. You must not implement unaligned accesses; you would likely need to use stalling, which would result in your output not matching our expected output (bad for your score).
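When writing test programs, it can be handy to check whether an access stays inside one word, as defined above. The sketch below does that in Python; the function name is made up, and it only restates the "does not exceed the boundaries of a contiguous word" rule rather than anything the autograder runs.

```python
def stays_within_word(addr, size_bytes):
    """True if an access of size_bytes starting at addr fits inside one 4-byte word."""
    return (addr % 4) + size_bytes <= 4
```

A 4-byte access at 0x1004 passes this check, while a 4-byte access at 0x1006 would spill into the next word and is therefore outside what this project defines.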
Remember that the memory is also byte level write enabled. This means that the Write_En signal is 4 bits wide and acts as a write mask for the input data (i.e. each bit of the mask enables writing to the corresponding byte of the word). Some examples (W = byte will be overwritten, blank = byte is unaffected):
Write_En | Byte 0 | Byte 1 | Byte 2 | Byte 3 (most significant) |
---|---|---|---|---|
0b1000 | | | | W |
0b0101 | W | | W | |
0b0000 | | | | |
0b1111 | W | W | W | W |
The ReadData port will always return the value in memory at the supplied address, regardless of Write_En.
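The write mask can be modeled in a few lines of Python. This sketch assumes that bit i of Write_En controls byte i of the word, with byte 3 the most significant (as the table header suggests); the function name is hypothetical.

```python
def masked_write(old_word, write_data, write_en):
    """Return the word stored after a write with a 4-bit byte mask."""
    result = old_word
    for byte in range(4):
        if (write_en >> byte) & 1:
            lane = 0xFF << (8 * byte)                       # bits of this byte
            result = (result & ~lane) | (write_data & lane)  # overwrite one byte
    return result & 0xFFFFFFFF
```

With an old word of 0x11223344 and write data 0xAABBCCDD, a mask of 0b1000 replaces only the top byte and yields 0xAA223344.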
Again, here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
WriteAddr | Input | 32 | Address to read/write to in Memory |
WriteData | Input | 32 | Value to be written to Memory |
Write_En | Input | 4 | The write mask for instructions that write to Memory and zero otherwise |
CLK | Input | 1 | Driven by the clock input to the CPU |
ReadData | Output | 32 | Value of the data stored at the specified address |
In order to run the testbenches that determine your project grades, there are a few more instructions that need to be added. A Control Status Register (CSR) holds additional information about the results of machine instructions, and it usually is stored independently of the register file and the memory. In your processor, you will be writing outputs to one of the CSRs that will be monitored by more complex testbenches.
Below are the 2 CSR instructions that you will need to implement. Note that while there are 2^12 possible CSR addresses, we only expect one of them to work (tohost = 0x51E). Writes to other CSR addresses should not affect the tohost CSR.
- csrw tohost, rs1 (short for csrrw x0, csr, rs1 where csr=tohost=0x51E)
- csrwi tohost, uimm (short for csrrwi x0, csr, uimm where csr=tohost=0x51E)
The instruction formats for these instructions are as follows:
Note that the immediate forms use a 5-bit zero-extended immediate (uimm) encoded in the rs1 field.
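As a cross-check for hand-assembled tests, here is a minimal Python sketch that decodes the two CSR instructions. It assumes the standard I-type field positions (CSR address in bits 31:20, rs1/uimm in bits 19:15) and the opcode and funct3 values from the instruction table; the function name and the `regs` list are hypothetical.

```python
TOHOST = 0x51E

def decode_csr_write(inst, regs):
    """Return (csr_address, value_written) for csrw/csrwi, or None otherwise.

    regs is a list of 32 register values, indexed by register number.
    """
    if inst & 0x7F != 0x73:
        return None
    funct3 = (inst >> 12) & 0x7
    csr = (inst >> 20) & 0xFFF
    rs1_field = (inst >> 15) & 0x1F
    if funct3 == 0x1:                 # csrw: write R[rs1]
        return csr, regs[rs1_field] & 0xFFFFFFFF
    if funct3 == 0x5:                 # csrwi: write the zero-extended 5-bit uimm
        return csr, rs1_field
    return None
```

For example, 0x51E29073 encodes csrw tohost, t0, so decoding it yields the tohost address together with the current value of t0 (register 5).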
The Control Status Registers unit (located in csr.circ) is already fully implemented for you! Please do not edit anything in the circuit, including input/output pins, or the autograder tests may fail.
If you want to learn more about CSR, you can refer to Chapter 9 of the RISC-V specification.
Here’s a quick summary of its inputs and outputs:
Signal Name | Direction | Bit Width | Description |
---|---|---|---|
CSR_address | Input | 12 | Input CSR register address |
CSR_din | Input | 32 | Value to write into specified CSR register |
CSR_WE | Input | 1 | Write enable |
clk | Input | 1 | Clock input |
tohost | Output | 32 | Output of the tohost register |
The main CPU circuit (located in cpu.circ) implements the main datapath and connects all the subcircuits (ALU, Branch Comparator, Control Logic, Control Status Registers, Immediate Generator, Memory, and RegFile) together. After finishing this task, your CPU should be making use of all these components.
As a refresher, here’s a quick summary of its inputs and outputs:
Input Name | Bit Width | Description |
---|---|---|
READ_DATA | 32 | Driven with the data at the data memory address identified by the WRITE_ADDRESS (see below). |
INSTRUCTION | 32 | Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below). |
CLOCK | 1 | The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.). |
Output Name | Bit Width | Description |
---|---|---|
ra | 32 | Driven with the contents of ra (FOR TESTING) |
sp | 32 | Driven with the contents of sp (FOR TESTING) |
t0 | 32 | Driven with the contents of t0 (FOR TESTING) |
t1 | 32 | Driven with the contents of t1 (FOR TESTING) |
t2 | 32 | Driven with the contents of t2 (FOR TESTING) |
s0 | 32 | Driven with the contents of s0 (FOR TESTING) |
s1 | 32 | Driven with the contents of s1 (FOR TESTING) |
a0 | 32 | Driven with the contents of a0 (FOR TESTING) |
tohost | 32 | Driven with the contents of CSR 0x51E (FOR TESTING, for Part A leave it as-is) |
WRITE_ADDRESS | 32 | This output is used to select which address to read/write data from in data memory. |
WRITE_DATA | 32 | This output is used to provide write data to data memory. |
WRITE_ENABLE | 4 | This output is used to provide the write enable mask to data memory. |
PROGRAM_COUNTER | 32 | This output is used to select which instruction is presented to the processor on the INSTRUCTION input. |
We strongly recommend that you review the Single Cycle CPU: A Guide section from Part A.
Again, make sure that you do not edit the input/output pins or add new ones!
We’ve provided some basic sanity tests for your CPU in the tests/part-b/sanity/ directory. You can run these with:
$ python3 test.py tests/part-b/sanity/
Refer to the Info: CPU Testing section for more info on using these tests.
Note that these sanity tests are not comprehensive; they are intended to guide you in the early stages of testing until you start Task 5.
For Part B, we have provided a set of basic, visible unit tests in the autograder and starter code. These tests are meant to help reduce your stress by providing some guidance in the early stages of testing, but they are not comprehensive. You should still write rigorous tests for your designs, as passing the basic sanity tests does not guarantee that you will pass any of the hidden tests.
The autograder tests for Part B fall into 3 main categories: unit tests, integration tests, and edge case tests. We won’t be revealing all the autograder tests, but you should be able to re-create a very close approximation of them with the tools we provide in order to test your CPU.
Unit tests: a unit test exercises your datapath with a single instruction, to make sure that each individual instruction has been implemented and is working as expected. You should write a different unit test for every single instruction that you need to implement, and make sure that you test the spectrum of possibilities for that instruction thoroughly. For example, a unit test for slt should contain cases where rs1 < rs2, rs1 > rs2, and rs1 == rs2.
Integration tests: After you’ve passed your unit tests, move on to tests that use multiple functions in combination. Try out various simple RISC-V programs that run a single function; your CPU should be able to handle them, if working properly. Feel free to try to use riscv-gcc to compile C programs to RISC-V, but be aware of the limited instruction set we’re working with (you don’t have any ecall instructions, for example). We’d recommend that you instead try to write simple functions on your own based on what you’ve seen in labs, discussions, projects, and exams.
Edge case tests: edge case tests try inputs that you normally wouldn’t expect, which may trigger bugs in certain situations. What edge cases should you look for? A small example/hint from us: our 2 main classes of edge cases come from memory operations and branch/jump operations (two of the tests are mem-various-offsets, which tests memory instructions with various offsets, and br-jump-limits, which tests limits of branch/jump instructions). Think about other ways these operations could trigger potential bugs.
We’ve included a script (tools/create-test.py) that uses Venus to help you generate test circuits from RISC-V assembly! The process for writing custom tests is as follows:
1. Write your test in a RISC-V assembly file ending in .s in the tests/part-b/custom/inputs/ folder. The name of this file will be the name of your test. Repeat if you have more tests. For example: tests/part-b/custom/inputs/sll-slli.s and tests/part-b/custom/inputs/beq.s
2. Generate the test circuits with create-test.py:
$ python3 tools/create-test.py tests/part-b/custom/inputs/sll-slli.s tests/part-b/custom/inputs/beq.s
Reminder: if you want to regenerate everything, you can take advantage of Bash globs:
$ python3 tools/create-test.py tests/part-b/custom/inputs/*.s
This should generate a couple new files to go with your assembly file:
tests/part-b/custom/:
- cpu-<TEST_NAME>.circ # The new circuit for your test
- inputs/<TEST_NAME>.s # The test file you wrote (unchanged)
- reference-outputs/cpu-<TEST_NAME>-pipelined-ref.out # The pipelined reference output for your test
- reference-outputs/cpu-<TEST_NAME>-ref.out # The single-cycle reference output for your test
Once the circuits have been generated, run your tests:
$ python3 test.py tests/part-b/custom/sll-slli.circ tests/part-b/custom/beq.circ
Reminder: you can run all tests in a directory:
$ python3 test.py tests/part-b/custom/
By default, the number of cycles for a test will be just enough for all instructions in the test, as well as extra cycles for the register writeback and pipelining. If you wish to override this and simulate your code for a certain number of cycles, you can use the --cycles flag:
$ python3 tools/create-test.py --cycles <NUMBER_OF_CYCLES> <ASM_PATH>
Refer to the Info: CPU Testing section for more info on using these tests.
Test coverage: a metric measuring how much of a given codebase is being tested by tests. For the purposes of this project, you will be graded on how much of the required ISA your tests cover.
The autograder for Part B will examine the coverage of tests located in the tests/part-b/custom/inputs/ folder. When you submit Part B to the autograder, the autograder will output a message about the percentage coverage of your tests against our staff suite of tests and notify you if any of your tests raised a syntax error.
- .s file testing each instruction.
- (ra, sp, t0, t1, t2, s0, s1, and a0).
- addi, lui, and sw. This means that your CPU must handle addi and lui properly, or they may cause other instructions’ tests to fail. Additionally, failures in lw or sw may affect each other, since we (currently) cannot write to memory without sw or inspect a value in memory without lw. Make sure to test these instructions extensively!
- (csrw, csrwi). To compensate, the autograder has a visible robust CSR sanity check that will test all needed functionality of the CSR.
So far, your CPU is capable of executing instructions in our ISA in a single cycle. Now, it’s time to implement pipelining in your CPU! For this project, you’ll need to implement a 2-stage pipeline, which is still conceptually similar to the 5-stage pipeline covered in lecture and discussion (review those if you need a refresher). The two stages you’ll implement are:
Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency. However, we will be enforcing the two-stage pipeline design. Some things to consider:
- PC values?
- PC between the pipelining stages?
You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won’t contain an instruction loaded from memory. How do we deal with this? Luckily, Logisim automatically sets registers to zero on reset, so the instruction register will start with a nop! If you wish, you can depend on this behavior of Logisim.
Since your CPU will support branch and jump instructions, you’ll need to handle control hazards that occur when branching.
The instruction immediately after a branch or jump is not executed if a branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to “kill” the instruction that is being fetched if the instruction under execution is a jump or a taken branch.
Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction. You can use 0x00000013, or addi x0, x0, 0, for this purpose; other nop instructions will work too. You should kill if a branch is taken (do not kill otherwise). Do kill on every type of jump.
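The kill rule can be summarized as a tiny Python sketch. The function and parameter names are invented for illustration, and the sketch deliberately says nothing about where the MUX sits relative to the instruction register; it only models the selection itself.

```python
NOP = 0x00000013  # addi x0, x0, 0

def select_instruction(fetched_inst, is_jump, is_branch, branch_taken):
    """Choose what the Execute stage sees next: the fetched instruction or a nop.

    The flags describe the instruction currently in the Execute stage.
    """
    kill = is_jump or (is_branch and branch_taken)
    return NOP if kill else fetched_inst
```

A branch that is not taken leaves the fetched instruction untouched, while every jump and every taken branch replaces it with the nop.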
Note: do not solve this issue by calculating branch offsets in the IF stage. While this may be a conceptually correct solution, we test your output against the reference every cycle, so on any cycle where the reference returns a nop and your CPU does not, you will fail our tests.
Some more things to consider:
- When MUXing a nop into the instruction stream, do you place it before or after the instruction register?
- nop? Is this different than normal?
We’ve provided some basic sanity tests for your pipelined CPU in the tests/part-b/sanity/ directory (same tests as in Task 4). You can run these with:
$ python3 test.py --pipelined tests/part-b/sanity/
Note: since your CPU is pipelined at this point, you need to run the pipelined tests using the --pipelined (or -p) flag. If you run the single-cycle tests (i.e. not using the --pipelined flag) at this point (after pipelining your CPU), your CPU should now fail those tests! Think about why this happens…
Similarly, you can also run the pipelined version of your custom tests:
$ python3 test.py --pipelined tests/part-b/custom/
Note: because you’re implementing a 2-stage pipelined processor and the first instruction writes on the rising edge of the second clock cycle, the effects of your instructions will have a 2 instruction delay. For example, let’s look at the first instruction of tests/part-b/sanity/inputs/addi.s, addi t0, x0, -1. If you inspect the pipelined reference output (tests/part-b/sanity/reference-output/cpu-addi-pipelined-ref.out), you’ll see that t0 doesn’t show changes until the third cycle.
Refer to the Info: CPU Testing section for more info on using these tests. Keep in mind that you’re working with a pipelined circuit from this task onward.
Time to update your README.md! Once again, write down how you implemented your circuits and components for this part (including the various subcircuits you used), and explain the reasoning behind your design choices. In particular, we want to see:
Your additions to the README should be at least 512 characters (although something more than the bare minimum would be nice), but other than that feel free to get creative!
At this point, if you’ve completed tasks 4-7, you’ve finished Part B of the project. Congratulations on your shiny new CPU!
Double-check and make sure that:
- .circ files; the autograder will only be testing the circuit files you were allowed to edit for Part B (branch-comp.circ, control-logic.circ, cpu.circ, imm-gen.circ).
- Your .s tests are located in tests/part-b/custom/inputs/, since the autograder will test those for coverage.
To prevent double jeopardy, the autograder will replace your ALU and RegFile with the staff ALU and RegFile. This means that if your ALU or RegFile from Part A have issues, they will not affect your autograder results in Part B. However, this also means that you must not depend on out-of-spec behavior from these circuits.
The autograder for Part B uses the sanity tests provided in the starter code, as well as hidden unit, integration, and edge case tests as specified in Task 5. Additionally, the autograder will check your custom tests for test coverage.
Note: If you fail on any of the provided autograder sanity tests, course staff will not help you debug your CPU unless you have recreated the failure in a custom test.
The grading breakdown for Project 3 is as follows:
addi
(5%)