Checkpoint1: Pipelined MIPS150

Introduction

The first step of the project is to implement a 3-stage pipelined MIPS processor. This processor will be used to coordinate the functionality of the entire project. A high-level overview of the final system can be seen below.

Each of the four checkpoints is indicated as follows.

The first checkpoint is colored purple and includes the MIPS150 processor communicating through the UART interface developed in lab 5.
The second checkpoint is colored yellow, and will include implementing an Ethernet transmitter and receiver.
The third checkpoint is colored red, and will consist of implementing a data and instruction cache, as well as interfacing with off chip DRAM.
The final checkpoint is colored green, and will consist of implementing an accelerated 2D graphics engine.

The ISA

The MIPS150 processor implements a useful subset of the entire MIPS instruction set architecture. Examples of what is not included are divide and multiply instructions, trap and exception instructions, coprocessor instructions, and floating point instructions. The entire MIPS150 encoding for the instruction set architecture is documented in the table below.

The functionality of each instruction is shown below, where:

R[$x] indicates the register with address x
SEXT indicates sign extension
ZEXT indicates zero extension
BMEM indicates a byte aligned access to memory
HMEM indicates a half word aligned access to memory
WMEM indicates a word aligned access to memory
PC indicates the PC of the instruction

Mnemonic	RTL Description	Notes
`LB`	`R[$rt] = SEXT(BMEM[(R[$rs]+SEXT(imm))[31:0]])`	delayed
`LH`	`R[$rt] = SEXT(HMEM[(R[$rs]+SEXT(imm))[31:1]])`	delayed
`LW`	`R[$rt] = WMEM[(R[$rs]+SEXT(imm))[31:2]]`	delayed
`LBU`	`R[$rt] = ZEXT(BMEM[(R[$rs]+SEXT(imm))[31:0]])`	delayed
`LHU`	`R[$rt] = ZEXT(HMEM[(R[$rs]+SEXT(imm))[31:1]])`	delayed
`SB`	`BMEM[(R[$rs]+SEXT(imm))[31:0]] = R[$rt][7:0]`
`SH`	`HMEM[(R[$rs]+SEXT(imm))[31:1]] = R[$rt][15:0]`
`SW`	`WMEM[(R[$rs]+SEXT(imm))[31:2]] = R[$rt]`
`ADDIU`	`R[$rt] = R[$rs] + SEXT(imm)`
`SLTI`	`R[$rt] = R[$rs] < SEXT(imm)`
`SLTIU`	`R[$rt] = R[$rs] < SEXT(imm)`	unsigned compare
`ANDI`	`R[$rt] = R[$rs] & ZEXT(imm)`
`ORI`	`R[$rt] = R[$rs] | ZEXT(imm)`
`XORI`	`R[$rt] = R[$rs] ^ ZEXT(imm)`
`LUI`	`R[$rt] = {imm, 16'b0}`
`SLL`	`R[$rd] = R[$rt] << shamt`
`SRL`	`R[$rd] = R[$rt] >> shamt`
`SRA`	`R[$rd] = R[$rt] >>> shamt`
`SLLV`	`R[$rd] = R[$rt] << R[$rs]`
`SRLV`	`R[$rd] = R[$rt] >> R[$rs]`
`SRAV`	`R[$rd] = R[$rt] >>> R[$rs]`
`ADDU`	`R[$rd] = R[$rs] + R[$rt]`
`SUBU`	`R[$rd] = R[$rs] - R[$rt]`
`AND`	`R[$rd] = R[$rs] & R[$rt]`
`OR`	`R[$rd] = R[$rs] | R[$rt]`
`XOR`	`R[$rd] = R[$rs] ^ R[$rt]`
`NOR`	`R[$rd] = ~R[$rs] & ~R[$rt]`
`SLT`	`R[$rd] = R[$rs] < R[$rt]`
`SLTU`	`R[$rd] = R[$rs] < R[$rt]`	unsigned compare
`J`	`PC = {PC[31:28], target, 2'b0}`	delayed
`JAL`	`R[31] = PC + 8; PC = {PC[31:28], target, 2'b0}`	delayed
`JR`	`PC = R[$rs]`	delayed
`JALR`	`R[$rd] = PC + 8; PC = R[$rs]`	delayed
`BEQ`	`PC = PC + 4 + (R[$rs] == R[$rt] ? SEXT(imm) << 2 : 0)`	delayed
`BNE`	`PC = PC + 4 + (R[$rs] != R[$rt] ? SEXT(imm) << 2 : 0)`	delayed
`BLEZ`	`PC = PC + 4 + (R[$rs] <= 0 ? SEXT(imm) << 2 : 0)`	delayed
`BGTZ`	`PC = PC + 4 + (R[$rs] > 0 ? SEXT(imm) << 2 : 0)`	delayed
`BLTZ`	`PC = PC + 4 + (R[$rs] < 0 ? SEXT(imm) << 2 : 0)`	delayed
`BGEZ`	`PC = PC + 4 + (R[$rs] >= 0 ? SEXT(imm) << 2 : 0)`	delayed
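
Several rows of the table differ only in how they treat signedness (SEXT versus ZEXT, SLTI versus SLTIU, SRL versus SRA). The following is a minimal C sketch of those rows, which may be handy as a reference model when checking simulation results. The function names are hypothetical and are not part of the skeleton files, and the shift helpers assume the shift amount has already been reduced to the range 0 to 31.

```c
#include <assert.h>
#include <stdint.h>

/* SEXT(imm): sign-extend a 16-bit immediate to 32 bits. */
static uint32_t sext16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}

/* ZEXT(imm): zero-extend a 16-bit immediate to 32 bits. */
static uint32_t zext16(uint16_t imm) {
    return (uint32_t)imm;
}

/* SLTI: signed compare against the sign-extended immediate.
 * The casts assume a two's complement int32_t. */
static uint32_t slti(uint32_t rs, uint16_t imm) {
    return (int32_t)rs < (int32_t)sext16(imm);
}

/* SLTIU: the immediate is still sign-extended, but the compare is unsigned. */
static uint32_t sltiu(uint32_t rs, uint16_t imm) {
    return rs < sext16(imm);
}

/* SRL: logical right shift, zeros are shifted in. */
static uint32_t srl(uint32_t rt, uint32_t shamt) {
    assert(shamt < 32);
    return rt >> shamt;
}

/* SRA: arithmetic right shift, the sign bit is replicated.
 * Built from an unsigned shift to avoid implementation-defined behavior. */
static uint32_t sra(uint32_t rt, uint32_t shamt) {
    assert(shamt < 32);
    uint32_t fill = ((rt & 0x80000000u) && shamt) ? ~(0xFFFFFFFFu >> shamt) : 0u;
    return (rt >> shamt) | fill;
}

/* LUI: {imm, 16'b0}. */
static uint32_t lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}
```

Note that `sltiu(5, 0xFFFF)` is true while `slti(5, 0xFFFF)` is false: both sign-extend the immediate to 0xFFFFFFFF, and only the interpretation of the compare differs.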

Pipelining

As stated earlier, the MIPS150 you design must have 3 pipeline stages. Although the stages have been arbitrarily labeled I, X, and M (meaning Instruction, Execute, and Memory), what each stage does is entirely up to you. A straightforward way to think of this pipeline is to consider the classic 5 stage MIPS studied in class, then consider removing two stages.

Delay Slots

The instructions marked as delayed in the Notes column of the table above have architectural delay slots. This means that the instruction directly following a delayed instruction is always executed and has no dependencies on the execution of the delayed instruction. Fortunately, our compiler handles all of this behind the scenes: the C compiler generates code with correctly filled delay slots, and the assembler rearranges instructions where it can to fill them. When writing assembly, therefore, do not fill delay slots explicitly; let the assembler do it.
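
The delay slot is also why JAL and JALR link to PC + 8 rather than PC + 4: the return address must skip both the jump and the instruction in its delay slot. As a rough illustration, the next-PC and link calculations from the RTL table can be sketched in C as follows. The function names are hypothetical, and the code follows the table as written (for example, it uses PC[31:28] of the jump itself).

```c
#include <assert.h>
#include <stdint.h>

/* BEQ, BNE, BLEZ, ...: PC + 4 + (taken ? SEXT(imm) << 2 : 0).
 * `taken` is the result of the branch condition. */
static uint32_t branch_next_pc(uint32_t pc, uint16_t imm, int taken) {
    uint32_t offset = (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
    return pc + 4u + (taken ? (offset << 2) : 0u);
}

/* J, JAL: {PC[31:28], target, 2'b0}, where target is the 26-bit field. */
static uint32_t jump_target(uint32_t pc, uint32_t target26) {
    assert(target26 < (1u << 26));
    return (pc & 0xF0000000u) | (target26 << 2);
}

/* JAL, JALR: the link value PC + 8 skips the jump and its delay slot. */
static uint32_t link_address(uint32_t pc) {
    return pc + 8u;
}
```

For example, a taken branch at 0x100 with an immediate of 0xFFFF (that is, -1) yields 0x100 again, since the offset is applied relative to PC + 4.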

Memory Interface

Your processor will use memory-mapped IO to communicate with the different components of your system. Eventually these components will include a data cache, a serial interface, an Ethernet interface, and a graphics interface. For this checkpoint you are responsible for memory mapping a data memory (not a cache) and a serial interface. For the data and instruction memories you will use block RAM generated with Coregen. These block RAMs store the entirety of the data and instruction memory, not just the working set as would be the case with caches. In checkpoint 3 they will be replaced with caches, which will use those same block RAMs as the underlying cache storage.

Addresses	Read/Write	Function
`0xFFFF0000 - 0xFFFF0000`	Read	Receiver control
`0xFFFF0004 - 0xFFFF0004`	Read	Receiver data
`0xFFFF0008 - 0xFFFF0008`	Read	Transmitter control
`0xFFFF000C - 0xFFFF000C`	Write	Transmitter data

This means that when the processor executes an lw from address 0xFFFF0000, it should not read a word from memory but instead a word whose lowest bit indicates whether the UART has received a byte. One way of encoding this would be {31'bx, UARTDataOutValid}. Likewise, when the CPU reads from address 0xFFFF0008, it should read a word whose lowest bit indicates whether the UART is ready to transmit; in Verilog, something like {31'bx, UARTDataInReady} might encode this. The other two addresses are for actually transferring data between the UART and the CPU.
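
From the software side, these four addresses behave like an array of four words starting at 0xFFFF0000. The following is a minimal sketch of a polling driver in C, assuming the received or transmitted byte sits in the low 8 bits of the data words (the table does not specify this). The names `uart_putc` and `uart_getc` are hypothetical. The base pointer is passed in as a parameter so the same code can be exercised against an ordinary array on a workstation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets from 0xFFFF0000, taken from the address table above. */
enum {
    UART_RX_CTRL = 0,  /* 0xFFFF0000: bit 0 set when a byte was received */
    UART_RX_DATA = 1,  /* 0xFFFF0004: the received byte                   */
    UART_TX_CTRL = 2,  /* 0xFFFF0008: bit 0 set when ready to transmit    */
    UART_TX_DATA = 3   /* 0xFFFF000C: the byte to transmit                */
};

/* Wait until the transmitter is ready, then send one byte.
 * Only bit 0 of the control word is tested, because the upper
 * 31 bits may be don't-care values. */
static void uart_putc(volatile uint32_t *uart, uint8_t c) {
    assert(uart != NULL);
    while ((uart[UART_TX_CTRL] & 1u) == 0) {
        /* spin */
    }
    uart[UART_TX_DATA] = c;
}

/* Wait until a byte has been received, then return it.
 * ASSUMPTION: the byte is in bits [7:0] of the data word. */
static uint8_t uart_getc(volatile uint32_t *uart) {
    assert(uart != NULL);
    while ((uart[UART_RX_CTRL] & 1u) == 0) {
        /* spin */
    }
    return (uint8_t)(uart[UART_RX_DATA] & 0xFFu);
}
```

On the MIPS150 itself the base pointer would be `(volatile uint32_t *)0xFFFF0000`. The `volatile` qualifier keeps the compiler from caching the control word in a register, which would turn the polling loop into an infinite loop.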

Before you proceed with acquiring the skeleton files, you must set up an SSH key to use with GitHub. If you have already done this, you may skip the following subsection.

Setting Up SSH Keys

GitHub authenticates you for access to your repository using SSH keys. You can generate an SSH key using the ssh-keygen command. Accepting the defaults should be fine.

Once you have generated a key, go to your GitHub account settings, then to the SSH Public Keys menu. Here, click Add another public key. In this dialog, insert the contents of your public key file into the Key box. By default the public key file is ~/.ssh/id_rsa.pub. Make sure you don't accidentally copy the contents of ~/.ssh/id_rsa; that is your private key, which is equivalent to your password. After you paste your key in, give it a name and you are done.

Acquiring Skeleton Files, Setting Up Repository

The skeleton files for the project will be available through a git repository provided by the staff. The suggested way for initializing your repository with the skeleton files is as follows. First, set up your ssh keys as described above. Then,

% git clone git@github.com:CS150/skeleton.git
% cd skeleton
% git remote add my-repo git@github.com:CS150/teamX.git
% git push my-repo master

This will make a single commit to your repository with the base files. We suggest you then do the following,

% cd ..
% rm -rf skeleton
% git clone git@github.com:CS150/teamX.git
% cd teamX
% git remote add staff git@github.com:CS150/skeleton.git

This will delete the skeleton repository you cloned, clone your repository that now has a single commit, and add a remote repository named staff that points to the skeleton files to allow easy future merges of staff updates.

Toolchain

A GCC MIPS toolchain has been built and installed in the cs150 home directory. These binaries will run on any of the p380 machines in the 125 Cory lab. The most relevant pieces of the toolchain are given below,

mips-gcc GCC for MIPS, compiles C files to MIPS binaries
mips-as MIPS assembler, compiles assembly to MIPS binaries
mips-objdump allows easy viewing of MIPS binaries

The easiest way to use this toolchain will be to copy the example project in the software directory of the skeleton files. That might look something like the following,

% cd software
% cp -r example helloworld
% cd helloworld
% mv example.c helloworld.c
% mv example.ld helloworld.ld
% gedit Makefile

The only editing required in the Makefile is to change the TARGET variable to helloworld. You can then edit helloworld.c and run make to compile. C or assembly files added to the directory with the proper extension will be compiled automatically. There are a few things to be aware of in the example project,

start.s: This is an assembly file that contains the starting point for your program. By default GCC looks for a label named _start and makes it the entry point. The default start.s jumps to the main label, and initializes the stack pointer to the address 0xFF.
example.ld: This is a linker script. It guarantees that the _start label is at address 0.
example.elf: Original binary produced by toolchain, mips-objdump -d example.elf will tell you which instructions make up the binary.
example.mif: File generated by toolchain, used to initialize block RAM memory for simulation.
example.coe: File generated by toolchain, used to initialize block RAM memory for synthesis.

Block RAM Generation

At some point you will need memory to complete checkpoint 1. While the memory inference discussed in lecture does work, it is not perfect. One thing it does not allow is the generation of byte-masked memories; to achieve this we use the Xilinx Core Generator (coregen). Most of this has already been done and is included in the skeleton files. To view the file that specifies the block RAM parameters, do the following,

% cd hardware/src/blk_mem_gen_v4_3
% gedit blk_mem_gen_v4_3.xco

Feel free to tweak whatever parameters you like, but the ones of interest are,

coe_file this specifies which coe file to use to initialize the block RAM for synthesis.
write_depth_a this is the depth of the memory, the default is 4096.

To actually build the block RAM, execute the following

% ./build

Likewise, to clean up the generated files,

% ./clean
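
The byte masks mentioned at the start of this section are what let SB and SH update part of a word without disturbing the rest. The following is a minimal C sketch of how a 4-bit mask could be derived from the store size and the low address bits. The function name is hypothetical, and it ASSUMES that mask bit 0 corresponds to the byte at address offset 0 within the word; the lane ordering your memory and endianness actually require may differ, so treat this as a model of the idea rather than of the generated core.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a 4-bit write mask for a store of size_bytes (1, 2, or 4)
 * at byte address addr. Bit i of the result enables byte lane i.
 * ASSUMPTION: lane 0 holds the byte at address offset 0 in the word. */
static uint8_t store_byte_mask(uint32_t addr, unsigned size_bytes) {
    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4);
    assert(addr % size_bytes == 0);  /* SH and SW must be aligned */

    /* 0x1 for a byte, 0x3 for a half word, 0xF for a word. */
    unsigned lanes = (1u << size_bytes) - 1u;

    /* Slide the enabled lanes to the position given by addr[1:0]. */
    return (uint8_t)((lanes << (addr & 3u)) & 0xFu);
}
```

For instance, under this assumption an SB to an address ending in binary 01 enables only lane 1 (mask 0x2), and an SH to an address ending in binary 10 enables lanes 2 and 3 (mask 0xC).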