Consider two competing 8-bit floating point formats. Each contains the same fields (sign, exponent, significand) and follows the same general rules as the 32-bit IEEE standard (denorms, biased exponent, non-numeric values, etc.), but allocates its bits differently. To save you time, you only need to complete and circle the (LEFT or RIGHT) blank whose value is closest to zero, that's the only one we'll grade! (If they're the same value, write the answer in both, & circle both). E.g,. The number represented by 0x00 was 0 for both, so we circled both. But for "exponent bias", just from the # of EE...E bits in each, we know |LEFT's bias| < |RIGHT's bias|, so there's no need to calculate or write the answer on the RIGHT. "LEFT" format: sign . 2exp-bies +1 - a mont sign . 20. 1. mont s EE MMMMM "RIGHT" format: -31 < exp <- ( S EEEEEE scratch space (show all work here) scratch space (show all work here) -1 Lexp < 1 $.2^{5} = 32$ mant choices mant choices Exponent Bias: 31 (not graded, so no need to write) $$2^{-30} \cdot 2^{-1} = 51e^{-31}e^{-30}$$ $2^{-30} \cdot 2^{-1} = 2^{-30}$ Number represented by 0x00: Number represented by 0x00: Exponent Bias: a) # Numbers (0 ≤ n < 1):</li> b) # Numbers (1 ≤ n < 2):</p> c) Difference between two smallest positive values: d) Difference between two biggest non-∞ values: it cannot represent: - e) Positive Integer closest to 0 - Difference between two biggest non-∞ values: # Numbers $(0 \le n < 1)$ : # Numbers (1 ≤ n < 2): Difference between two smallest positive values: Positive Integer closest to 0 it cannot represent: f) Which implementation is better for approximating a LEFT or RIGHT ? (circle one) = 3.14 Stepsize= $$2^{1} = 2^{exp} \cdot 2^{exp}$$ Step= $2^{1} = 2^{exp} \cdot 2^{-1}$ $= exp = 2$ $= exp:1$ , mant:0b11111 $y = exp2$ mant:0 $= exp:2$ mant:1 $= exp:2$ mant:0 $= exp:2$ mant:0 $= exp:2$ mant:0 $= exp:2$ mant:0 $= exp:2$ mant:0 $= exp:2$ mant:0 - 1) For a 12-bit integer represented with two's complement, what is the: - a) Most positive value (in decimal): 2"-1 = 2047 - b) Binary representation of that number: - 100 .... - 06011 111 111) - $50 \dots -2'' = -20$ - d) Hex representation of that number: c) Most negative value (in decimal): 0×400 - e) In general, for an n-bit, two's complement integer: - i) What is the most positive you can represent, in decimal? - $\frac{2^{n}-1}{-2^{n-1}}$ - ii) What is the most negative you can represent, in decimal? - 2) Fill in the blank below so that the function mod16 will return the remainder of x when divided by 16. The first blank should be a <u>bitwise</u> operator, and the second blank should be a single decimal number: Set to 0: & w 0 Joes: Row 1 rothing Set to 1: | w/ 1 Joes: Row 1 rothing Tio Mebit: 1 w/ 1 26 2120 mask: 0000 (1)) ۲۲, >>, ~ Connect the definition with the name of the process that describes it. - a) Compiler - b) Assembler - c) Linker - d) Loader - 1) Outputs code that may still contain pseudoinstructions. - 2) Takes binaries stored on disk and places them in memory to run. - 3) Makes two passes over the code to solve the "forward reference" problem. - 4) Creates a symbol table. - 5) Combines multiple text and data segments. - 6) Generates assembly language code. - 7) Generates machine language code. - 8) Only allows generation of TAL. - 9) Only allows generation of binary machine code. 30-0-0-0-0-0 (b) 2-input NOR gates are said to be complete because any Boolean function can be computed with them. Prove this fact. Hint: implement a subset of the standard gates (AND, NOT, OR, NOR, NAND, XOR, XNOR) using just NOR gates, then apply a standard boolean algebra technique using these gates. ## see next page for detailed steps (6) (1) (c) We want to implement a very simple finite state machine that determines its next state by the result of and AND operation on the current state and the input. The output is always the current state. Assume registers have a CLK to Q delay of 5ns, a setup time of 2ns, and a hold time of 3ns. To achieve a clock rate of 25MHz, what is the maximum propagation delay that a NOR gate could have, assuming we are implementing AND as a combination of one or more of the gates built in part (b)? (d) Complete the state diagram for a finite state machine that outputs 1 if and only if it has just seen the input sequence 101 and it has never seen the input sequence 001. You may add more arrows or more states as you see fit. Provide a brief description of each state. Example Input: 1101010100101 Output: 0001010100000 NAND: AB + AB + AB | (1) | NOT | A-Do- | |-----|-----|-------| |-----|-----|-------| | AND THATA Table | | | | | | | | |-----------------|-----|-----|---|--|--|--|--| | k | 1 B | out | | | | | | | 0 | 0 | 0 | | | | | | | - | 1 | 0 | • | | | | | | <u> </u> | 0 | 0 | | | | | | | <u> </u> | 1 | 1 | | | | | | | trum table | | | | | | | | |---------------|---|-----|--|--|--|--|--| | A | В | out | | | | | | | 0 | ٦ | 21 | | | | | | | 0 | 1 | 0 | | | | | | | | 0 | 0 | | | | | | | $\mathcal{T}$ | 1 | 100 | | | | | | NOR | D OR | A DO-CO | |------|---------| | | not A | $$\begin{array}{c} \widehat{AB} \rightarrow \widehat{A} + \widehat{B} \\ \widehat{AB} + \widehat{AB} & \widehat{BA} + \widehat{BA} \\ \widehat{AB} + \widehat{AB} + \widehat{BA} \end{array}$$ $$\begin{array}{c} \widehat{AB} + \widehat{AB} + \widehat{BA} + \widehat{BA} \\ \widehat{AB} + \widehat{AB} + \widehat{BA} \end{array}$$ ## (C) $$t_{ontical} = t_{ukto}u + NOR + NOR + t_{semp}$$ $$t_{ons} = t_{ukto}u + NOR + NOR + t_{semp}$$ $$t_{ons} = t_{ukto}u + NOR + NOR + t_{semp}$$ $$t_{ons} = t_{ukto}u + NOR + NOR + t_{semp}$$ $$\frac{25MHz}{25\kappa(0^6s^{-1})} = \frac{4\times10^{-8}s}{40ns}$$ $$\frac{25 \times (0^{6} \text{ s}^{-1})}{25 \times (0^{5} \text{ s}^{-1})} \xrightarrow{25 \times (0^{5} \text{ s}^{-1})} \frac{1 \times 10^{-6}}{25 \times (0^{5} \text{ s}^{-1})}$$ Prof. Wawrzynek decides to design a new ISA for his ternary neural network accelerator. He only needs to perform 7 different operations with his ISA: XOR, ADD, LD, SW, LUI, ADDI, and BLT. He decides that each instruction should be 17 bits wide, as he likes the number 17. There are no funct7 or funct3 fields in this new ISA. (b) What is the minimum number of bits required for the opcode field? (c) Suppose Prof. Wawrzynek decides to make the opcode field 6 bits. If we would like to support instructions with 3 register fields, what is the maximum number of registers we could address? $(\mathcal{N})$ lw I Load Word R[rd] = {32'bM[](31),M[R[rs1]+imm](31:0)} **MNEMONIC** FMT FUNCT3 FUNCT7 OR IMM HEXADECIMAL 03/2 lw I 010 ## CORE INSTRUCTION FORMATS | | 31 | 27 | 26 | 25 | 24 | 20 | 19 | 15 | 14 | 12 | 11 | 7 | 6 | 0 | |---------------|-----------------------|----|----|-----|----|------------|----|--------|-----|------------------|------|------|--------|---| | R | funct7 | | | rs2 | | rs1 | | funct3 | | rd | | Opco | ode | | | I | imm[11:0] | | | | | | rs | s1 | fun | ct3 | rd | | Opcode | | | $\mathbf{S}$ | imm[11:5] | | | rs | 2 | rs1 | | funct3 | | imm[4:0] | | opco | de | | | SB | imm[12 10:5] rs2 | | | | 2 | rs1 funct3 | | | ct3 | imm[4:1 11] opco | | de | | | | $\mathbf{U}$ | imm[31:12] | | | | | | | | rc | 1 | opco | de | | | | $\mathbf{UJ}$ | imm[20 10:1 11 19:12] | | | | | | | | rc | 1 | opco | de | | | **OPCODE** 0000011 lw + 5 17 (+6) $16 - rs1 = x^3$ imm = 17rd = +5 = x30 ``` Assume we have two arrays input and result. They are initialized as follows: ``` ``` int *input = malloc(8*sizeof(int)); int *result = calloc(8, sizeof(int)); for (int i = 0; i < 8; i++) { input[i] = i; You are given the following RISC-V code. Assume register a0 holds the address of input and register at holds the address of result when MAGIC is called by main. ao = & ( Input) az= t(result) # Start Calling MAGIC a1 = 8 addi a1, x0, 8 # a0 holds input, a2 holds result jal ra, MAGIC # Checkpoint: finished calling MAGIC exit: addi a0, x0, 10 add a1, x0, x0 ecall # Terminate ecall # TODO: prologue. What registers need to be stored onto the stack? mv s0, x0 So = 0 mv t0, x0 to =0 loop: if (to = = a1) go to done 1 !=& beq (0) a1, done tt= input co] = 0 lw t1, 0(a0) t! = hput ci J = 1 So = So++1 = input Co] 20 so = thousecout imputers add s0, s0, t1 t2= t0 << 2 = 0 slli t2, t0, 2 add t2, t2, a2 t2= 02+ 42 = 02 t2=a2+t2 +200] = 50 = [hput 0] =0 sw s0, 0(t2) tztej yesuh [i] = 50 addi t0, t0, 1 t0 = to 41 to atoti addi a0, a0, 4 Q0 = Q0+4 a0=a0+4 jal x0, loop done: pressor elevents in the aneu mv a0, s0 # TODO: epilogue. What registers need to be restored? jr ra input 0+1+2 5 to index ``` (a) Consider the function MAGIC The prologue and epilogue for this function are missing. Which registers should be saved/restored in MAGIC's prologue/epilogue? Select all that apply. 0 to 0 al 0 to 0 al 0 to 0 al a (b) Assume you have the prologue and epilogue correctly coded. You set a breakpoint at "Checkpoint: finish calling MAGIC" and call main. What does result contain when your program pauses at the breakpoint? Please write the 8 numbers starting at result in the blanks below. 0 1 3 6 (0 15 21 28 (c) Translate MAGIC into C code. You may or may not need all of the lines provided below. int sum = 0; rength for (int i=0; i < b; j++75 sum = sum + aciJ; ccij = sum; 3 } We wish to introduce a new instruction into our single-cycle datapath. The instruction SIZ (set if zero) works as follows: Given the single cycle datapath below, select the correct modifications in parts (a) - (d) such that the datapath executes correctly for this new instruction (and all core instructions!). You can make the following assumptions: - the SIZ signal is 1 if and only if the instruction is SIZ - ALUSe1 is ADD when when we a have SIZ instruction. - the immediate generator outputs ZERO when we have a SIZ instruction. (a) Consider the following modifications to the <u>branch comparator inputs</u>. Which configuration will allow this instruction to execute correctly without breaking the execution of other instructions in our instruction set? (b) Consider the following modifications to the ALU inputs. Which configuration will allow this instruction to execute correctly without breaking the execution of other instructions in our instruction set? Select the configuration that requires minimum modifications to the original datapath. Notice in the bottom left choice BSe1 is unused. (c) Consider the following modifications to the WB mux inputs. Which configuration will allow this instruction to execute correctly without breaking the execution of other instructions in our instruction set? Select the configuration that requires minimum modifications to the original datapath. (d) Consider the following modifications to the RegWEn inputs. Which configuration will allow this instruction to execute correctly without breaking the execution of other instructions in our instruction set? | | ram given at the beatter. You can assur | | m. Select X when a signal's | |----------------------------|-----------------------------------------|----------------------|-----------------------------| | • the SIZ sign | al is 1 if and only if | the instruction is S | IZ | | • ALUSel is A | DD when when we | a have SIZ instruct | ion. | | • the immedia | te generator output | s ZERO when we h | ave a SIZ instruction. | | 1. PCSel: | | | | | O 1 0 | Ох | | | | 2. RegWEn: | | | | | O 1 (Enable) | <ul><li>0 (Disable)</li></ul> | ОХ | | | 3. BrUn: | | | PCmj | | O 1 (Signed) | O 0 (Unsigned) | • X | == | | 4. BSel: | | | 0 | | 01 00 | • x | | | | 5. ASel: | | | | | <b>1</b> 00 | Ох | | | | 6. MemRW: | | | | | O 1 (Enable) | • 0 (Disable) | Ох | | | 7. WBSel | | | | | <ul> <li>ALUOut</li> </ul> | PUSI | O MemOut | | O Other: Please specify:\_\_ O PC + 4 (e) Given your selections above, decide the rest of the control signals for this instruction