

EECS151/251A Spring 2018
Digital Design and Integrated Circuits

Instructors: John Wawrzynek and Nick Weaver

Lecture 21: Multiplier Circuits

#### **Multiplication**

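[Figure: partial-product array for multiplicand $a$ and multiplier $b$; the two low-order columns of the product are $a_0b_0$ and $a_1b_0 + a_0b_1$.]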

Many different circuits exist for multiplication. Each one has a different balance between speed (performance) and amount of logic (cost).

### "Shift and Add" Multiplier



[Figure: shift-and-add datapath built from n-bit registers and an n-bit adder]

Cost $\propto$ n, T = n clock cycles.

What is the critical path for determining the min clock period?

- Sums each partial product, one at a time.
- In binary, each partial product is a shifted version of A or 0.

#### Control Algorithm:

- 1. P ← 0, A ← multiplicand, B ← multiplier
- 2. If LSB of B==1 then add A to P else add 0
- 3. Shift [P][B] right 1
- 4. Repeat steps 2 and 3 n-1 times.
- 5. [P][B] has product.
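The control algorithm above can be sketched as a small behavioral model. This is a Python illustration rather than the lecture's circuit; the function name `shift_add_multiply` and the way the adder's carry-out is shifted into P are assumptions made for the sketch.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers with the shift-and-add algorithm."""
    mask = (1 << n) - 1
    assert 0 <= a <= mask and 0 <= b <= mask
    p = 0                                  # step 1: P <- 0, B holds the multiplier
    for _ in range(n):                     # steps 2 and 3 run n times in total
        addend = a if (b & 1) else 0       # step 2: add A or 0 depending on LSB of B
        total = p + addend                 # n-bit sum plus a carry-out bit
        # step 3: shift [carry][P][B] right by one position
        b = ((total & 1) << (n - 1)) | (b >> 1)
        p = total >> 1
    return (p << n) | b                    # step 5: [P][B] holds the 2n-bit product


print(shift_add_multiply(13, 11, 4))       # 143
```

Each loop iteration corresponds to one clock cycle, which is where T = n clock cycles comes from.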

### "Shift and Add" Multiplier

#### Signed Multiplication:

Remember for 2's complement numbers MSB has negative weight:

$$X = \sum_{i=0}^{n-2} x_i 2^i - x_{n-1} 2^{n-1}$$

ex:
$$-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4 = 0 + 2 + 0 + 8 - 16 = -6$$

- Therefore for multiplication:
  - a) subtract final partial product
  - b) sign-extend partial products
- Modifications to shift & add circuit:
  - a) adder/subtractor
  - b) sign-extender on the P shift register
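A minimal Python sketch of the two modifications, under the assumption that a Python integer stands in for the sign-extended P register (real hardware would need an extra bit or overflow handling). The names `to_signed` and `signed_shift_add_multiply` are hypothetical.

```python
def to_signed(bits: int, n: int) -> int:
    """Interpret the low n bits as two's complement (MSB has weight -2**(n-1))."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits >> (n - 1) else bits


def signed_shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit two's complement integers with shift-and-add."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    b_bits = b & ((1 << n) - 1)            # raw bit pattern of the multiplier
    p = 0                                  # signed accumulator (models P)
    for i in range(n):
        if b_bits & 1:
            # adder/subtractor: the final partial product is subtracted
            p = p - a if i == n - 1 else p + a
        # arithmetic right shift of [P][B]: P keeps its sign (sign-extender)
        b_bits = ((p & 1) << (n - 1)) | (b_bits >> 1)
        p >>= 1
    return to_signed((p << n) | b_bits, 2 * n)


print(signed_shift_add_multiply(-6, 3, 5))   # -18
```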

### **Outline**



- Combinational multiplier
- Latency & Throughput
  - Wallace Tree
  - Pipelining to increase throughput
- Smaller multipliers
  - Booth encoding
  - Serial, bit-serial
- Two's complement multiplier



# **Unsigned Combinational Multiplier**

#### Array Multiplier

Single cycle multiply: Generates all n partial products simultaneously.



# Combinational Multiplier (unsigned)



### **Carry-Save Addition**

- Speeding up multiplication is a matter of speeding up the summing of the partial products.
- "Carry-save" addition can help.
- Carry-save addition passes
   (saves) the carries to the output,
   rather than propagating them.

- Example: sum three numbers, $3_{10} = 0011$, $2_{10} = 0010$, $3_{10} = 0011$

| Step | Inputs | Carry (c) | Sum (s) |
|---------------------|----------------------------------|-----------------|-----------------|
| carry-save add | $3_{10} = 0011$, $2_{10} = 0010$ | $0100 = 4_{10}$ | $0001 = 1_{10}$ |
| carry-save add | c, s, $3_{10} = 0011$ | $0010 = 2_{10}$ | $0110 = 6_{10}$ |
| carry-propagate add | c + s | | $1000 = 8_{10}$ |

- In general, *carry-save* addition takes in 3 numbers and produces 2.
- Carry-propagate addition, by contrast, takes in 2 numbers and produces 1.
- With this technique, we can avoid carry propagation until the final addition.
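As a sketch of the 3-to-2 idea, the following Python model treats integers as bit vectors and applies a full adder to every column independently. The function name `carry_save_add` is an assumption for illustration; it reproduces the 3 + 2 + 3 example.

```python
def carry_save_add(x: int, y: int, z: int):
    """3:2 compression: return (carry, sum) such that carry + sum == x + y + z."""
    s = x ^ y ^ z                              # per-column sum bit, no propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-column carry, weighted one column left
    return c, s


c1, s1 = carry_save_add(0b0011, 0b0010, 0)     # 3 + 2
c2, s2 = carry_save_add(c1, s1, 0b0011)        # (3 + 2) + 3, still in carry-save form
total = c2 + s2                                # one carry-propagate add at the very end
print(c1, s1, c2, s2, total)                   # 4 1 2 6 8
```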

#### **Carry-save Circuits**



- When adding sets of numbers, carry-save can be used on all but the final sum.
- Standard adder (carry propagate) is used for final sum.
- Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder)



### Array Multiplier using Carry-save Addition



#### **Carry-save Addition**

CSA is associative and commutative. For example:

$$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$$



- A balanced tree can be used to reduce the logic delay.
- This structure is the basis of the Wallace Tree Multiplier.
- Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
- Multiplier delay $\propto \log_{3/2} N + \log_2 N$
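The tree reduction can be sketched in Python as follows. This is a word-level illustration, not a gate-level Wallace tree: the names `csa` and `wallace_multiply` are hypothetical, rows are grouped three at a time in list order, and Python's `sum` stands in for the final fast CPA.

```python
def csa(x: int, y: int, z: int):
    """3:2 carry-save adder on whole rows: returns (carry, sum)."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def wallace_multiply(a: int, b: int, n: int):
    """Return (product, csa_levels) for unsigned n-bit operands a and b."""
    # All n partial products at once: A shifted by i, or 0, per bit i of B.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):            # every group of 3 rows becomes 2
            nxt.extend(csa(*rows[i:i + 3]))
        nxt.extend(rows[full:])                # leftover rows pass to the next level
        rows = nxt
        levels += 1
    return sum(rows), levels                   # final carry-propagate add


print(wallace_multiply(200, 100, 8))           # (20000, 4)
```

Since each level shrinks the row count by roughly a factor of 3/2, the number of CSA levels grows like $\log_{3/2} N$; for 8 rows the sequence is 8, 6, 4, 3, 2.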

### Increasing Throughput: Pipelining



Throughput = $1/(4t_{PD,FA})$ instead of $1/(8t_{PD,FA})$



# **Smaller Combinational Multipliers**

### Booth Recoding: Higher-radix mult.

Idea: If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and halve the latency of the multiplier!



Booth's insight: rewrite 2\*A and 3\*A cases, leave 4A for next partial product to do!

$$\begin{aligned}
B_{K+1,K} \cdot A &= 0 \cdot A \rightarrow 0 \\
&= 1 \cdot A \rightarrow A \\
&= 2 \cdot A \rightarrow 4A - 2A \\
&= 3 \cdot A \rightarrow 4A - A
\end{aligned}$$

### **Booth recoding**

(On-the-fly canonical signed digit encoding!)

B<sub>K+1</sub> and B<sub>K</sub> are the current bit pair; B<sub>K-1</sub> comes from the previous bit pair.

$B_{K+1,K} \cdot A$: $0 \cdot A \rightarrow 0$, $1 \cdot A \rightarrow A$, $2 \cdot A \rightarrow 4A - 2A$, $3 \cdot A \rightarrow 4A - A$

| B<sub>K+1</sub> | B<sub>K</sub> | B<sub>K-1</sub> | action   | note      |
|-----------------|---------------|-----------------|----------|-----------|
| 0               | 0             | 0               | add 0    |           |
| 0               | 0             | 1               | add A    |           |
| 0               | 1             | 0               | add A    |           |
| 0               | 1             | 1               | add 2\*A |           |
| 1               | 0             | 0               | sub 2\*A |           |
| 1               | 0             | 1               | sub A    | ← -2\*A+A |
| 1               | 1             | 0               | sub A    |           |
| 1               | 1             | 1               | add 0    | ← -A+A    |

A "1" in the B<sub>K-1</sub> bit means the previous stage needed to add 4\*A. Since this stage is shifted by 2 bits with respect to the previous stage, adding 4\*A in the previous stage is like adding A in this stage!
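The recoding table can be checked with a short Python model. This is a sketch under stated assumptions: the multiplier is an n-bit two's complement number with n even, $B_{-1}$ is taken as 0, and the names `booth_digits` and `booth_multiply` are hypothetical. Each table row corresponds to the digit $-2B_{K+1} + B_K + B_{K-1}$.

```python
def booth_digits(b: int, n: int) -> list:
    """Radix-4 Booth digits of an n-bit two's complement b, least significant first."""
    assert n % 2 == 0
    bits = b & ((1 << n) - 1)
    digits = []
    prev = 0                                    # B_{-1} = 0 for the first pair
    for k in range(0, n, 2):
        b_k = (bits >> k) & 1
        b_k1 = (bits >> (k + 1)) & 1
        digits.append(-2 * b_k1 + b_k + prev)   # one of -2, -1, 0, +1, +2
        prev = b_k1                             # becomes B_{K-1} of the next pair
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Sum n/2 partial products, each 0, +/-A or +/-2A, shifted 2 bits per stage."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_digits(b, n)))


print(booth_digits(6, 4))        # [-2, 2], i.e. 6 = -2 + 2*4
print(booth_multiply(7, -3, 4))  # -21
```

Only n/2 partial products are generated, and every one is a shift or negation of A, so no 3\*A term is ever needed.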

#### Bit-serial Multiplier

• Bit-serial multiplier (n<sup>2</sup> cycles, one bit of result per n cycles):



Control Algorithm:



# **Signed Multipliers**

### Combinational Multiplier (signed!)



### Combinational Multiplier (signed)



## 2's Complement Multiplication

(Baugh-Wooley)

Step 1: the operands are two's complement, so the high-order bit has weight $-2^{N-1}$. We must sign-extend the partial products and subtract the last one.

Step 2: don't want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1).

Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!

Step 4: finish computing the constants...

[Figure: Baugh-Wooley partial-product array for 4-bit operands X<sub>3</sub>...X<sub>0</sub> and Y<sub>3</sub>...Y<sub>0</sub>, with added constant 1s]

Result: multiplying 2's complement operands takes just about same amount of hardware as multiplying unsigned operands!
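One way to see why the sign-extension bits disappear is to model the commonly used "modified Baugh-Wooley" arrangement in Python. The specific form below is an assumption, and the slide's array may be laid out differently. Partial-product bits that involve exactly one sign bit are complemented, and the constants $2^N$ and $2^{2N-1}$ are added. The function name `baugh_wooley_multiply` is hypothetical.

```python
def baugh_wooley_multiply(x: int, y: int, n: int) -> int:
    """Multiply n-bit two's complement x and y using only additions of single bits."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            term = xb[i] & yb[j]
            # A term with exactly one sign bit has negative weight:
            # -t*2^k == (1 - t)*2^k - 2^k, so complement it and fold -2^k into a constant.
            if (i == n - 1) != (j == n - 1):
                term ^= 1
            total += term << (i + j)
    # Folded constants, reduced modulo 2^(2n): 2^n + 2^(2n-1)
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1                 # keep 2n result bits
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley_multiply(-6, 3, 4))          # -18
```

The array has the same $N^2$ bit terms as the unsigned multiplier plus two constant 1s, which matches the "about the same amount of hardware" result.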

### 2's Complement Multiplication



### Multiplication in Verilog

You can use the "\*" operator to multiply two numbers:

```
wire [9:0] a,b;
wire [19:0] result = a*b; // unsigned multiplication!
```

If you want Verilog to treat your operands as signed two's complement numbers, add the keyword signed to your wire or reg declaration:

```
wire signed [9:0] a,b;
wire signed [19:0] result = a*b; // signed multiplication!
```

Remember: unlike addition and subtraction, you need different circuitry if your multiplication operands are signed vs. unsigned. The same is true of the >>> (arithmetic right shift) operator. To get signed operations, all operands must be signed.

```
wire signed [9:0] a;
wire [9:0] b;
wire signed [19:0] result = a*$signed(b);
```

To make a signed constant: 10'sh37C
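To illustrate why signedness matters for multiplication, the Python sketch below interprets the same 10-bit patterns both ways. It models the arithmetic only, not Verilog's expression-sizing rules; the helper name `as_signed` and the example operand values are assumptions.

```python
def as_signed(bits: int, n: int) -> int:
    """Interpret the low n bits as an n-bit two's complement value."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits


a_bits, b_bits = 0x3FF, 0x002                   # identical 10-bit patterns in both cases

unsigned_product = a_bits * b_bits                               # 1023 * 2
signed_product = as_signed(a_bits, 10) * as_signed(b_bits, 10)   # -1 * 2

print(unsigned_product, signed_product)         # 2046 -2
print(hex(unsigned_product & 0xFFFFF))          # 0x7fe   (20-bit unsigned result)
print(hex(signed_product & 0xFFFFF))            # 0xffffe (20-bit signed result)
print(as_signed(0x37C, 10))                     # -132, the value of 10'sh37C
```

The low 10 bits of both products agree (0x3FE); only the upper half differs, which is why the full-width multiplier needs to know the operand signedness.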