NTU 6342 / EE 241 Homework #3 SOLUTIONS

Problem #1:

Delay times:

$t_{p\_FO4\_20} \coloneqq 0.424 \text{ ns}$

$t_{p\_FO4\_100} \coloneqq 2.12 \text{ ns}$

Simulation values:

$$T_{min} := 2 \cdot t_{p\_FO4\_20} \qquad T_{max} := 2 \cdot t_{p\_FO4\_100}$$

$$T_{min} = 0.848 \text{ ns} \qquad T_{max} = 4.24 \text{ ns}$$

$$f_{max} := \frac{1}{T_{min}} \qquad f_{min} := \frac{1}{T_{max}} \qquad f_{max} = 1.179 \text{ GHz} \qquad f_{min} = 235.849 \text{ MHz}$$
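
The arithmetic above can be checked with a few lines of Python. This is only a sanity-check sketch; the variable names are ours and simply mirror the symbols defined above.

```python
# FO4 delays quoted above, converted to seconds
t_p_fo4_20 = 0.424e-9
t_p_fo4_100 = 2.12e-9

# Clock period limits: two FO4 delays each, as in the definitions above
t_min = 2 * t_p_fo4_20
t_max = 2 * t_p_fo4_100

# Corresponding frequency limits
f_max = 1 / t_min
f_min = 1 / t_max

print(f"T_min = {t_min * 1e9:.3f} ns, T_max = {t_max * 1e9:.2f} ns")
print(f"f_max = {f_max / 1e9:.3f} GHz, f_min = {f_min / 1e6:.3f} MHz")
```
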
Part a:

PartAData := READPRN("inverter\_delay\_fo4\_20.txt")







Part b:



Interpolating the delay-Vdd curve to get the needed Vdd for a given delay:

$$V_{dd\_opt} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \text{linterp}\left(\frac{-t_p}{s}, \frac{\text{SupplyVoltage}}{V}, \frac{-\text{PulseWidth}_n}{s}\right) \\ x \cdot V \end{cases}$$
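
The interpolation step can be sketched in Python as follows. The helper `linterp` here is our own piecewise-linear routine, not the worksheet's built-in, and the sample delay and voltage values are hypothetical placeholders rather than the simulated data. The delays are negated, as in the expression above, so that the abscissa is increasing (delay falls as the supply rises); we assume that is the reason for the sign change in the worksheet.

```python
from bisect import bisect_right


def linterp(xs, ys, x):
    """Piecewise-linear interpolation; xs must be strictly increasing.

    Outside the sampled range the nearest end segment is extended.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    # Index of the segment containing x, clamped to the end segments
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])


# Hypothetical delay-vs-supply samples (NOT the simulated results)
supply_voltage = [1.0, 1.5, 2.0, 2.5]        # V
t_p = [2.0e-9, 1.0e-9, 0.6e-9, 0.4e-9]       # s, decreasing with Vdd

# Negate the delays so the interpolation abscissa is increasing
neg_t_p = [-t for t in t_p]


def vdd_for_delay(target_delay):
    """Supply voltage that gives the requested delay, by interpolation."""
    return linterp(neg_t_p, supply_voltage, -target_delay)


print(vdd_for_delay(1.5e-9))  # approximately 1.25 for these placeholder samples
```
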





VddOptData := READPRN("inverter\_delay\_fo4\_20\_vdd\_opt.txt")

We can see from the graph that the supply voltage can be lowered until the inverter is just fast enough for each target frequency.

Because switching power is proportional to $V_{dd}^2 \cdot f$, this results in a quadratic reduction in power on top of the linear decrease obtained with frequency scaling alone.
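
To make the comparison concrete, here is a minimal sketch using the standard switching-power relation $P = \alpha \cdot C \cdot V_{dd}^2 \cdot f$. The load capacitance, nominal supply, and the assumption that the supply can drop to 60% at half the frequency are all hypothetical values chosen for illustration; they are not taken from the simulation.

```python
def dynamic_power(c_load, vdd, freq, activity=1.0):
    """Switching power of a CMOS node: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical operating point, for illustration only
C_LOAD = 10e-15   # F
VDD_NOM = 2.5     # V
F_NOM = 1.0e9     # Hz

p_nom = dynamic_power(C_LOAD, VDD_NOM, F_NOM)

# Frequency scaling alone: halve f, keep Vdd
p_freq_only = dynamic_power(C_LOAD, VDD_NOM, F_NOM / 2)

# Frequency and voltage scaling: halve f and (by assumption) run at 0.6 * Vdd
p_freq_and_vdd = dynamic_power(C_LOAD, 0.6 * VDD_NOM, F_NOM / 2)

print(f"frequency scaling only:    {p_freq_only / p_nom:.2f} of nominal power")
print(f"frequency + supply scaling: {p_freq_and_vdd / p_nom:.2f} of nominal power")
```

With these placeholder numbers the first ratio is 0.5 (linear in frequency) and the second is 0.5 multiplied by 0.6 squared, which shows the extra quadratic factor contributed by the supply.
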

#### Part c:

BodyBiasData := READPRN("inverter\_delay\_fo4\_20\_vbb.txt")



We can see that the inverter propagation delay increases as the threshold voltage is increased (reverse body bias) and decreases when the threshold voltage is decreased (forward body bias).

From this graph, we can determine the body bias required to achieve the desired propagation delay.







# Part d:

Forward biasing the transistor bulk:

$$\begin{aligned} \text{Delay}_{db\_neg} &\coloneqq \begin{cases} \text{for } n \in 0..4 \\ x_n \leftarrow \left( \text{VddVbbNegData}_n \right)^{\langle 5 \rangle} \cdot \text{s} \\ x \end{cases} & V_{bb\_db\_neg} &\coloneqq \left( \begin{array}{c} 0 \\ -0.15 \\ -0.3 \\ -0.45 \\ -0.6 \\ -0.6 \\ -0.75 \\ -0.9 \end{array} \right) \\ V_{dd\_db\_neg} &\coloneqq \left( \text{VddVbbNegData}_0 \right)^{\langle 0 \rangle} \cdot \text{V} \end{aligned}$$



Note that by forward biasing the bulk of the transistors, the threshold voltage is reduced, increasing drive current and therefore speed.

We can use this additional slack to further reduce the supply voltage, and consequently the power. Finding the corresponding Vdd and Vbb pairs for a given operating frequency:



Using this data, we can then simulate the inverter chain and find the optimal Vdd-Vbb pair for each frequency point that will result in the lowest power.

VddVbbNegOptData<sub>0</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt1.txt")

VddVbbNegOptData<sub>1</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt2.txt")

VddVbbNegOptData<sub>2</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt3.txt")

VddVbbNegOptData<sub>3</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt4.txt")

VddVbbNegOptData<sub>4</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt5.txt")

VddVbbNegOptData<sub>5</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt6.txt")

VddVbbNegOptData<sub>6</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt7.txt")

VddVbbNegOptData<sub>7</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt8.txt")

VddVbbNegOptData<sub>8</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt9.txt")

VddVbbNegOptData<sub>8</sub> := READPRN("inverter\_delay\_fo4\_20\_vdd\_vbb\_neg\_opt9.txt")

$$I_{ave\_inv20} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| \left( \text{VddVbbNegOptData}_n \right)^{\langle 4 \rangle} \right|} \\ x \cdot A \end{cases} \qquad t_{delay} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| \left( \text{VddVbbNegOptData}_n \right)^{\langle 9 \rangle} \right|} \\ x \cdot s \end{cases}$$

$$I_{ave\_nbulk} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| \left( \text{VddVbbNegOptData}_n \right)^{\langle 5 \rangle} \right|} \\ x \cdot A \end{cases} \qquad I_{ave\_pbulk} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| \left( \text{VddVbbNegOptData}_n \right)^{\langle 6 \rangle} \right|} \\ x \cdot A \end{cases}$$

$$P_{supply} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left( V_{supply_n} \cdot I_{ave\_inv20_n} \right)} \\ x \end{cases} \qquad P_{nbulk} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| V_{substrate_n} \cdot I_{ave\_nbulk_n} \right|} \\ x \end{cases}$$

$$P_{pbulk} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \overrightarrow{\left| \left( V_{supply_n} - V_{substrate_n} \right) \cdot I_{ave\_pbulk_n} \right|} \\ x \end{cases}$$

$$P_{total} := \begin{cases} \text{for } n \in 0..9 \\ x_n \leftarrow \left( P_{supply_n} + P_{nbulk_n} + P_{pbulk_n} \right) \\ x \end{cases}$$
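
The power bookkeeping above can be sketched in Python for a single frequency point. The function follows the three power terms as defined above; every number in `points` is a made-up placeholder chosen only to give a curve with an interior minimum, not a value read from the simulation files, and the bias is entered as a positive forward-bias magnitude by assumption.

```python
def total_power(v_supply, v_substrate, i_supply, i_nbulk, i_pbulk):
    """Supply power plus the power delivered by the two bulk-bias sources."""
    p_supply = v_supply * i_supply
    p_nbulk = abs(v_substrate * i_nbulk)
    p_pbulk = abs((v_supply - v_substrate) * i_pbulk)
    return p_supply + p_nbulk + p_pbulk


# Hypothetical (Vdd, Vbb, I_supply, I_nbulk, I_pbulk) tuples that all meet the
# same delay target: more forward bias allows a lower supply voltage, but the
# bulk currents grow rapidly. Placeholder values, NOT simulation results.
points = [
    (2.00, 0.00, 100e-6, 0.0,    0.0),
    (1.80, 0.15,  90e-6, 1e-9,   1e-9),
    (1.65, 0.30,  84e-6, 1e-6,   1e-6),
    (1.55, 0.45,  80e-6, 30e-6,  30e-6),
    (1.50, 0.60,  78e-6, 300e-6, 300e-6),
]

powers = [total_power(*p) for p in points]

# Index of the Vdd-Vbb pair with the lowest total power
best = min(range(len(points)), key=powers.__getitem__)

for (vdd, vbb, *_), p in zip(points, powers):
    print(f"Vdd = {vdd:.2f} V, Vbb = {vbb:.2f} V -> P_total = {p * 1e6:.2f} uW")
print(f"lowest power at Vdd = {points[best][0]} V, Vbb = {points[best][1]} V")
```

Repeating this selection for each of the ten frequency points would give one Vdd-Vbb pair per frequency.
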



Note that as the substrate is forward biased, we can reduce the supply voltage. We expect the substrate current to increase with forward bias.

However, as seen from the plot, the total power starts to increase once the S/D-to-bulk junctions are forward biased strongly enough to conduct significant current.

This can be attributed to the fact that the substrate current starts to dominate the overall power.

Thus we see that the minimum power occurs when the bulk is forward biased by about 0.3 V. This bias buys enough timing slack to reduce the supply voltage while still drawing only a small current through the S/D junctions.





# Problem #2:

# Part a:

The two circuit styles compared in the paper are the standard CMOS inverter circuit and the Charge Recovery (or Adiabatic) logic. The delay and energy models for each can be expressed as:

$$D_{CMOS} = \frac{C \cdot V_{dd}}{I} = \frac{C \cdot V_{dd}}{k \cdot (V_{dd} - V_{th})^2} = R_{CMOS} \cdot C \qquad E_{CMOS} = \frac{1}{2} \cdot C \cdot V_{dd}^2$$
$$D_{CR} = \frac{\pi}{\omega_d} + R_{CR} \cdot C \qquad E_{CR} = \frac{1}{2} \cdot C \cdot \Delta V^2 \cdot \left[1 - e^{\left(\frac{-\pi \cdot \alpha}{\omega_d}\right)}\right]$$
$$\alpha = \frac{R_{CR}}{2 \cdot L}$$

Since both energy expressions are dependent on the square of the supply voltage (or in the case of CR, a fraction of the supply voltage), we can use voltage scaling to quadratically reduce the energy.
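
The models above translate directly into code. The sketch below only evaluates the expressions as written; the component values, and in particular the value chosen for $\omega_d$, are hypothetical inputs picked for illustration and are not taken from the paper.

```python
import math


def cmos_delay(c, vdd, k, vth):
    """D_CMOS = C * Vdd / (k * (Vdd - Vth)^2)."""
    return c * vdd / (k * (vdd - vth) ** 2)


def cmos_energy(c, vdd):
    """E_CMOS = 1/2 * C * Vdd^2."""
    return 0.5 * c * vdd ** 2


def cr_delay(omega_d, r_cr, c):
    """D_CR = pi / omega_d + R_CR * C."""
    return math.pi / omega_d + r_cr * c


def cr_energy(c, delta_v, r_cr, inductance, omega_d):
    """E_CR = 1/2 * C * dV^2 * (1 - exp(-pi * alpha / omega_d))."""
    alpha = r_cr / (2 * inductance)
    return 0.5 * c * delta_v ** 2 * (1 - math.exp(-math.pi * alpha / omega_d))


# Hypothetical parameters, for illustration only
C = 1e-12        # F
R_CR = 100.0     # ohm
L = 1e-6         # H
OMEGA_D = 1e9    # rad/s, treated here as a given input

for v in (2.0, 1.0):
    e_cmos = cmos_energy(C, v)
    e_cr = cr_energy(C, v, R_CR, L, OMEGA_D)
    print(f"V = {v:.1f} V: E_CMOS = {e_cmos:.3e} J, E_CR = {e_cr:.3e} J")
```

Halving the voltage swing divides both energies by four, which is the quadratic dependence noted above.
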

### Part b:

Without considering switching power, CMOS logic dissipates less energy until the voltage is scaled down to 1V. The point where CR starts to consume less energy corresponds to a 10X reduction in frequency and a 150X reduction in power.

With switching power considered, CMOS logic dissipates less energy over a wider range. The voltage crossover point moves from 1V down to voltages close to the transistor threshold voltage. Thus, with switching power considered, CR starts to consume less energy only when the power is reduced 2500X, at a frequency 100X below the peak.

### Part c:

For low power designs with increased computation requirements such as portable phones, computers, PDAs, and the like, CMOS logic will be the better choice.

However, for ultra-low-power applications with very low performance requirements, such as watches, simple sensors with few computational requirements, medical and biological implants, and the like, CR logic might be the logic style of choice.