University of California at Berkeley
College of Engineering
Department of Electrical Engineering and Computer Science

CS61C, Spring 2010

## HW 5 - Floating Point

[TA] Michael Greenbaum

Due Wednesday, March 3, 2010 @ 11:59pm
Updated Saturday, Feb. 27, 2010 @ 6:10pm

## Getting Started

Copy the contents of ~cs61c/hw/05 to a suitable location in your home directory.

`cp -r ~cs61c/hw/05 ~/hw5`

## Submission

Submit your solution by creating a directory named hw5 that contains the file fp.c. (Note that capitalization matters in file names; the submission program will not accept your submission if your file names differ at all from those specified.) From within that directory, type "submit hw5". This is not a partnership assignment; hand in your own work.

## Part 1: sprint_float

In this first part you will be writing the C function:

`void sprint_float (char *outbuf, uint64_t input, int num_bits, int exp_bits)`

The function should write to `outbuf` the string representation of `input`, interpreted as a float according to the parameters `num_bits` and `exp_bits`. `num_bits` refers to the total number of bits from `input` to be used for the float, starting from the least significant bit. `exp_bits` refers to the subset of these bits that should be used for the exponent field of the float. You can check that `num_bits - exp_bits - 1` bits are left for the significand. Since the size of the exponent field may change, the bias must change with it. For the standard 32-bit float (`exp_bits` = 8) the bias is -127; in general it is `-(2^(exp_bits-1)-1)`, so the true exponent is the stored exponent field minus `2^(exp_bits-1)-1`.

These parameters allow a great deal of flexibility, not only in adjusting the expressive power of the float, but also in choosing the tradeoff between range and accuracy.

Example:

```
num_bits = 10, exp_bits = 4
input: 0b----...--SEEEEMMMMM
- = Unused, S = Sign bit, E = Exponent, M = Mantissa (Significand)
```

Put your code in the framework in fp.c. Test code is provided in test.fp.c.  You can compile and execute your code under the test bench by typing "`make fp`".
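As an illustration of the field layout described above, the following is a minimal sketch of how the sign, exponent, and mantissa fields might be pulled out of `input`. The names `fp_fields`, `low_mask`, `split_fields`, and `bias_magnitude` are hypothetical helpers invented for this sketch; they are not part of the fp.c framework, and other decompositions work equally well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three raw fields (not part of fp.c). */
typedef struct {
    unsigned sign;      /* 0 or 1 */
    uint64_t exponent;  /* raw exponent field, bias not yet removed */
    uint64_t mantissa;  /* raw significand bits, without the implicit 1 */
} fp_fields;

/* Mask with the low n bits set, for 0 <= n <= 64. */
static uint64_t low_mask(int n) {
    return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Split the low num_bits of input into S, E, and M; higher bits are ignored. */
static fp_fields split_fields(uint64_t input, int num_bits, int exp_bits) {
    assert(num_bits >= 3 && num_bits <= 64);
    assert(exp_bits >= 1 && exp_bits <= num_bits - 2);
    int man_bits = num_bits - exp_bits - 1;
    fp_fields f;
    f.mantissa = input & low_mask(man_bits);
    f.exponent = (input >> man_bits) & low_mask(exp_bits);
    f.sign = (unsigned)((input >> (num_bits - 1)) & 1);
    return f;
}

/* Amount to subtract from the raw exponent field: 2^(exp_bits-1) - 1. */
static int64_t bias_magnitude(int exp_bits) {
    assert(exp_bits >= 1 && exp_bits <= 62);
    return (INT64_C(1) << (exp_bits - 1)) - 1;
}
```

For the 10-bit example above, `split_fields(0x2D3, 10, 4)` (binary `1 0110 10011`) yields sign 1, exponent field 6, and mantissa field 19, and `bias_magnitude(4)` is 7, so the unbiased exponent would be 6 - 7 = -1.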

### Input specification:

Recall that `uint64_t` is the C99 unsigned 64-bit integer type, declared in `<stdint.h>`.

You may assume that `outbuf` already has enough space allocated to hold your output. `num_bits` will be some value <=64, >=3. `exp_bits` will be some value >=1, <=`num_bits`-2. Note that this implies that we could potentially deal with a float 3 bits wide, with one bit for each of S, E, and M. Such a float would not have any room to hold regular (normalized) float values!

### Format:

Print the value to `outbuf` according to the format specified below:

• If the unbiased exponent (i.e., after subtracting the bias) is less than -5 or greater than 8, print the number in the format `[-]1.ddd...dE[-]xxx...x`, where `1.ddd...d` is the mantissa in binary (note the leading 1), `E` stands for Exponent (the value to its right is the power of 2 by which the mantissa is multiplied; recall scientific notation), and `xxx...x` is the (unbiased) exponent in binary. If the mantissa is 1.0, print `[-]1.0E[-]xxx...x`. Do not print any leading 0's on the exponent or trailing 0's on the mantissa.

• Otherwise, print the number as `[-]mmmm.dddd`, where `mmmm` is the whole number part of the float and `dddd` is the fractional part. Do not print leading or trailing zeroes. In other words, do not print any extra zeros to the left of the whole number part, or to the right of the fractional part. If the fractional part is 0, print the binary point followed by a single 0 (i.e., `1101.0` and not `1101`). If the whole number part is 0 (the absolute value of the number is <1), start with `0.` (`0.001101` and not `.001101`). See examples below.

• Handle zero, denorm, infinity, and NaN as shown below in the table.

• [-] means that you should place a minus sign into the buffer at that position if the number has the sign bit set, and you should not place a minus sign into the buffer if the sign bit is not set. The brackets are merely to indicate that the sign bit is conditional. No brackets should ever appear in your results.
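The formatting rules above can be partly illustrated in code. The following is a minimal sketch, under the assumption that the unbiased exponent has already been computed as a signed integer, of choosing between the two output formats and of writing the `E[-]xxx...x` suffix in binary with no leading zeros. The helper names `write_binary`, `uses_scientific`, and `write_exponent` are hypothetical and do not come from the fp.c framework.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write value in binary with no leading zeros (a lone "0" for zero).
   Returns the number of characters written, excluding the NUL. */
static int write_binary(char *out, uint64_t value) {
    char tmp[65];                      /* up to 64 digits plus NUL */
    char *p = tmp + sizeof tmp - 1;
    *p = '\0';
    do {                               /* fill digits from the right */
        *--p = (char)('0' + (int)(value & 1));
        value >>= 1;
    } while (value != 0);
    size_t len = strlen(p);
    memcpy(out, p, len + 1);           /* copy digits and the NUL */
    return (int)len;
}

/* Nonzero when the exponent calls for the 1.ddd...dE[-]xxx...x format. */
static int uses_scientific(int64_t unbiased_exp) {
    return unbiased_exp < -5 || unbiased_exp > 8;
}

/* Write the suffix "E[-]xxx...x"; returns characters written. */
static int write_exponent(char *out, int64_t unbiased_exp) {
    assert(out != NULL);
    int n = 0;
    out[n++] = 'E';
    uint64_t mag = (uint64_t)unbiased_exp;
    if (unbiased_exp < 0) {
        out[n++] = '-';
        mag = 0 - mag;                 /* magnitude via unsigned negation */
    }
    n += write_binary(out + n, mag);
    return n;
}
```

With these helpers, an unbiased exponent of -23 selects the scientific format and produces the suffix `E-10111`, while exponents from -5 through 8 inclusive select the `[-]mmmm.dddd` format.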

The following table lists all the floating point possibilities ("maximal" refers to an exponent of all 1's):

| Exponent | Mantissa | Object Represented | What you must write into the buffer |
|---|---|---|---|
| 0 | 0 | zero | `[-]0` |
| 0 | nonzero | ± denormalized number | `[-]denorm` |
| nonzero, non-maximal | anything | ± normalized number | `[-]mantissa_in_binaryE[-]exponent_in_binary` or `[-]whole_part_in_binary.fractional_part_in_binary`, according to the rules above |
| maximal | 0 | ± infinity | `[-]infinity` |
| maximal | nonzero | NaN (Not a Number) | `[-]NaN` |
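The case analysis in the table can be expressed as a small classification routine. This is a sketch only: the `fp_kind` enum and the `classify` function are hypothetical names chosen for illustration, and the code relies on the input ranges given in the input specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the five cases in the table. */
typedef enum {
    KIND_ZERO, KIND_DENORM, KIND_NORMAL, KIND_INFINITY, KIND_NAN
} fp_kind;

static fp_kind classify(uint64_t input, int num_bits, int exp_bits) {
    assert(num_bits >= 3 && num_bits <= 64);
    assert(exp_bits >= 1 && exp_bits <= num_bits - 2);
    int man_bits = num_bits - exp_bits - 1;              /* 1..62 */
    uint64_t man_mask = (UINT64_C(1) << man_bits) - 1;
    uint64_t exp_max = (UINT64_C(1) << exp_bits) - 1;    /* all 1's */
    uint64_t mantissa = input & man_mask;
    uint64_t exponent = (input >> man_bits) & exp_max;

    if (exponent == 0)                                   /* rows 1 and 2 */
        return mantissa == 0 ? KIND_ZERO : KIND_DENORM;
    if (exponent == exp_max)                             /* rows 4 and 5 */
        return mantissa == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;                                  /* row 3 */
}
```

Note that with `exp_bits` equal to 1, the only exponent values are 0 and the maximal value 1, so `classify` never returns `KIND_NORMAL` for such a format.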

Sample cases (make sure you understand each of these before you start coding!):

| Value in `input`, `num_bits` (b), `exp_bits` (e) (`input` in hex or binary) | What gets printed to buffer |
|---|---|
| 0x fff0, b=16, e=5 | `-NaN` |
| 0x 6, b=4, e=2 | `infinity` |
| 0x 0000 0001 0000 0000, b=64, e=30 | `denorm` |
| 0b 11 0011, b=6, e=3 | `-11.1` |
| 0x 3455 4340, b=32, e=8 (standard float) | `1.10101010100001101E-10111` |
| 0b 111 0110 1010, b=11, e=4 | `-1101010.0` |
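One way to check your understanding of the normalized sample cases is to compute their numeric values and compare them with the printed binary strings. The sketch below is a debugging aid only, not part of the required solution: `normalized_value` is a hypothetical helper, it handles normalized encodings only, and it assumes a small format (a mantissa of at most 52 bits and a modest exponent) so that a `double` holds the result exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Numeric value of a normalized encoding, as a double (debugging aid).
   Assumes a small format; large exponents would loop for a long time. */
static double normalized_value(uint64_t input, int num_bits, int exp_bits) {
    assert(num_bits >= 3 && num_bits <= 64);
    assert(exp_bits >= 1 && exp_bits <= num_bits - 2);
    int man_bits = num_bits - exp_bits - 1;
    uint64_t mantissa = input & ((UINT64_C(1) << man_bits) - 1);
    uint64_t exp_max = (UINT64_C(1) << exp_bits) - 1;
    uint64_t exponent = (input >> man_bits) & exp_max;
    assert(exponent != 0 && exponent != exp_max);        /* normalized only */

    /* Remove the bias magnitude 2^(exp_bits-1) - 1. */
    int64_t unbiased = (int64_t)exponent - ((INT64_C(1) << (exp_bits - 1)) - 1);

    /* Significand is 1.mantissa; then scale by 2^unbiased. */
    double value = 1.0 + (double)mantissa / (double)(UINT64_C(1) << man_bits);
    for (int64_t i = 0; i < unbiased; i++) value *= 2.0;
    for (int64_t i = 0; i > unbiased; i--) value /= 2.0;

    return ((input >> (num_bits - 1)) & 1) ? -value : value;
}
```

For example, `normalized_value(0x33, 6, 3)` gives -3.5, which is `-11.1` in binary, and `normalized_value(0x76A, 11, 4)` gives -106.0, which is `-1101010.0` in binary.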

## Part 2: Float analysis

Now that we've familiarized ourselves with configurable float formats, we are ready to perform some basic analysis. In fp.c, complete the function:

`int can_represent (int num_bits, int exp_bits, int target)`

The function should return 1 if the float format specified by `num_bits` and `exp_bits` can represent the integer `target` exactly, and 0 otherwise.

Hint: Think about the binary representation of `target`

While we did not include test cases for this function in test.fp.c, it is strongly recommended that you add your own for debugging purposes.

### Input Specifications

`num_bits` and `exp_bits` will take on the same range of values as in part 1. `target` may be any signed integer.
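To illustrate the hint about the binary representation of `target`, here is a sketch of one possible line of reasoning. It is deliberately not a complete `can_represent`: the hypothetical helper `fits_normalized` considers only zero and normalized encodings, and it leaves open whether any denormalized encoding could also hold an integer (a question worth thinking through for very small exponent fields). The idea is that the distance between the highest and lowest set bits of the magnitude must fit in the mantissa, and the position of the highest set bit must be a legal unbiased exponent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1 if target is zero or has an exact normalized
   encoding in the given format, else 0. Denormalized encodings are
   intentionally ignored in this sketch. */
static int fits_normalized(int num_bits, int exp_bits, int target) {
    assert(num_bits >= 3 && num_bits <= 64);
    assert(exp_bits >= 1 && exp_bits <= num_bits - 2);
    if (target == 0) return 1;            /* all fields clear */

    /* Magnitude, computed in unsigned arithmetic so INT_MIN is safe. */
    unsigned int mag = target < 0 ? 0u - (unsigned int)target
                                  : (unsigned int)target;

    int low = 0;                          /* index of lowest set bit */
    while (((mag >> low) & 1u) == 0) low++;
    int high = 0;                         /* index of highest set bit */
    for (unsigned int m = mag; m > 1; m >>= 1) high++;

    int man_bits = num_bits - exp_bits - 1;
    int64_t bias = (INT64_C(1) << (exp_bits - 1)) - 1;
    int64_t min_exp = 1 - bias;                               /* field = 1 */
    int64_t max_exp = ((INT64_C(1) << exp_bits) - 2) - bias;  /* field = max-1 */

    /* Bits below the leading 1 must fit in the mantissa, and the leading
       1's position must be a representable unbiased exponent. */
    return (high - low) <= man_bits && high >= min_exp && high <= max_exp;
}
```

Under these assumptions, a standard 32-bit float (`num_bits` = 32, `exp_bits` = 8) accepts 16777216 (2^24) but rejects 16777217, because the latter needs 24 bits after the leading 1 and only 23 mantissa bits are available.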