"Frequently Asked Question" list for CS 61CL project 1

The relationship of labels, instructions, and comments

Q. Is it possible to have the following:

	loop: #this is a loop
	do something;

A. Yes. You may also have

	#this is a loop
	loop:
	do something;

and

	loop:
	#this is a loop
	do something;

Q. Must comments strictly follow instructions?

A. If a line contains both an instruction and a comment, the comment must follow the instruction. If a line contains both a label and an instruction, the instruction must follow the label. If a line contains both a label and a comment, the comment must follow the label. If a line contains all three, the comment must follow the instruction, which must follow the label.

Q. May I assume that an instruction always appears on the same line as or on the line immediately following a label?

A. No. Any number of comment or blank lines may separate a label from the instruction it labels.

Q. If I understand correctly, having more than one instruction per line separated with a semicolon, e.g.

	and destreg op1reg op2reg; or destreg op1reg op2reg;

is illegal? So we can assume that each line will contain at most one semicolon?

A. That's correct.

Q. Will every program start with the label "main:"?

A. No. The name "main" is totally arbitrary for the purposes of project 1. You might instead have

	count:
		instruction
	main:
		instruction

so that count would have the value 0 and main the value 2. The program might even start with an unlabeled instruction.

Labels and addresses

Q. How do we know which label is assigned to which address?

A. You have to keep track, as you read through the file, of which address you're currently working on. The address of the first instruction is 0, the address of the second instruction is 2, and so on. The address that a label represents is the address of the first instruction after the label.

Q. Why is counter marked line 14 (28 in memory) when it's on line 15?

A. That's because main is a label, not an instruction, so you have to start counting from the second line, which makes counter line 14. Only instructions increment the address.
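The bookkeeping described in the two answers above can be sketched in C roughly as follows. This is only an illustration: the line_kind classification and the function name are made up for this example and are not part of the project specification or the supplied code.

```c
#include <stddef.h>

/* Hypothetical classification of a source line (not from the spec). */
enum line_kind {
    LINE_BLANK_OR_COMMENT,
    LINE_LABEL_ONLY,
    LINE_INSTRUCTION,
    LINE_LABEL_AND_INSTRUCTION
};

/*
 * Walk the classified lines in order. For every line that defines a label,
 * store the byte address that label stands for in label_addr[i]; entries
 * for lines without a label are left untouched.
 */
void assign_label_addresses(const enum line_kind *kinds, size_t n,
                            unsigned short *label_addr)
{
    unsigned short addr = 0;    /* address of the next instruction */

    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == LINE_LABEL_ONLY ||
            kinds[i] == LINE_LABEL_AND_INSTRUCTION)
            label_addr[i] = addr;   /* label = address of next instruction */

        if (kinds[i] == LINE_INSTRUCTION ||
            kinds[i] == LINE_LABEL_AND_INSTRUCTION)
            addr += 2;              /* only instructions advance the address */
    }
}
```

Applied to the count/main example earlier in this list (label, instruction, label, instruction), this gives count the value 0 and main the value 2.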

Program organization

Q. Why does the assembler have to make two passes? Is there a way to simplify this into one pass?

A. It is helpful to make two passes. The first pass parses all the instructions, building a skeleton of each and finding all the label declarations and uses. The second pass goes through and resolves all the labels to their values. I think this makes it simpler. I think it could be done in one pass, but that is more complicated: trying to do two things at once will lead to harder-to-find bugs and harder-to-debug code.
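Here is a minimal sketch of the two-pass idea, mainly to show why a forward reference is not a problem once all definitions have been collected. The struct layouts, the function names, and the MAX_LABELS cap are assumptions made for this illustration, and the FFFF value for an undefined label is borrowed from the symbol table answer later in this list; your own design may look quite different.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 32           /* illustration-only cap, not from the spec */

/* Hypothetical skeleton of one parsed instruction. */
struct instr {
    const char *defines;        /* label attached to this instruction, or NULL */
    const char *uses;           /* label named as an argument, or NULL */
    unsigned short target;      /* filled in by pass 2 */
};

struct label {
    const char *name;
    unsigned short addr;
};

/* Pass 1: record the address of every label definition; returns the count. */
int collect_labels(const struct instr *prog, int n, struct label *table)
{
    int count = 0;

    for (int i = 0; i < n && count < MAX_LABELS; i++) {
        if (prog[i].defines != NULL) {
            table[count].name = prog[i].defines;
            table[count].addr = (unsigned short)(2 * i); /* 2 bytes each */
            count++;
        }
    }
    return count;
}

/* Pass 2: resolve each label use to its address (FFFF if never defined). */
void resolve_labels(struct instr *prog, int n,
                    const struct label *table, int count)
{
    for (int i = 0; i < n; i++) {
        if (prog[i].uses == NULL)
            continue;
        prog[i].target = 0xFFFF;
        for (int j = 0; j < count; j++) {
            if (strcmp(table[j].name, prog[i].uses) == 0) {
                prog[i].target = table[j].addr;
                break;
            }
        }
    }
}
```

Because pass 2 runs only after pass 1 has seen the whole file, an instruction at address 0 can refer to a label defined at address 4 without any special handling.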

Q. Do you think it's all right to use the supplied homework solutions in the project?

A. Yes. That's why we're supplying them: to give you a model of reasonable code, and to save you some time.

Q. Would you be allowed to use someone else's code from homework 3 as long as they are okay with it and you document it?

A. No. The code in your project submission should be either yours or ours. Using the code of your classmates or previous CS 61C students violates the "no-code" rule (see the "General information" document) and will get you into big trouble.

Instruction translation

Q. I don't understand how a jump to the address 743A (0111 0100 0011 1010 in binary) is translated to machine language as FA1D.

A. First, the byte address 743A is converted to a word address by shifting it right one bit position. (As in MIPS, this conversion takes advantage of the fact that any instruction is aligned to a word address, so if the rightmost bit were kept, it would always be 0.) 0111 0100 0011 1010 shifted right one bit is 011 1010 0001 1101. The last 12 bits of the assembled jump instruction are then the low 12 bits of that word address, 1010 0001 1101; the first four bits are the op code for the jump, namely F.
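The arithmetic in that answer can be written as a short C function. The function name is made up for this sketch; the op code F and the 12-bit field come from the answer above.

```c
/*
 * Assemble a jump to a 16-bit byte address: op code F in the top four bits,
 * followed by the low 12 bits of the word address.
 */
unsigned short encode_jump(unsigned short byte_addr)
{
    unsigned short word_addr = byte_addr >> 1;  /* drop the always-zero bit */

    return (unsigned short)(0xF000 | (word_addr & 0x0FFF));
}
```

Calling encode_jump(0x743A) yields 0xFA1D, matching the example in the question.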

Q. If an attempt is made to change the value of $0, should we ignore it or give out an error?

A. Ignore it. You don't need to do any error detection in this project.

Q. Will the C16 binary run only on the Sun machines?

A. The output of the assembler is two ASCII files, not "binary".

Q. The printf function prints hex in only lower-case letters, but the project requires upper case. Can we disregard that?

A. No, you may not disregard the requirement for upper-case hex digits. You may, however, use the format specification X instead of x to get your hex values printed in upper case. (See K&R, page 244.) Here's an example:

	unsigned short x = 0xABC1;
	printf ("%hX\n", x);

Q. Is CAL16 assembly language the same as MIPS?

A. There are numerous differences, including the following.

There are fewer instructions, and the word size is two bytes rather than four.
Some of the CAL16 instructions have different names than their MIPS counterparts.
In machine language, the branch argument is relative to the address of the branch instruction rather than the instruction immediately following.
The jr and llo and lhi instructions differ from their MIPS counterparts.
The .data instruction has a very different meaning from MIPS .data.

Q. Why is it that the operands in an instruction written in the CAL16 assembly language are listed in a different order in hex?

A. This will become clear when you actually design the CAL16 CPU later in the course.

Q. How do lhi and llo work?

A. There are two forms of the lhi and llo instructions. In one form, their second argument is a label. That label represents a two-byte address, so lhi is assembled into a "load immediate" machine instruction whose second operand is the first of those two bytes, and llo is assembled into a "load immediate" instruction whose second operand is the second of those two bytes. In the second form, the second argument is a two-byte number. That value is translated to hex, and then handled as if it were an address in the first form of those instructions.

Don't be too concerned right now about how these instructions work. Just translate them according to the specifications. Since they have the same opcode, the instructions can't differ in functionality. Both instructions load their operand byte into the lowest byte of the destination register. To assemble the full 2-byte value, it's possible to load the upper byte first, rotate the register containing it right by 8 bits to get that byte to the correct location, and then load the lower byte into the same register, as demonstrated in the example program.
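As a rough illustration of the byte splitting, here is a sketch in C. The function names are invented for this example, and rebuild assumes that the load replaces only the low byte of the register, which is one reading of the description above rather than something stated in the spec.

```c
/* Operand byte an lhi would carry: the first (upper) byte of the value. */
unsigned char high_byte(unsigned short value)
{
    return (unsigned char)(value >> 8);
}

/* Operand byte an llo would carry: the second (lower) byte of the value. */
unsigned char low_byte(unsigned short value)
{
    return (unsigned char)(value & 0xFF);
}

/*
 * Model of the load / rotate / load sequence described above.
 * ASSUMPTION: each load changes only the low byte of the register.
 */
unsigned short rebuild(unsigned char hi, unsigned char lo)
{
    unsigned short reg = hi;                          /* upper byte lands low */

    reg = (unsigned short)((reg >> 8) | (reg << 8));  /* rotate by 8 bits */
    reg = (unsigned short)((reg & 0xFF00) | lo);      /* load the lower byte */
    return reg;
}
```

For the address 743A, high_byte gives 74 and low_byte gives 3A, and rebuild(0x74, 0x3A) puts them back together as 743A.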

Q. What does .data do?

A. The .data instruction differs from the MIPS .data instruction. It doesn't say anything about what segment the code should be put in; the CAL16 architecture has only one segment. The .data instruction merely gives you a way to put an arbitrary two-byte quantity into the next word. It says nothing about how that word will be treated when the program is run. Consider, for example, a program that starts with an add $5 $0 $0, and follows it immediately with the instruction .data 5297. The add would be the first instruction to be executed, and then the instruction whose decimal value is 5297 would be executed. (The latter turns out to be an or instruction.)

Q. The spec says, "You may not assume any bound on the length of the program", but the C16 architecture uses only 16-bit addresses. If assembling a program goes past the end of the address space by producing more than 64KB of machine code, what should the value of subsequent labels be?

A. We stand corrected. You may assume that the assembled program is at most 64KB long.

Q. Are we not expected to handle jumps of more than 128?

A. The bneg and bz instructions are limited to jumping a distance of -128 to +127 words (-256 to +254 bytes). However, there is also a jmp instruction that has a larger range: its target can be any address with the same top 3 bits as the current PC (e.g., from address 0x7654 a jmp could go anywhere from 0x6000 to 0x7FFE). And you can jump anywhere in the address space by using a jr instruction.
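The two range rules can be checked with a few lines of C. This is a sketch under assumptions: the function names are invented, the branch offset is measured from the address of the branch instruction itself (as noted in the list of differences from MIPS above), and "current PC" for jmp is taken to be the address passed in as pc.

```c
/*
 * If target is within -128..+127 words of the branch instruction, store the
 * word offset in *offset_words and return 1; otherwise return 0.
 * Both addresses are byte addresses and are assumed to be even.
 */
int branch_offset(unsigned short branch_addr, unsigned short target,
                  int *offset_words)
{
    int diff = ((int)target - (int)branch_addr) / 2;  /* bytes to words */

    if (diff < -128 || diff > 127)
        return 0;
    *offset_words = diff;
    return 1;
}

/* A jmp reaches any target that shares the top 3 bits of the PC. */
int jmp_reaches(unsigned short pc, unsigned short target)
{
    return (pc & 0xE000) == (target & 0xE000);
}
```

With pc equal to 0x7654, jmp_reaches accepts 0x6000 and 0x7FFE but rejects 0x8000, in line with the example in the answer.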

The symbol table

Q. From the spec: "A reference to an undefined label should not be considered as an error in this context." What should we do with the undefined label, since we need to print out the relative address for any reference to the defined labels? Are we supposed to leave the line blank?

A. An undefined label should be treated the same as a defined label, except that its corresponding address is FFFF.

Q. Why do the bz and bneg instructions not contribute entries?

A. The symbol table is used by the linker to arrange multiple modules in memory. The linking process involves actually replacing the contents of some instructions; we'll explain how in a couple of weeks. Since the argument for a branch is a relative address, as opposed to a jump's argument, which is an absolute address, branches don't need to get fixed up during linking.

Q. The project description says, "The arguments to the bneg and bz instructions do not contribute entries of any kind to the .syms file." But what if I have something like "bz $4 done", and this is the only place where the symbol "done" is used. Does it contribute the entry "done n 1234" instead of "done n 1234 bz 0050", or does it contribute no entry at all?

A. If the symbol "done" is never defined anywhere in the program, the bz that uses it will not contribute any symbol table entries. (Actually that's a situation that you don't have to worry about.) If "done" is defined somewhere, the definition will appear in the table, but the use by the bz will not.

Q. The specification says (in the "Input and output format" section) that "any reference to an undefined symbol in ld, st, ... should be assembled into a sequence of 1's of the appropriate length." But none of the arguments to ld and st are symbols.

A. Correct. I fixed the problem specification.

Q. What's a good implementation for the symbol table?

A. You don't need to do anything fancy. A simple expandable array or linked list is fine.
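For instance, an expandable array might look something like the sketch below. All of the names here are hypothetical, and the lookup returns FFFF for an undefined label only to match the convention given earlier in this section; treat it as one possible shape, not as the required design.

```c
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;                 /* private copy of the label text */
    unsigned short addr;
};

/* Start with all fields zero: struct symtab t = { NULL, 0, 0 }; */
struct symtab {
    struct symbol *items;
    size_t count;
    size_t capacity;
};

/* Append a definition, doubling the array when it is full.
   Returns 0 on success, -1 if memory runs out. */
int symtab_add(struct symtab *t, const char *name, unsigned short addr)
{
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 8;
        struct symbol *grown = realloc(t->items, new_cap * sizeof *grown);

        if (grown == NULL)
            return -1;
        t->items = grown;
        t->capacity = new_cap;
    }

    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);

    t->items[t->count].name = copy;
    t->items[t->count].addr = addr;
    t->count++;
    return 0;
}

/* Linear search; an undefined label yields FFFF. */
unsigned short symtab_lookup(const struct symtab *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->items[i].name, name) == 0)
            return t->items[i].addr;
    return 0xFFFF;
}

/* Release every name and the array itself. */
void symtab_free(struct symtab *t)
{
    for (size_t i = 0; i < t->count; i++)
        free(t->items[i].name);
    free(t->items);
    t->items = NULL;
    t->count = t->capacity = 0;
}
```

A linear search is slow for huge tables, but for programs of the size this project deals with it keeps the code short and easy to debug.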

Miscellaneous

Q. Will the project also be submitted on Expertiza, or will it be submitted normally with "submit proj1" on the instructional computers? Also, will there be an autograder for this project that we can use to test?

A. You will use the "submit" command on the instructional computers to submit your project 1. A test script will appear shortly.

Q. Should we write in C or assembly language?

A. C.

Q. What will this project teach us?

A. A C program gets compiled to assembly language, then assembled. The assembled version gets linked with other modules to produce an executable program. This project is an in-depth introduction to the assembly stage. Later in the course, you will write some of a CAL16 linker and encounter a CAL16 interpreter, as well as writing a MIPS interpreter for project 3.