                           Automated Testing
                            P. N. Hilfinger
                              06 June 2009


THE BARE BONES
--- ---- -----

In one sense, arranging for automated testing of student submissions for an
assignment named ASSGN is easy: create a makefile (for GNU make) called
DIR/testing/mk/ASSGN.mk, whose default target performs all necessary building
and testing of assignment ASSGN.  Here, DIR is the name of the grading
directory, typically $HOME/grading, that contains all the other
grading-system-related files.  The commands in this makefile may assume that
the current directory contains the files submitted by the student.  The
makefile's commands should exit with status 0 if the student's submission
checked out completely, and otherwise should exit with a non-zero status.

The file ASSGN.mk may also contain targets OK and NOT-OK.  When invoked with
target OK, the makefile should simply print (on stdout) the message to prefix
mail to the student about a successful test run.  Likewise, the NOT-OK target
should print the message to prefix mail about a not-entirely-successful test
run.  End of story.

Given this setup, the command run-tests ASSGN will test all as-yet-untested
submissions for assignment ASSGN and produce log files in
DIR/logs/ASSGN/ok/SUBM or DIR/logs/ASSGN/failed/SUBM for each submission
DIR/submissions/ASSGN/SUBM.  The command mail-results ASSGN will then mail
all as-yet-unmailed results to the indicated students, recording the mailing
of results for submission SUBM of assignment ASSGN by creating a dummy file
DIR/logs/ASSGN/mailed/SUBM.  Simply by deleting DIR/logs/ASSGN/mailed/SUBM,
you can cause the results to be re-mailed when you again issue mail-results.
By deleting the 'ok' or 'failed' log file, you can get run-tests to re-run
the tests.

You may also use get-subm to retrieve and unpack a specific submission into
the current directory, and test-subm to test the unpacked submission in the
current directory (without the creation of log files).  These are good for
checking your testing programs before using run-tests.
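Putting the section together, a minimal first pass over a hypothetical
assignment named hw1 might look like the sketch below.  The assignment name,
the scratch directory, and the argument given to test-subm are assumptions
for illustration; check each command's usage for the exact forms.

    # In a scratch directory holding one unpacked submission (fetched with
    # get-subm), try the tests without writing any log files:
    test-subm hw1

    # Then test and mail everything for real:
    run-tests hw1       # logs land in $HOME/grading/logs/hw1/ok and .../failed
    mail-results hw1    # mails each unmailed log; records it in .../hw1/mailed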
GUIDANCE
--------

Of course, although this is, in principle, the whole story, it leaves a great
deal of innovation to the instructor.  Just as an example, here's what I did
for one assignment, hw1.  You'll find these files in
testing-sample/testing-files.

First, an organizational note: in order that students could see testing
material after the due date, I placed the makefile for ASSGN in
~/testing-files/ASSGN.mk, and placed testing files for ASSGN in directory
~/testing-files/ASSGN.  There were symbolic links to these from
DIR/testing/mk/ASSGN.mk.  I changed the protections after each due date.

In ~/testing-files, I created a common grading makefile, grading.mk, to be
included by all the others.  This allowed me to keep the .mk files for each
assignment simple.  hw1.mk, for example, looks like this:

    ASSIGN = hw1

    all: 1.tests Progs.note-indentation

    include $(MASTERDIR)/grading/testing/mk/grading.mk

    export CLASSPATH := $(CLASSPATH):$(TEST_SRC)/classes

    Progs.class: Progs.java

    1.tests: 1a.test 1b.test 1c.test 1d.test 1e.test 1f.test 1g.test

    1a.test 1b.test 1c.test 1d.test 1e.test 1f.test 1g.test: Progs.class

For this particular assignment, students were to turn in just a file
Progs.java containing their solutions.

The grading.mk file does several things:

  * Defines TEST_SRC to be $(MASTERDIR)/testing-files/$(ASSIGN), so that I
    can use it (as here) to inform testing software where it may find
    necessary supporting files for this assignment.

  * Gives an implicit rule for compiling .java files into .class files, so
    that when I make a test dependent on Progs.class, Progs.java gets
    compiled.

  * Defines an implicit rule for targets of the form X.test, so that building
    X.test runs the command $(TEST_SRC)/X.cmd (e.g.,
    $(MASTERDIR)/testing-files/hw1/1a.cmd), which must be an executable file
    or shell script.  This implicit rule also sets up a few environment
    variables that can be used by X.cmd:

       * TEST_SRC: as above
       * TEST: the name of this test (e.g., 1a)

    If the X.cmd program succeeds (exit code 0), and if another program
    $(TEST_SRC)/X.chk is defined, grading.mk's commands run this program,
    passing it the environment variable

       * OUTPUT: the name of the output file produced by X.cmd

    in addition to the two above.  The exit code from X.chk indicates whether
    the output checks out (0 means OK).

  * Defines targets OK and NOT-OK to print the files SUCCEED_MESSAGE and
    FAIL_MESSAGE, respectively.  If these files are not defined in the
    TEST_SRC directory, the software substitutes generic messages.

For example, problem 1a of hw1 required the student to write a static Java
function Progs.factorSum, such that Progs.factorSum(N) gives the sum of all
integers in [1..N) that evenly divide N.  I wrote a test driver,
ProgDriver.java, such that

    java ProgDriver a N0 N1 ...

would print out

    factorSum(N0) = S0
    factorSum(N1) = S1

etc., where the Si are the results returned by Progs.factorSum(Ni).  I
compiled ProgDriver.java into

    $(MASTERDIR)/testing-files/hw1/classes/ProgDriver.class

The file $(MASTERDIR)/testing-files/hw1/Makefile contains the commands
necessary to perform this compilation (it also creates the files
classes/{Paragraph.class,Sentence.class}, which were provided with the
homework assignment to be used by the student).

The file 1a.cmd is a shell script containing

    #!/bin/sh
    CLASSPATH=${MASTERDIR}/testing-files/hw1/classes:${CLASSPATH}
    export CLASSPATH
    ulimit -f 10
    ulimit -t 10
    ulimit -d 10000
    java ProgDriver a 0 1 2 3 4 5 6 7 12 20 28 42 2>&1

and the file 1a.chk is a shell script containing

    #!/bin/sh
    if cmp $TEST_SRC/$TEST.std $OUTPUT > /dev/null 2>&1; then
        exit 0
    else
        echo "Your results don't match ours.  We were expecting:"
        echo "=================================================="
        cat $TEST_SRC/$TEST.std
        echo "=================================================="
        echo "But instead, we saw:"
        echo "=================================================="
        cat $OUTPUT
        echo "=================================================="
        exit 1
    fi

which compares the student's output ($OUTPUT) against a file of expected
results, .../hw1/1a.std, containing

    factorSum(0) = 0
    factorSum(1) = 0
    factorSum(2) = 1
    factorSum(3) = 1
    factorSum(4) = 3
    factorSum(5) = 1
    factorSum(6) = 6
    factorSum(7) = 1
    factorSum(12) = 16
    factorSum(20) = 22
    factorSum(28) = 28
    factorSum(42) = 54

The .chk file is written to be useful for any test where checking just
requires comparison with a standard file.


CUSTOMIZATION
-------------

If you would prefer to roll your own testing script rather than the default
use of make, you can do so through use of the TEST_COMMAND and
TEST_COMMAND_SUMMARY parameters in the 'params' file.  By default,
TEST_COMMAND is set to

    "$GMAKE -k -f $TEST_MAKEFILE_DIR/$assgn.mk"

To test a submission of assignment ASSGN, the software executes this string
as a Unix command in a directory containing a copy of the submission, after
substituting ASSGN for $assgn (and likewise for parameters such as $GMAKE
(default gmake) and $TEST_MAKEFILE_DIR (default $GRADING_DIR/testing/mk)).
By modifying this parameter, you can therefore substitute any command you
want, passing it the assignment name.
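As a sketch of that customization, you might point TEST_COMMAND at a driver
script of your own, giving it a value along the lines of
"$GRADING_DIR/bin/my-tester.sh $assgn" (the script name and location are
invented here; copy whatever form the existing TEST_COMMAND entry in params
uses).  The script itself then only has to run in the copied submission
directory, take the assignment name as its argument, and exit 0 exactly when
everything passes.  For instance:

    #!/bin/sh
    # my-tester.sh (hypothetical): invoked by the grading software in a
    # scratch copy of one submission, with the assignment name as its only
    # argument.
    assgn=$1

    # Compile whatever the student turned in; a compile failure fails the run.
    javac *.java 2> compile.errs || {
        echo "Compilation failed:"
        cat compile.errs
        exit 1
    }

    # Hand off to a per-assignment test driver (also invented); its exit
    # status becomes this script's status, and hence the test result.
    sh "$HOME/testing-files/$assgn/run-all-tests.sh"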
The parameter TEST_COMMAND_SUMMARY controls whether to add a summary message
to each submission-grading log, or instead to leave any such message to
TEST_COMMAND itself.  If its value is 'default', the software will add a
trailing "<< All tests passed" or "PROBLEM:..." message, as appropriate.
Otherwise, it will add nothing.


GRADING IN PARALLEL AND REGRADING
------- -- -------- --- ---------

Especially with large classes, in which a significant fraction of the class
can be expected to submit at the last minute, you will probably want to test
as many projects simultaneously as will do any good (say, as many projects as
there are CPUs on your system).  The run-tests program does the locking
necessary to avoid interference when several instantiations of it run
simultaneously.

When machines (or programs) crash, however, they may leave things in a locked
state, interfering with later runs.  First, you will get warnings when lock
files are left behind; they eventually expire, at which point you can use the
-f option to run-tests to force the locks.  Second, run-tests will skip
submissions for which there are existing log files DIR/logs/ASSGN/ok/SUBM or
DIR/logs/ASSGN/failed/SUBM (see above).  Simply deleting these files will
cause run-tests to test the corresponding submission again.
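Concretely, a last-minute grading session might look something like the
following sketch.  The assignment name hw1 and submission name aa.1 are made
up, DIR is taken to be $HOME/grading, and the placement of the -f option
before the assignment name is an assumption.

    # One tester per spare CPU; run-tests' own locking keeps them from
    # interfering with one another.
    run-tests hw1 &
    run-tests hw1 &
    run-tests hw1 &
    wait

    # After a crash has left stale lock files (once they have expired):
    run-tests -f hw1

    # Regrade and re-mail a single submission (names invented):
    rm $HOME/grading/logs/hw1/failed/aa.1 $HOME/grading/logs/hw1/mailed/aa.1
    run-tests hw1
    mail-results hw1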