Serialization Details

Project 3: Gitlet, your own version-control system

If you think about Gitlet, you'll notice that each run of the program executes exactly one command. To complete your version-control system, you'll need to remember the commit tree across commands. This means you'll have to design not only a set of classes to represent internal Gitlet structures during execution, but also an analogous representation as files within your .gitlet directories, which will carry across multiple runs of your program.

As indicated earlier, the convenient way to do this is to serialize the runtime objects that you will need to store permanently in files. In Java, this simply involves implementing the java.io.Serializable interface:

import java.io.Serializable;

class MyObject implements Serializable {
    ...
}

This interface has no methods; it simply marks its subtypes for the benefit of some special Java classes for performing I/O on objects. For example,

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
...
    MyObject obj = ....;
    File outFile = new File(someFileName);
    // try-with-resources closes the stream even if writeObject throws.
    try (ObjectOutputStream out =
             new ObjectOutputStream(new FileOutputStream(outFile))) {
        out.writeObject(obj);
    } catch (IOException excp) {
        ...
    }

will convert obj to a stream of bytes and store it in the file whose name is stored in someFileName. The object may then be reconstructed with a code sequence such as

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
...
    MyObject obj;
    File inFile = new File(someFileName);
    // try-with-resources closes the stream even if readObject throws.
    try (ObjectInputStream inp =
             new ObjectInputStream(new FileInputStream(inFile))) {
        obj = (MyObject) inp.readObject();
    } catch (IOException | ClassNotFoundException excp) {
        ...
        obj = null;
    }

The Java runtime does all the work of figuring out what fields need to be converted to bytes and how to do so.
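
As a minimal, self-contained sketch of that point, the following round trip goes through an in-memory byte array instead of a file. The names RoundTripDemo, Note, toBytes, and fromBytes are hypothetical and are not part of the Gitlet skeleton. Note that the class contains no per-field conversion code, even though one field is a nested list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo; none of these names come from the Gitlet skeleton. */
class RoundTripDemo {
    /** A made-up object with a String, a primitive, and a nested list. */
    static class Note implements Serializable {
        String message;
        int count;
        List<String> tags = new ArrayList<>();

        Note(String message, int count) {
            this.message = message;
            this.count = count;
        }
    }

    /** Serialize OBJ into an in-memory byte array. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException excp) {
            throw new UncheckedIOException(excp);
        }
        // The object stream is closed (and so flushed) before we read the bytes.
        return bytes.toByteArray();
    }

    /** Rebuild an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException excp) {
            throw new UncheckedIOException(excp);
        } catch (ClassNotFoundException excp) {
            throw new IllegalStateException(excp);
        }
    }

    public static void main(String[] args) {
        Note original = new Note("initial commit", 3);
        original.tags.add("wug.txt");

        // The copy is a distinct object carrying the same field values.
        Note copy = (Note) fromBytes(toBytes(original));
        System.out.println(copy != original);
        System.out.println(copy.message + " " + copy.count + " " + copy.tags);
    }
}
```

The same two helpers would work with file streams in place of the byte-array streams.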

There is, however, one annoying subtlety to watch out for: Java serialization follows pointers. That is, not only is the object you pass into writeObject serialized and written, but any object it points to as well. If your internal representation of commits, for example, represents the parent commits as pointers to other commit objects, then writing the head of a branch will write all the commits (and blobs) in the entire subgraph of commits into one file, which is generally not what you want. To avoid this, don't use Java pointers to refer to commits and blobs in your runtime objects, but instead use SHA-1 hash strings. Maintain a runtime map between these strings and the runtime objects they refer to. You create and fill in this map while Gitlet is running, but never read or write it to a file.
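
The difference can be seen in a small sketch that compares the two designs. Everything here is an assumption made for illustration: the class names PointerCommit and HashCommit, the helper methods, and the placeholder ID strings of the form "id-0048", which merely stand in for real SHA-1 hashes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo; class and field names are illustrative only. */
class HashLinkDemo {
    /** Refers to its parent by pointer, so writing it writes every ancestor. */
    static class PointerCommit implements Serializable {
        String message;
        PointerCommit parent;

        PointerCommit(String message, PointerCommit parent) {
            this.message = message;
            this.parent = parent;
        }
    }

    /** Refers to its parent by ID string, so writing it writes only itself. */
    static class HashCommit implements Serializable {
        String message;
        String parentId;  // stand-in for a SHA-1 string; null for the root

        HashCommit(String message, String parentId) {
            this.message = message;
            this.parentId = parentId;
        }
    }

    /** Number of bytes that writeObject produces for OBJ. */
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException excp) {
            throw new UncheckedIOException(excp);
        }
        return bytes.size();
    }

    /** Head of a chain of N pointer-linked commits. */
    static PointerCommit pointerChain(int n) {
        PointerCommit head = null;
        for (int i = 0; i < n; i += 1) {
            head = new PointerCommit(String.format("commit %04d", i), head);
        }
        return head;
    }

    /** Head of a chain of N ID-linked commits, each recorded in BYID. */
    static HashCommit hashChain(int n, Map<String, HashCommit> byId) {
        HashCommit head = null;
        String headId = null;
        for (int i = 0; i < n; i += 1) {
            head = new HashCommit(String.format("commit %04d", i), headId);
            headId = String.format("id-%04d", i);  // placeholder, not a real hash
            byId.put(headId, head);
        }
        return head;
    }

    public static void main(String[] args) {
        Map<String, HashCommit> byId = new HashMap<>();  // runtime only, never saved
        HashCommit head = hashChain(50, byId);
        System.out.println("pointer-linked head: "
                           + serializedSize(pointerChain(50)) + " bytes");
        System.out.println("ID-linked head: " + serializedSize(head) + " bytes");
        // Following a parent link becomes a map lookup.
        System.out.println(byId.get(head.parentId).message);
    }
}
```

In this sketch the pointer-linked head grows with the length of its history, while the ID-linked head stays the same size however many ancestors it has.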

You might find it convenient to have (redundant) pointers to commits as well as SHA-1 strings, to avoid the bother and execution time required to look them up each time. You can store such pointers in your objects while still keeping them from being written out by declaring them "transient", as in

    private transient MyCommitType parent1;

Such fields will not be serialized, and when the object is read back in and deserialized, they will be set to their default values (null for reference types). When reading objects that contain transient fields back in, you must be careful to set those fields to appropriate values.
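
One possible way to handle this is sketched below, assuming a runtime map from ID strings to commit objects. The names TransientDemo, Commit, roundTrip, and relink are hypothetical, and the IDs are placeholders rather than real SHA-1 strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo; class, field, and method names are illustrative only. */
class TransientDemo {
    static class Commit implements Serializable {
        String id;                // placeholder for a SHA-1 string
        String parentId;          // written to the file
        transient Commit parent;  // redundant shortcut; never written

        Commit(String id, Commit parent) {
            this.id = id;
            this.parent = parent;
            this.parentId = (parent == null) ? null : parent.id;
        }
    }

    /** Serialize C to bytes and read it straight back. */
    static Commit roundTrip(Commit c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Commit) in.readObject();
            }
        } catch (IOException | ClassNotFoundException excp) {
            throw new IllegalStateException(excp);
        }
    }

    /** Restore C's transient pointer from its stored parent ID. */
    static void relink(Commit c, Map<String, Commit> byId) {
        c.parent = (c.parentId == null) ? null : byId.get(c.parentId);
    }

    public static void main(String[] args) {
        Commit root = new Commit("id-root", null);
        Commit child = new Commit("id-child", root);
        Map<String, Commit> byId = new HashMap<>();
        byId.put(root.id, root);

        Commit loaded = roundTrip(child);
        // The transient field comes back as null; the ID string survives.
        System.out.println(loaded.parent == null);
        relink(loaded, byId);
        System.out.println(loaded.parent == root);
    }
}
```

Doing the re-linking in one helper that every load path calls makes it harder to forget to restore a transient field.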

Unfortunately, looking at the serialized files your program has produced with a text editor (for debugging purposes) would be rather unrevealing; the contents are encoded in Java's private serialization encoding. We have therefore provided a simple debugging utility program you might find useful: gitlet.DumpObj. See the Javadoc comment on gitlet/DumpObj.java for details.
