**********************************************************************
DATA
d1	eat them here or there 
d2	eat them anywhere
d3	like them anywhere or here or there
**********************************************************************

**********************************************************************
Initial grammar:

r1	S -> eat them here or there 			d1
r2	S -> eat them anywhere				d2
r3	S -> like them anywhere or here or there	d3
**********************************************************************

**********************************************************************

ASSUME ALPHA is 5

MERGE 1

Current Grammar:
r1	S -> eat them here or there			d1
r2	S -> eat them anywhere				d2
r3	S -> like them anywhere	or here or there	d3

Candidate Merges:
m1	(r1,r2,prefix,1)
m2	(r1,r2,prefix,2)
m3	(r1,r3,suffix,1)
m4	(r1,r3,suffix,2)
m5	(r1,r3,suffix,3)

	[Note that candidate merges are briefly described in terms of
	the rules involved, the type of merge and a length of the common affix.]

Merge m1:
	* remove r1, r2: size -8, deriv -2
	* add r4, r5, r6: size +8, deriv +4
	r4	S -> eat X1				d1, d2
	r5	X1 -> them here or there		d1
	r6	X1 -> them anywhere			d2
	* cost: size -0, deriv +2; overall +2

Merge m2:
	* remove r1, r2: size -8, deriv -2
	* add r4, r5, r6: size +7, deriv +4
	r4	S -> eat them X1			d1, d2
	r5	X1 -> here or there			d1
	r6	X1 -> anywhere				d2
	* cost: size -1, deriv +2; overall -3

Merge m3: 
	* remove r1, r3: size -12, deriv -2
	* add r4, r5, r6: size +13, deriv +4
	r4	S -> eat them here or X2		d1
	r5	S -> like them anywhere or here or X2	d3
	r6	X2 -> there				d1, d3
	* cost: size +1, deriv +2; overall +7

Merge m4: (similar to m2 but better)
	* remove r1, r3: size -12, deriv -2
	* add r4, r5, r6: size +12, deriv +4
	r4	S -> eat them here X2			d1
	r5	S -> like them anywhere or here X2	d3
	r6	X2 -> or there				d1, d3
	* cost: size +0, deriv +2; overall +2

Merge m5: (similar to m2 and m3 but better)
	* remove r1, r3: size -12, deriv -2
	* add r4, r5, r6: size +11, deriv +4
	r4	S -> eat them X2			d1
	r5	S -> like them anywhere or X2		d3
	r6	X2 -> here or there			d1, d3
	* cost: size -1, deriv +2; overall -3

Merge m2 and Merge m5 are the best (least increase to overall cost).
We choose m2 for no particular reason.

**********************************************************************


**********************************************************************
MERGE 2

Current Grammar:
r1	S -> like them anywhere	or here or there	d3
r2	S -> eat them X1				d1, d2
r3	X1 -> here or there				d1
r4	X1 -> anywhere					d2

Candidate Merges:
m1	(r1,r3,suffix,1)
m2	(r1,r3,suffix,2)
m3	(r1,r3,suffix-whole,3)

Merge m1:
	* remove r1, r3: size -10, deriv -2
	* add r5, r6, r7: size +11, deriv +4
	r5	S -> like them anywhere or here or X2	d3
	r6	X1 -> here or X2			d1
	r7	X2 -> there				d1, d3
	* cost: size +1, deriv +2; overall +7

Merge m2: (similar to m1 but better)
	* remove r1, r3: size -10, deriv -2
	* add r5, r6, r7: size +10, deriv +4
	r5	S -> like them anywhere or here X2	d3
	r6	X1 -> here X2				d1
	r7	X2 -> or there				d1, d3
	* cost: size +0, deriv +2; overall +2

Merge m3: (similar to m1 and m2 but better)
	* remove r1, r3: size -10, deriv -2
	* add r5, r6, r7:size +9, deriv +4
	r5	S -> like them anywhere or X2		d3
	r6	X1 -> X2				d1
	r7 	X2 -> here or there			d1, d3
	* cost: size -1, deriv +2; overall -3

Merge m3 is the best.

Note: After performing m3, we have opened up the chance to 
further reduce the size of the grammar by merging nonterminal
X1 and X2 together.  In fact, every time we see a rule of the
form, X -> Y (where X and Y can be any nonterminals), we can
merge the nonterminals and still be able to derive the data.

The basic algorithm is simple.  Whenever the model merging 
algorithm sees a rule of the X -> Y form, just get rid of the
rule, and replace every instance of Y with X.

So the grammar we have before merging the two nonterminals was:

	r2	S -> eat them X1				d1, d2
	r4	X1 -> anywhere					d2
	r5	S -> like them anywhere or X2			d3
	r6	X1 -> X2					d1
	r7 	X2 -> here or there				d1, d3

The size of the grammar is 13 and the total derivation steps is 7

Now after merging X1 and X2 together, we get the following grammar
which can still derive all the data:

	r2	S -> eat them X1				d1, d2
	r4	X1 -> anywhere					d2
	r8	S -> like them anywhere or X1			d3
	r9 	X1 -> here or there				d1, d3

The size of the grammar is now 12 and the total derivation steps is 6.

But what is really cool, is that now, the grammar thinks that there is
something similar about the phrases "anywhere" and "here or there". In
fact, those two phrases are treated as the same type of thing because
they have the same lhs.  So now we can derive all the old data, but new
data as well: "eat them here or there" is a phrase that was not in the
data, but now can be generated.