The Elements of Logic Design Style

March 2001
Filed at: http://www.cs.wisc.edu/~markhill/kong
Appendix A: http://www.cs.wisc.edu/~markhill/kong/appendixA.html
Appendix B: http://www.cs.wisc.edu/~markhill/kong/appendixB.html
Appendix C: http://www.cs.wisc.edu/~markhill/kong/appendixC.html
$Source: /proj/gemini/cvs_root/P2002/Notes/Style/main_text,v $
$Date: 2001/12/06 21:49:07 $
$Revision: 1.1 $
$Id: main_text,v 1.1 2001/12/06 21:49:07 kong Exp $
A copy of this file is kept at: /home/kong/P2002/Notes/Style/main_text


1. Introduction
-----------------------------------------------------------------------------
The goal of this document is to summarize some ideas I find useful in logic
design and Verilog coding (Note 1).  Logic design is not the same as Verilog
coding.  One common mistake of inexperienced logic designers is to treat
logic design as a Verilog programming task.  This often results in Verilog
code that is hard to understand, hard to implement, and hard to debug.

Logic design is a process:
  1. Understand the problem.
  2. If necessary, divide the problem into multiple modules with
     clean and well defined interfaces.
  3. For each module:
      a. Design the datapath that can process the data for that module.
      b. Design the controller to control the datapath and produce
         control outputs (if any) to other adjacent modules.

Verilog coding, on the other hand, is a modeling task.  More specifically,
after one has done some preliminary designs on the datapaths and controllers,
Verilog code is then used to:
  1. Model the datapaths and the controllers.
  2. Connect the datapath and controller together to form modules.
  3. Connect the modules together to form the final design.

Note 1: Verilog is used as an example in this document.  The ideas
    discussed in this document, however, should also be applicable to other
    Hardware Description Languages (such as VHDL) with minor adjustments.

The rest of this document is organized as follows:

    Section 2	discusses the most important rule of logic design: keep
		it easy to understand.  This section also introduces some
		basic Verilog coding guidelines.

    Section 3   discusses the art of dividing a design into high-level
 		modules and then how these modules can be divided into
		datapaths and controllers.

    Section 4	discusses the logic design and Verilog coding guidelines
		for the datapath.

    Section 5	discusses the logic design and Verilog coding guidelines
		for the controller.

    Section 6	discusses some miscellaneous Verilog coding guidelines.

    Section 7	is a summary of all the logic design and Verilog coding
		guidelines introduced in this document.  This summary serves
		as a quick reference for readers who either: (a) may not have
		the time to read this entire document, or (b) have already
		read this document once but want a quick reminder later on.

Throughout this document, I have listed many Verilog files from my home
directory (/home/kong/P2001/... ) as examples.  They are listed here as
references only. Readers do not need to read them to understand the key
points of this document because I have already included the Verilog code
I want to use as examples throughout this document.  Furthermore, the
example Verilog files that model a module, a datapath, and a controller are
included in Appendix A, Appendix B, and Appendix C for those readers who
are interested in looking at the structure of a complete Verilog file.

2. The Most Important Rule of Logic Design & Basic Verilog Coding Guidelines
-----------------------------------------------------------------------------
The most important logic design rule is more a philosophy than a rule :-)

    *** Logic Design Guideline 2-1 (MOST IMPORTANT) ***
    The design MUST be as simple as possible and easy to understand!

If a design is hard to understand, then nobody will be able to help the
original designer with his or her work.  Also, as time passes, a design that
is hard to understand will become impossible to maintain and debug even for
the original designer.  Therefore, a logic designer must keep his or her
design simple and easy to understand even if that means the design is slightly
bigger or slightly slower as long as the design is still small enough and
fast enough to meet the specification.

One important step in keeping a design simple and the Verilog code that
models the design easy to understand is to use standard logic elements such
as registers, multiplexers, decoders, etc.  Consequently, the first step
in any Verilog coding project is:

    *** Verilog Coding Guideline 2-1 ***
    Model all the standard logic elements in a library file to be SHARED
    by ALL engineers in the design team.

For an example of such a library, see:
    /home/kong/P2001/Verilog/CommonFiles/sata_library.v

    *** For those readers who do not have access to my home directory, ***
    *** don't worry.  I will include the important Verilog code I want ***
    *** to use as examples throughout this document.  Furthermore, in  ***
    *** Appendix A, Appendix B, and Appendix C are examples of Verilog ***
    *** files that model a module, a datapath and a controller.        ***

Below are some examples of the basic logic elements defined in sata_library.v:

	/***************************************************************
	 * Simple N-bit register with a 1 time-unit clock-to-q time
	 ***************************************************************/
	module v_reg( q, c, d );
	    parameter   n = 1;

	    output  [n-1:0] q;
	    input   [n-1:0] d;
	    input           c;

	    reg     [n-1:0] state;

	    assign  #(1) q = state;

	    always @(posedge c) begin
		state  = d;
	    end

	endmodule // v_reg

	/***************************************************************
	 * Simple N-bit latch with a 1 time-unit clock-to-q time
	 ***************************************************************/
	module v_latch ( q, c, d );
	    parameter   n = 1;

	    output  [n-1:0] q;
	    input   [n-1:0] d;
	    input           c;

	    reg     [n-1:0] state;

	    assign  #(1) q = state;

	    always @(c or d) begin
		if (c) begin
		    state  = d;
		end
	    end

	endmodule // v_latch

	/***************************************************************
	 * Simple N-bit 2-to-1 Multiplexer
	 ***************************************************************/
	module v_mux2e( z, s, a, b );
	    parameter n = 1;

	    output    [n-1:0] z;
	    input     [n-1:0] a, b;
	    input             s;

	    assign  z =  s ? b : a ;	// s=1, z<-b; s=0, z<-a

	endmodule // v_mux2e

One key observation from the logic elements defined in sata_library.v is:

    *** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero
    clock-to-q time.  All combinational logic (example: mux) has zero delay.

The non-zero clock-to-q time of the storage elements will prevent hold time
problems at all registers' inputs.  In general, a logic designer must NOT
rely on a combinational logic block to have a certain minimum delay.  The
zero delay in the Verilog model of the combinational logic elements will
ensure that the logic designer does not rely on any minimum delay during
simulation.
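
As a minimal sketch of this point (it is not taken from sata_library.v), the
hypothetical testbench below connects two v_reg instances back to back with
no logic in between.  The names hold_demo, stage1, stage2, q1, and q2 are
made up for illustration, and the v_reg module listed above is assumed to be
compiled together with it.

	/***************************************************************
	 * Hypothetical testbench: two registers with zero delay between
	 ***************************************************************/
	module hold_demo;
	    reg     clk;
	    reg     d;
	    wire    q1, q2;

	    // Port order follows v_reg( q, c, d )
	    v_reg #(1) stage1 ( q1, clk, d  );
	    v_reg #(1) stage2 ( q2, clk, q1 );

	    // Free-running clock with a period of 10 time units
	    always #(5) clk = ~clk;

	    initial begin
		clk = 0;
		d   = 0;
		$monitor("t=%0t clk=%b d=%b q1=%b q2=%b",
			 $time, clk, d, q1, q2);
		#(12) d = 1;	// change d away from any clock edge
		#(30) $finish;
	    end

	endmodule // hold_demo

Because q1 changes one time unit after the rising edge, stage2 always
captures the value q1 had before that edge, so q2 follows q1 one cycle
later even though the path between the two registers has zero delay.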

Once the basic logic elements have been modeled in the library file:

    *** Verilog Coding Guideline 2-3 ***
    Use explicit registers and latches (examples: v_reg and v_latch as
    defined in the examples above) in your Verilog code.  Do not rely
    on logic synthesis tools to generate latches or registers for you.

By making the logic designer explicitly place the registers and/or latches,
the logic designer is forced to consider the timing implications of the logic
early in the design cycle.  In other words, the designer is forced to ask
himself or herself questions such as: do I have too much logic between
registers to meet the cycle time?  Also, with explicit registers and latches
in the Verilog code, it will be much easier for those who read the code to
draw a simple block diagram showing all the registers in the design.  Such a
block diagram (see Section 4 and Section 5) is very useful in terms of
understanding the design (remember the MOST important Logic Design Guideline
above: the design must be easy to understand) as well as making timing
tradeoffs when such tradeoffs are necessary.
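
As a small illustration of this guideline, here is a sketch of a register
with a load enable that is built only from the library elements shown above.
The module v_reg_en and its port names are hypothetical (it is not claimed
to exist in sata_library.v); v_reg and v_mux2e are assumed to be compiled
together with it.

	/***************************************************************
	 * Hypothetical N-bit register with load enable, built only from
	 * the library elements v_mux2e and v_reg
	 ***************************************************************/
	module v_reg_en( q, c, en, d );
	    parameter   n = 1;

	    output  [n-1:0] q;
	    input   [n-1:0] d;
	    input           c, en;

	    wire    [n-1:0] next;

	    // en=1: load d; en=0: hold the current value of q
	    v_mux2e #(n) hold_mux  ( next, en, q, d );

	    // The only storage element, placed explicitly
	    v_reg   #(n) state_reg ( q, c, next );

	endmodule // v_reg_en

The single v_reg instance makes it obvious where the storage is, and
everything else in the module is zero-delay combinational logic.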

At first glance, it seems ironic that the logic designer needs to always
keep in mind how much combinational logic exists between any two storage
elements (registers or latches) while in Verilog coding (see Verilog Coding
Guideline 2-2), we want to treat all combinational logic as having zero delay.
The reason for this apparent contradiction is that in logic design, the delay
of the combinational logic between storage elements determines the cycle time.
Consequently, it is important for the logic designer to be aware of the
complexity of the logic between two storage elements at all times.  On the
other hand, in order to reduce potential hold time problems, we also do not
want the correct operation of the logic to depend on the logic having a
certain minimum delay.  The best way to make sure the logic can operate
correctly without relying on the combinational logic blocks to have a certain
minimum delay is to run the Verilog simulation with all combinational logic
blocks having zero delay and rely on the storage elements' (registers and/or
latches) non-zero clock-to-q time to satisfy the hold time requirement of
the next register.

3. Hierarchal Design and Clock Domain Consideration
-----------------------------------------------------------------------------
Another important step in keeping a design simple and the Verilog code that
models the design easy to understand is to adopt a hierarchal approach to
the design process and then make the Verilog code follow the same hierarchy.

Hierarchal design, however, should not be carried to an extreme.  For example,
as pointed out by one of my colleagues, Kyutaeg Oh [1], too deep a hierarchy
can cause too many module instantiations, which will cause synthesis to run
too slowly.  Below is a hierarchal design strategy I find useful.

    *** Logic Design Guideline 3-1 ***
    Use a hierarchal strategy that breaks the design into modules
    that consist of datapaths and controllers.  More specifically:

    1. Divide the problem into multiple modules with clean and well
       defined interfaces.

    2. For each module:
        a. Design the datapath that can process the data for that module.
        b. Design the controller to control the datapath and produce control
           outputs (if any) to other adjacent modules.

One example of such a hierarchal approach can be found in the Serial ATA
to Parallel ATA Converter for the Disk (Device Dongle).  As shown in
Figure 3-1, the Device Dongle is divided into three modules:

 1. The Parallel ATA Interface to the disk: ATAIF.  See Reference [2].
 2. The Transport Layer: Transport.  See Reference [3].
 3. The Link Layer: Link.  See Reference [4].

            +-----------+   +-------------+   +--------+
            |           |   |             |   |        |
   /-------\| Parallel  |--->  Transport  |--->  Link  +--> To Serializer
  < ATA Bus >   ATA     |   |    Layer    |   |  Layer |
   \-------/| Interface |<--+             |<--+        |
            |  (ATAIF)  |   | (Transport) |   | (Link) <--- From Deserializer
            |           |   |             |   |        |
            +-----------+   +-------------+   +--------+

           Figure 3-1: The Three Modules that Form the Device Dongle

The Parallel ATA Interface (ATAIF), the Transport Layer (Transport) and the
Link Layer (Link) shown in Figure 3-1 are further divided into datapath
and controller modules as described below and shown in Figure 3-2.

                          +----------------------+  +----------------------+
                          |   Transport Layer    |  |      Link Layer      |
                          |        dtrans        |  |         link         |
                          | +------------------+ |  | +------------------+ |
                          | | Transmit Engine  | |  | | Transmit Engine  | |
                          | |    dtrans_tx     | |  | |     link_tx      | |
                          | | +--------------+ | |  | | +--------------+ | |
                          | | |   Datapath   | | |  | | |   Datapath   | | |
                          | | | dtrans_txdp  | | |  | | |  link_txdp   | | |
   +------------------+   | | +--------------+ | |  | | +--------------+ | |
   |   Parallel ATA   |   | |                  | |  | |                  | |
   |    Interface     |   | | +--------------+ | |  | | +--------------+ | |
   |      dataif      |   | | |  Controller  | | |  | | |  Controller  | | |
   |  +-----------+   |   | | | dtrans_txctl | | |  | | |  link_txctl  | | |
   |  | Datapath  |   |   | | +--------------+ | |  | | +--------------+ | |
   |  |           |   |   | |                  | |  | |                  | |
   |  | dataif_dp |   |   | | +--------------+ | |  | | +--------------+ | |
   |  +-----------+   |   | | | Synchronizer | | |  | | | Synchronizer | | |
   |                  |   | | | dtrans_txsyn | | |  | | |  link_txsyn  | | |
   |                  |   | | +------------^-+ | |  | | +------------^-+ | |
   |  +------------+  |   | +---+----------|---+ |  | +---+----------|---+ |
   |  | Controller |  |   |     |(3)       |(3)  |  |     |(1)       |(2)  |
   |  |            |  |   | +---|----------+---+ |  | +---|----------+---+ |
   |  | dataif_ctl |  |   | | +-v------------+ | |  | | +-v------------+ | |
   |  +------------+  |(4)| | | Synchronizer | | |  | | | Synchronizer | | |
   |                  +------->              | | |  | | |              | | |
   |                  |   | | | dtrans_rxsyn | | |  | | |  link_rxsyn  | | |
   | +--------------+ |   | | +--------------+ | |  | | +--------------+ | |
   | | Synchronizer | |(5)| | +--------------+ | |  | | +--------------+ | |
   | |              <-----+ | |  Controller  | | |  | | |  Controller  | | |
   | |  dataif_syn  | |   | | | dtrans_rxctl | | |  | | |  link_rxctl  | | |
   | +--------------+ |   | | +--------------+ | |  | | +--------------+ | |
   +------------------+   | |                  | |  | |                  | |
                          | | +--------------+ | |  | | +--------------+ | |
                          | | |   Datapath   | | |  | | |   Datapath   | | |
                          | | | dtrans_rxdp  | | |  | | |  link_rxdp   | | |
                          | | +--------------+ | |  | | +--------------+ | |
                          | |  Receive Engine  | |  | |  Receive Engine  | |
                          | |    dtrans_rx     | |  | |     link_rx      | |
                          | +------------------+ |  | +------------------+ |
                          +----------------------+  +----------------------+

          Figure 3-2: Further Divisions of the Device Dongle Modules
 
The Parallel ATA Interface, modeled by the "module dataif" in the Verilog
file dataif.v (see Reference [2]), is further divided into the following
(see Reference [5]):

    Datapath:     module dataif_dp in the Verilog file dataif_dp.v
    Controller:   module dataif_ctl in the Verilog file dataif_ctl.v
    Synchronizer: module dataif_syn in the Verilog file dataif.v

The Transport Layer, modeled by the "module dtrans" in the Verilog file
dtrans.v (see Reference [3]), is further divided into the following
(see Reference [6]):

    Transmit Engine: module dtrans_tx in the Verilog file dtrans_tx.v
    This Transport Layer Transmit Engine is further divided into:

	Datapath:     module dtrans_txdp in the Verilog file dtrans_txdp.v
	Controller:   module dtrans_txctl in the Verilog file dtrans_txctl.v
        Synchronizer: module dtrans_txsyn in the Verilog file dtrans_tx.v

    Receive Engine: module dtrans_rx in the Verilog file dtrans_rx.v
    This Transport Layer Receive Engine is further divided into:

	Datapath:     module dtrans_rxdp in the Verilog file dtrans_rxdp.v
	Controller:   module dtrans_rxctl in the Verilog file dtrans_rxctl.v
	Synchronizer: module dtrans_rxsyn in the Verilog file dtrans_rx.v

Similarly the Link Layer, modeled by the "module link" in the Verilog file
link.v (see Reference [4]), is further divided into the following
(see Reference [7]):

    Transmit Engine: module link_tx in the Verilog file link_tx.v
    This Link Layer Transmit Engine is further divided into:

	Datapath:     module link_txdp in the Verilog file link_txdp.v
	Controller:   module link_txctl in the Verilog file link_txctl.v
	Synchronizer: module link_txsyn in the Verilog file link_tx.v

    Receive Engine: module link_rx in the Verilog file link_rx.v
    This Link Layer Receive Engine is further divided into:

	Datapath:     module link_rxdp in the Verilog file link_rxdp.v
	Controller:   module link_rxctl in the Verilog file link_rxctl.v
	Synchronizer: module link_rxsyn in the Verilog file link_rx.v

For those readers who have access to my home directory and are interested
in taking a closer look at the Verilog files discussed above, please refer
to References [2] to [7].  However, the detailed contents of these Verilog
files are not needed to illustrate the following Verilog Coding Guideline:

    *** Verilog Coding Guideline 3-1 ***
    A separate Verilog file is assigned to the Verilog code for:
     1. Each datapath.  Example: dtrans_txdp.v
     2. Each controller.  Example: dtrans_txctl.v
     3. As well as the Verilog code for each high level module,
	that is a module at a hierarchy level higher than the datapath
	and the controller.  Examples: link_tx.v, link_rx.v, and link.v

A corollary of the above Verilog Coding Guideline is as follows:

    *** Verilog Coding Guideline 3-2 ***
    In order to keep the number of Verilog files under control, one should
    try not to assign a separate Verilog file to any low level module that
    is at a hierarchy level lower than the datapath and the controller.

For example, as I will show you in Section 4, the datapath will contain many
datapath elements.  Instead of assigning a separate Verilog file for each of
these datapath elements, the datapath elements are all grouped into a single
"library" file (link_library.v).  Similarly, as I will show you in Section 5,
the controller will contain a "Next State Logic" block and an "Output Logic"
block.  Instead of assigning a separate Verilog file for each logic block,
the logic blocks will be included in the Verilog file assigned to the
controller.

Enclosed in Appendix A are the Verilog files dtrans.v, dtrans_tx.v, and
dtrans_rx.v.  Here is something worth noticing:

    *** Verilog Coding Guideline 3-3 ***
    The Verilog code for a high level module, that is, a module at a
    hierarchy level higher than the datapath and the controller (examples:
    module dtrans_tx, module dtrans_rx, and module dtrans), should not
    contain any logic.  It should only show how the lower level modules
    are connected.

For example, if you look at the dtrans.v file in Appendix A, the "module
dtrans" only shows how its transmit engine (dtrans_tx) and its receive
engine (dtrans_rx) are connected.  Similarly, if you look at the dtrans_tx.v
file in Appendix A, the "module dtrans_tx" contains only the information on
how its datapath (dtrans_txdp), its controller (dtrans_txctl), and its
synchronizer (dtrans_txsyn) are connected together.  In any case, none of
the "module dtrans," the "module dtrans_tx," or the "module dtrans_rx"
contains any Verilog code that models raw logic.
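
The sketch below illustrates this guideline with a made-up transmit engine.
None of these names (example_tx, example_txctl, example_txdp, load_req, load)
come from Appendix A; they are hypothetical, and the controller and datapath
bodies are placeholders that exist only so the example can be compiled
together with v_reg and v_mux2e from Section 2.  In a real design the three
modules would live in three separate files (Verilog Coding Guideline 3-1).

	// --- example_tx.v: high level module, connections only ---
	module example_tx( data_out, clk, data_in, load_req );
	    output  [7:0] data_out;
	    input   [7:0] data_in;
	    input         clk, load_req;

	    wire          load;		// controller -> datapath

	    example_txctl controller (
		.load (load),		.clk (clk),
		.load_req (load_req)
	    );

	    example_txdp  datapath (
		.data_out (data_out),	.clk (clk),
		.load (load),		.data_in (data_in)
	    );

	endmodule // example_tx

	// --- example_txctl.v: placeholder controller ---
	module example_txctl( load, clk, load_req );
	    output  load;
	    input   clk, load_req;

	    v_reg #(1) load_reg ( load, clk, load_req );

	endmodule // example_txctl

	// --- example_txdp.v: placeholder datapath ---
	module example_txdp( data_out, clk, load, data_in );
	    output  [7:0] data_out;
	    input   [7:0] data_in;
	    input         clk, load;

	    wire    [7:0] next;

	    // load=1: take data_in; load=0: hold data_out
	    v_mux2e #(8) hold_mux ( next, load, data_out, data_in );
	    v_reg   #(8) data_reg ( data_out, clk, next );

	endmodule // example_txdp

Notice that example_tx contains no assign statements, no always blocks,
and no registers of its own: it declares one wire and two instances.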

Notice from Figure 3-2 that the ATA Interface module is divided into the
datapath and the controller.  On the other hand, the Transport Layer and the
Link Layer are first partitioned into the Transmit Engine and the Receive
Engine before being further divided into controller and datapath.  The reason
for this extra level of hierarchy for the Transport Layer and the Link Layer
is that their Transmit Engines and their Receive Engines work in different
clock domains.  More specifically, the ATA Interface, the Transmit Engine
of the Link Layer, and the Transmit Engine of the Transport Layer all
operate under the same clock, the transmit clock, while the Receive Engines
of the Link and Transport Layers both operate on a different clock,
the receive clock.  This leads to the following design guideline:

    *** Logic Design Guideline 3-2 ***
    Keep different clock domains separate and have an explicit
    synchronization module for signals that cross clock domains.
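
One common way to build such a synchronization module is to pass each
signal through two registers clocked by the destination clock.  The sketch
below does this with v_reg from Section 2.  The module example_syn and its
port names are hypothetical; this document does not show what the actual
synchronizers (such as link_rxsyn) contain, so this is an assumption about
the technique, not a copy of them.

	/***************************************************************
	 * Hypothetical N-bit two-stage synchronizer built from v_reg
	 ***************************************************************/
	module example_syn( sync_out, dest_clk, async_in );
	    parameter   n = 1;

	    output  [n-1:0] sync_out;	// safe to use in the dest_clk domain
	    input   [n-1:0] async_in;	// driven from another clock domain
	    input           dest_clk;

	    wire    [n-1:0] meta;	// first stage output, may be metastable

	    v_reg #(n) stage1 ( meta,     dest_clk, async_in );
	    v_reg #(n) stage2 ( sync_out, dest_clk, meta     );

	endmodule // example_syn

A synchronizer like this is suitable for independent control signals.  A
multi-bit value whose bits must be captured together needs additional care,
for example a handshake, because each bit may resolve in a different cycle.
Figure 3-2 shows where the synchronization modules sit in the Device Dongle.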

For example, please refer to the places in Figure 3-2 labeled with numbers
in parentheses as you read the numbered paragraphs below:
 1. All signals going from the Link Layer's Transmit Engine to its
    Receive Engine must go through synchronization via the module
    "link_rxsyn" before the signals can be used by the Receive Engine.

 2. Similarly, all signals going from the Link Layer's Receive Engine to
    its Transmit Engine must go through synchronization via the module
    "link_txsyn" before the signals can be used by the Transmit Engine.

 3. The discussion in Paragraph 1 and Paragraph 2 above also applies to
    the signals between the Transmit Engine and the Receive Engine of the
    Transport Layer.

 4. Since the Parallel ATA Interface and the Receive Engine of the Transport
    Layer operate in different clock domains, all signals going from the
    Parallel ATA Interface to the Transport Layer's Receive Engine must go
    through synchronization via the module "dtrans_rxsyn" before the signals
    can be used by that Receive Engine.

 5. Similarly, all signals going from the Transport Layer's Receive Engine
    to the Parallel ATA Interface must go through the synchronization module
    "dataif_syn" before the signals can be used by the ATA Interface.

4. Datapath Design
-----------------------------------------------------------------------------
Figure 4-1 is an example of a generalized datapath and the next paragraph
describes some important observations from this figure.

              |<--- Control Inputs from the Controller (2) -->|
              |   |           |              |   |        |   |
              |...| (3a)      |              |   |  ...   |   |
            +-v---v--+        |Select        | +-v--------v-+ |
  Input  N  |  See   | N      |              | | Simple (4) | |
    A ---/--> Figure +-/-+  + |    (5)       | |Random Logic| |
   (1)      |  4-2   |   |  |\v    (3d)      | +-+--------+-+ |
            +-+---+--+   |  | \    +---+     |...|        |...|  (3b)
              |...|      +-->0 +   |   |   +-v---v--+   +-v---v-+-+
              v   v         |  | N | R | N |  See   | N |  See  | |  N  Output
                            |  +-/-> E +-/-> Figure +-/->Figure | +--/--> Q
              |...| (3a)    |  | ^ | G | ^ |  4-2   |   |  4-3  | |      (1)
            +-v---v--+   +-->1 + | |   | | +-+---+--+   +-+---+-+^+
  Input  N  |  See   | N |  | /  | +-^-+ |   |...|        |...|  | (5)
    B ---/--> Figure +-/-+  |/   |   |   |   v   v        v   v CLK
   (1)      |  4-2   |      +    |  CLK  |                    |
            +-+---+--+    (3c)   |       | (1)                |
              |...|              K (1)   Y Internal Signals   |
              v   v                                           |
              |<--- Control Outputs to the Controller (2) --->|

               Figure 4-1: Block Diagram of the General Datapath

When you read the numbered paragraphs below, please refer to the places in
Figure 4-1 labeled with the same numbers in parentheses:
 1. This simple N-bit datapath has two N-bit data inputs (A and B) and one
    N-bit data output Q.  The internal signals K and Y are marked here to
    facilitate the discussion of the pipeline register in Paragraph 5 below.

 2. Other than the N-bit data inputs and outputs discussed in Item 1,
    a generalized datapath should also have Control Inputs from the
    controller and Control Outputs to the controller (see Section 5).

 3. In general, a datapath consists of the following components:
     a. Combinational Datapath Elements shown in Figure 4-2 where the N-bit
        Data Output and the Control Outputs depend ONLY on the current values
        of the N-bit Data Input and Control Inputs.  Examples of Combinational
        Datapath Elements are the multiplexer and the ALU.

     b. Sequential Datapath Elements shown in Figure 4-3 where the N-bit
        Data Output can depend on the current N-bit Data Input, the current
        Control Input, as well as the previous cycle's N-bit Data output.
	An 8-bit counter is an example of a Sequential Datapath Element.

     c. Multiplexers, which are just a special case of the Combinational
        Datapath Elements shown in Figure 4-2.

     d. Registers or Register Files, which can be considered a special
        case of the Sequential Datapath Elements shown in Figure 4-3.

 4. The "Simple Random Logic" here is commonly referred to as "glue logic,"
    which consists of simple inverters, AND gates, and OR gates.  In theory,
    all of this "glue logic" can be integrated into the controller that
    is discussed in Section 5.  In practice, however, it is sometimes
    simpler to just use some "glue logic" in the datapath.

 5. The register described in Item 3d as well as the implicit register at
    the output of the Sequential Datapath Element (Item 3b) are commonly
    referred to as pipeline registers.

                                Control Inputs
                                   | |...| |
                                   | |   | |
                               +---v-v---v-v---+
                               |               |
                        N      | Combinational |      N
                   -----/------>   Datapath    +------/----->
                      N-bit    |   Elements    |    N-bit
                   Data Input  |               | Data Output 
                               +---+-+---+-+---+
                                   | |...| |
                                   | |   | |
                                   v v   v v
                                Control Outputs 

                Figure 4-2: A Combinational Datapath Element

                               Control Inputs
                                 | |...| |
                                 | |   | | 
                             +---------------------+
                             |   | |   | |         |
                             | +-v-v---v-v---+---+ |
                             | |             |   | |
                             +->             |   +-+
                        N      | Sequential  | R |      N
                   -----/------>  Datapath   | E +------/----->
                      N-bit    |  Elements   | G |    N-bit
                   Data Input  |     +-----+ |   | Data Output
                               |     | REG < |   |
                               +-+-+-+-+-+-+-+-^-+
                                 | |...| |     |
                                 | |   | |    CLK
                                 v v   v v
                               Control Outputs

   Figure 4-3: A Sequential Datapath Element with a Register at Its Output

The main function of the explicit pipeline register shown in Figure 4-1's
Item 3d and Item 5 is to limit the datapath's critical path delay to a value
less than the desired cycle time of the system.  The effect of such a
pipeline register can be best understood with a timing diagram.

    *** Logic Design Guideline 4-1 ***
    The best way to study the effect of the datapath's pipeline registers is
    to draw a timing diagram showing each register's effect on its outputs
    with respect to the rising or falling edge of the register's input clock.

Figure 4-4 below is an example of such a timing diagram for the generalized
datapath example shown in Figure 4-1.  In this timing diagram example (when
you read the numbered paragraphs below, please refer to the places in Figure
4-4 labeled with the same numbers in parentheses):

 1. The N-bit Input A and Input B settle to their known values "A" and "B"
    sometime after the rising edge of Cycle 2.

    For the sake of simplicity, let's assume all the Control Inputs (likely
    generated by a controller similar to the one described in Section 5) of
    this datapath are stable prior to the rising edge of Cycle 2 so that
    they are not factors in the critical delay path considerations.  In an
    actual design, such assumptions will be verified by static timing
    analysis.

 2. Due to the assumption of the Control Inputs listed in Item 1, we only
    need to make sure Input A and Input B settle early enough to allow the
    two Combinational Datapath Elements (Item 3a in Figure 4-1) and the
    multiplexer (Item 3c in Figure 4-1) to produce the Internal Signal K
    at least one set-up time prior to the rising edge of Cycle 3.

 3. If the condition listed in Item 2 is met, the pipeline register can then
    capture the value of Internal Signal K and set the Internal Signal Y to
    the value "Y" one clock-to-q time after the rising edge of Cycle 3.

 4. Once again, due to the assumption about the Control Inputs listed in
    Item 1, as long as the Combinational Datapath Element after the pipeline
    register (Item 3d in Figure 4-1) together with the combinational logic
    within the Sequential Datapath Element (Item 3b in Figure 4-1) can
    produce the result for the Sequential Datapath Element's "implicit"
    register at least one set-up time prior to the rising edge of Cycle 4,
    the Output of this datapath will be set to the stable value "Q"
    one clock-to-q time after the rising edge of Cycle 4.

             |    1    |    2    |    3    |    4    |    5    |    6    |
             |         |         |         |         |         |         |
             +----+    +----+    +----+    +----+    +----+    +----+    +-
   Clock     |    |    |    |    |    |    |    |    |    |    |    |    |
   ----------+    +----+    +----+    +----+    +----+    +----+    +----+ 
             |         |         |         |         |         |         |
   ---------------------+ +-------+ +--------------------------------------
   Input A ///////////// X    A    X //////////////////////////////////////
   ---------------------+ +-------+ +--------------------------------------
             |         | (1)     |         |         |         |         |
   ---------------------+ +-------+ +--------------------------------------
   Input B ///////////// X    B    X //////////////////////////////////////
   ---------------------+ +-------+ +--------------------------------------
             |         |     (2) |         |         |         |         |
   ---------------------------+ +-------+ +--------------------------------
   Internal Signal K ///////// X    K    X ////////////////////////////////
   ---------------------------+ +-------+ +--------------------------------
             |         |         |  (3)    |         |         |         |
   -------------------------------+ +-------+ +----------------------------
   Internal Signal Y ///////////// X    Y    X ////////////////////////////
   -------------------------------+ +-------+ +----------------------------
             |         |         |         |  (4)    |         |         |
   -----------------------------------------+ +-------+ +------------------
   Output Q //////////////////////////////// X    Q    X //////////////////
   -----------------------------------------+ +-------+ +------------------

       Figure 4-4: A Timing Diagram of the Datapath's Pipeline Register

Item 4 above brings up an interesting observation about the Sequential Datapath
Element shown in Figure 4-3, where the implicit register of this datapath
element is shown on the output side of the element.  The placement of
the register on the output side (versus the input side) in the drawing is
intentional.  It reflects the actual placement of the register in hardware.
I like to place such a register at the output (versus the input) so that all
N bits of the output become stable at the same time, one clock-to-q time
after each rising edge of the clock.  Also shown in Figure 4-3 is that some
Control Outputs of the Sequential Datapath Element can also be registered.
This, however, is not as common as keeping the Control Outputs strictly
combinational, which allows the user of these signals (likely to be the
controller, see Section 5) the flexibility of using these values one cycle
earlier if the critical timing is not violated.
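
As a minimal sketch of this placement (the module x_addreg and all of its
signal names are hypothetical and are not part of link_library.v), the
element below keeps its combinational logic in front of the register.  All
n bits of "sum_out" change together one clock-to-q time after the rising
edge of "clk", while the Control Output "is_zero" is left combinational so
that its user can see it one cycle earlier than "sum_out":

    module x_addreg (sum_out, is_zero, a, b, clk);

    parameter           n = 8;

    output [n-1:0]      sum_out;        // Registered data output
    output              is_zero;        // Combinational Control Output

    input [n-1:0]       a, b;
    input               clk;

    reg [n-1:0]         sum_out;
    wire [n-1:0]        sum;            // Output of the combinational logic

    // (1) The combinational logic
    assign sum     = a + b;
    assign is_zero = (sum == {n{1'b0}});

    // (2) The register, placed on the output side of the element
    always @(posedge clk) begin
        sum_out <= sum;
    end

    endmodule // x_addreg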

The above discussion of the timing diagram in Figure 4-4 illustrates that the
logic designer cannot draw an accurate timing diagram unless he or she knows
the exact location of the registers relative to the combinational logic.
This brings us to a corollary of Logic Design Guideline 4-1:

    *** Logic Design Guideline 4-2 ***
    The block diagram of the datapath should show ALL registers,
    including the implicit register of the Sequential Datapath Element.

Enclosed in Appendix B is the example Verilog file link_txdp.v which models
the datapath for the Link Layer Transmit Engine (see Reference [8]).
Let's take a look at some interesting observations from link_txdp.v:
 
    *** Verilog Coding Guideline 4-1 ***
    Keep the Verilog coding of the datapath simple and straightforward.
    Leave the fancy coding (IF any) to the datapath elements and place
    such elements in a separate (library) file.

For example, the Verilog coding of link_txdp.v is simplified by using the
following two Sequential Datapath Elements:

    /*
     * Scrambler
     */
    l_scramble scrambler (
        .scr_out (scr_out),             .scr_in (32'hc2d2768d),
        .scr_init (txscr_init),         .scr_run (txscr_run),
        .clk (txclk4x),                 .reset (lktx_reset));

    /*
     * CRC Calculator
     */
    l_crccal crc_calculator (
        .crc_out (crc_out),
        .crc_in (32'h52325032),         .datain (tp_txdata),
        .crc_init (txcrc_init),         .crc_cal (txcrc_cal),
        .clk (txclk4x),                 .reset (lktx_reset));

As well as a Combinational Datapath Element:

    /* 
     * Generate the primitive (prim_out) based on the selection (sel_prim)
     */
    l_primgen primgen (.prim_out (prim_out),    .sel_prim (sel_prim));

More specifically, the Verilog code in link_txdp.v only shows what the logic
designer cares about the most at the datapath level: how the datapath
elements (registers, multiplexers, counters, etc.) are connected together.
The detailed modeling of these datapath elements is done in link_library.v,
which contains all library elements for the Link Layer.  For your reference,
link_library.v is also attached in Appendix B (see Reference [9]).  Below
are a few lines from link_library.v that define the Scrambler.

    /********************************************************************
     * l_scramble: 32-bit scrambler that can be:
     *  a. Reset to all zeros asynchronously
     *  b. Load a fixed pattern synchronously.
     *  c. Keep its old value if scramble is not enabled.
     *  d. Update its output synchronously based on a LFSR algorithm.
     ********************************************************************/
    module l_scramble (scr_out, scr_in, scr_init, scr_run, clk, reset);

    output [31:0]       scr_out;        // Scrambler's output

    input [31:0]        scr_in;         // Initial pattern to be loaded
    input               scr_init;       // Load the initial pattern
    input               scr_run;        // Update scr_out based on a LFSR
    input               clk;
    input               reset;

    reg [31:0]          scram;          // Scramble data pattern
    reg                 a15, a14, a13,  // Intermediate scramble bits
                        a12, a11, a10,
                        a9, a8, a7, a6, a5, a4, a3, a2, a1, a0;

    wire [31:0]         runmuxout;      // Output of the scr_run MUX
    wire [31:0]         lastmux;        // Output of the final MUX

    /*
     * Combinational logic to produce the scramble pattern,
     * which should be updated whenever scr_out changes.
     * This logic was copied from Frank Lee's scramble.v
     */
    always @(scr_out) begin
        a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
        a14 =               scr_out[30] ^ scr_out[21] ^ scr_out[17];
        a13 =               scr_out[31] ^ scr_out[22] ^ scr_out[18];
         :                     :
        scram[2]  = a15^a14^a13;
        scram[1]  = a15^a14;
        scram[0]  = a15;
    end // Scrambling logic

    /*                                   Priority:
     *          scram   scr_out          -------------------------------
     *              |   |                reset (asynchronous):   highest
     *          +---v---v---+            scr_init (synchronous): middle
     * scr_run-->\S 1   0  /             scr_run (synchronous):  lowest
     *            +---+---+  scr_in
     *                |       |
     *            +---v-------v---+
     *             \  0       1 S/<--scr_init (higher priority than scr_run)
     *              +-----+-----+
     *                    |
     *                    v
     *                  lastmux
     */
    v_mux2e #(32) run_mux (runmuxout, scr_run, scr_out, scram);
    v_mux2e #(32) init_mux (lastmux, scr_init, runmuxout, scr_in);
    v_regre #(32) scr_ff (scr_out, clk, lastmux, (scr_run | scr_init), reset);

    endmodule // l_scramble

The definition of the Scrambler l_scramble (the "l_" prefix indicates this
is defined in link_library.v) illustrates another Logic Design Guideline:

    *** Logic Design Guideline 4-3 ***
    While designing a Sequential Datapath Element, separate the element
    into two parts: (1) the combinational logic, and (2) the register.

For example, in l_scramble, the combinational logic of the Scrambler is
modeled by the "always" statement:

    always @(scr_out) begin
        a15 = scr_out[31] ^ scr_out[29] ^ scr_out[20] ^ scr_out[16];
         :                     :
        scram[2]  = a15^a14^a13;
    end // Scrambling logic

while the register is modeled by the 32-bit wide "v_regre" defined in the
library shared by the entire design team (see Verilog Coding Guideline 4-2
below):

    v_regre #(32) scr_ff (scr_out, clk, lastmux, (scr_run | scr_init), reset);

The use of "v_regre" (the prefix "v_" indicates this element is defined
in the common library) illustrates the following Verilog Coding Guideline:

    *** Verilog Coding Guideline 4-2 ***
    The Verilog coding of the datapath elements should make use of the
    standard logic elements (registers, multiplexers, ... etc.) already
    defined in the library discussed in Verilog Coding Guideline 2-1.
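
The actual definitions of "v_mux2e" and "v_regre" live in the common library
and are not reproduced here.  As a rough sketch only, assuming the port order
implied by the instantiations in l_scramble (output, select, input 0, input 1
for the multiplexer; q, clock, d, enable, reset for the register) and assuming
an active-high asynchronous reset, such elements might look like the two
modules below.  The "x_" names are hypothetical and are chosen so that they
cannot be confused with the real library elements:

    // 2-to-1 multiplexer: out = in1 when sel is 1, in0 otherwise
    module x_mux2e (out, sel, in0, in1);

    parameter           n = 1;

    output [n-1:0]      out;
    input               sel;
    input [n-1:0]       in0, in1;

    assign out = sel ? in1 : in0;

    endmodule // x_mux2e

    // Register with enable and (assumed) asynchronous reset to all zeros
    module x_regre (q, clk, d, enable, reset);

    parameter           n = 1;

    output [n-1:0]      q;
    input               clk;
    input [n-1:0]       d;
    input               enable;
    input               reset;

    reg [n-1:0]         q;

    always @(posedge clk or posedge reset) begin
        if (reset)
            q <= {n{1'b0}};
        else if (enable)
            q <= d;                     // Keep the old value if not enabled
    end

    endmodule // x_regre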

The last file included in Appendix B is "link_defs.v" (see Reference [10])
which defines all the "symbolic values" (i.e. assign a symbolic name to a
given constant value) to be used by all the Verilog files for the Link layer.
For example, the following line:

	`include "link_defs.v"

is used in both the datapath file (link_txdp.v) and the Link Layer library
file (link_library.v) so that all the symbolic values defined in link_defs.v
can be used by these two files.  Below are some examples of these symbolic
values that are specific to the datapath:

    /*
     * Number of primitives and the bit position of the 1-hot encoded vector
     */
    `define num_prim         18

    // Basic Primitives
    `define B_ALIGN           0
    `define B_SYNC            1
    `define B_CONT            2
       :                      :
    `define B_X_RDY           9
       :                      :
    `define B_PMACK          16
    `define B_PMNAK          17

These symbolic values are then used by the datapath file (link_txdp.v) in the
following way:

    /*
     * Interconnections within this portion of the datapath
     */
    wire [`num_prim:0]                  // Number of primitives + D10.2
                        sel_prim;       // Select the proper primitives

    // Primitives sent by the Transmit Controller
    assign sel_prim[`B_ALIGN] = txsn_align;
    assign sel_prim[`B_X_RDY] = txsn_xrdy;

It should be obvious that the Verilog code above is much easier to maintain
and much easier to understand than the equivalent Verilog code:

    /*
     * Interconnections within this portion of the datapath
     */
    wire [18:0]		sel_prim;

    // Primitives sent by the Transmit Controller
    assign sel_prim[0] = txsn_align;
    assign sel_prim[9] = txsn_xrdy;

This example of how Verilog code uses symbolic values to improve its ease of
maintenance leads us to the following Verilog Coding Guideline: 

    *** Verilog Coding Guideline 4-3 ***
    Define symbolic values (see also Verilog Coding Guideline 5-2) in
    a header file (example: link_defs.v) and include this header file in
    all files that can make use of these symbolic values to make the
    Verilog code easier to maintain and easier to understand.

Other symbolic values defined in link_defs.v such as:

    // Number of TX states and bit position of the 1-hot state encoding
    `define num_lktxstate   15
    `define B_NOCOMM         0
    `define B_SENDALIGN      1
    `define B_NOCOMMERR      2
       :        :            :
    `define B_BUSYRCV       13
    `define B_POWERDOWN     14

    // State Values
    `define RESET           15'h0000        // All bits are zeros
    `define NOCOMM          15'h0001        // Bit 0 is set
    `define SENDALIGN       15'h0002        // Bit 1 is set
       :        :            :
    `define POWERSAVE       15'h4000        // Link layer is power down

are used for the Verilog code that models the controller for the Link Layer.
How these symbolic values can be used to simplify the Verilog code of the
controller will be explained in Section 5.  More specifically, please
refer to Verilog Coding Guideline 5-1 in Section 5.

5. Controller Design
-----------------------------------------------------------------------------
Almost without exception, within the core of every controller is one or more
finite state machine(s).  This is shown in Figure 5-1 where only one finite
state machine is shown for simplicity.  Readers with enough imagination should
be able to visualize how this picture can be generalized to multiple
finite state machines.

             +------------------------------------------------+
             |             A General Controller               |
             |           +--------------+---+---+             |  (2a)
    Inputs   | (1)       | Finite State |S  |   |             | Outputs
    -----------+--------->    Machine   |T R|   +-+-------------------->
             | | +---+   |      (4)     |A E|   | | +---+     | Type 1
             | | | R |   |              |T G|   | | | R |     |
             | +-> E +-+->See Figure 5-2|E  |   | +-> E +-+------------>
             | | | G | | | or Figure 5-3|   |   | | | G | |   | Outputs
             | | +-^-+ | +--------------+-^-+---+ | +-^-+ |   | Type 2
             | |   |   |                  |       |   |   |   |  (2b)
             | |  clk  |                 clk      |  clk  |   |
             | |       |                          |       |   |
             | |       |                        +-v-------v-+ |
             | |       |                        |           | |
             | |       +------------------------>  Simple   | | Outputs
             | |                                |  Random   +---------->
             | +--------------------------------> Logic (3) | | Type 3
             |                                  |           | |  (2c)
             |                                  +-----------+ |
             +------------------------------------------------+

              Figure 5-1: Block Diagram of the General Controller

Here are some important observations from Figure 5-1.  When you read the
numbered paragraphs below, please refer to the places in Figure 5-1 labeled
with the same numbers in parentheses:

 1. The inputs to the controller are divided into two groups.  The first
    group is used as inputs to the finite state machine directly while
    the second group is "staged" by one or more stage(s) of pipeline
    registers before being used as inputs by the finite state machine.

 2. As far as the outputs of the controller are concerned, they can be
    classified into three types:
     a. Outputs that come directly from the finite state machine's outputs.

     b. Outputs of the finite state machine after they have been staged
        by one or more stage(s) of pipeline registers.

     c. Outputs of some random logic (see also Paragraph (3) below) whose
        inputs can be any of the signals described in Paragraph (1),
        Paragraph (2a), or Paragraph (2b) above.

 3. The "Simple Random Logic" here is commonly referred to as "glue logic,"
    which consists of simple inverters, AND gates, and OR gates.  In theory,
    all of this "glue logic" can be integrated into the finite state machine
    shown in either Figure 5-2 or Figure 5-3.  In practice, however, it
    is sometimes simpler to just use some "glue logic."
  
 4. In general, there are two types of finite state machines:
     a. The simple Moore Machine shown in Figure 5-2 whose outputs depend
        ONLY on the current state.

     b. The more complex Mealy Machine shown in Figure 5-3 whose outputs
        depend on BOTH the current state as well as the inputs.

           +---------------------------+
           |     +-------+       +---+ |         +--------+
           |  N  | Next  | Next  |S  | | Current |        |
           +--/-->       | State |t R| |  State  | Output |
                 | State +--/---->a e+-+---/----->        +--/--> Outputs
  Inputs -----/-->       |  N    |t g|     N     | Logic  |  P
              M  | Logic |       |e  |           |        |
                 +-------+       +-^-+           +--------+
                                   |
                                 Clock
 
                 Figure 5-2: The Moore State Machine

           +---------------------------+
           |     +-------+       +---+ |         +--------+
           |  N  | Next  | Next  |S  | | Current |        |
           +--/-->       | State |t R| |  State  | Output |
                 | State +--/---->a e+-+---/----->        +--/--> Outputs
  Inputs --+--/-->       |  N    |t g|     N     | Logic  |  P
           |  M  | Logic |       |e  |        +-->        |
           |     +-------+       +-^-+        |  |        |
           |                       |          |  +--------+
           |  Q  (Q <= M)        Clock        |
           +--/-------------------------------+

                 Figure 5-3: The Mealy State Machine
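
To make the difference between the two kinds of outputs concrete, here is a
minimal sketch of a two-state finite state machine.  The module x_fsm, its
states, and its signals are all hypothetical, and the asynchronous reset on
the State Register is an assumption made only to keep the sketch short.
The output "busy" is a Moore-style output (a function of the current state
only), while "grant" is a Mealy-style output (a function of the current state
AND an input):

    `define B_XIDLE     0               // Bit positions of the one-hot state
    `define B_XBUSY     1
    `define XIDLE       2'b01           // State values
    `define XBUSY       2'b10

    module x_fsm (busy, grant, req, done, clk, reset);

    output              busy;           // Moore-style output
    output              grant;          // Mealy-style output
    input               req, done;
    input               clk, reset;

    reg [1:0]           cur_state;
    reg [1:0]           next_state;

    // Next State Logic
    always @(cur_state or req or done) begin
        case (cur_state)
        `XIDLE:  next_state = req  ? `XBUSY : `XIDLE;
        `XBUSY:  next_state = done ? `XIDLE : `XBUSY;
        default: next_state = `XIDLE;
        endcase
    end

    // State Register
    always @(posedge clk or posedge reset) begin
        if (reset)
            cur_state <= `XIDLE;
        else
            cur_state <= next_state;
    end

    // Output Logic
    assign busy  = cur_state[`B_XBUSY];         // Current state only
    assign grant = cur_state[`B_XIDLE] & req;   // Current state AND input

    endmodule // x_fsm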

One question raised by Figure 5-1's Item 1 and Item 2 (see Paragraph 1 and 2
above) is when and where should we use pipeline registers to stage the inputs
or outputs?  This leads us to the following logic design guideline:

    *** Logic Design Guideline 5-1 ***
    The best way to decide when and where to use pipeline register or
    registers to stage the controller inputs and outputs is to draw a
    timing diagram showing each register's effect on its outputs with
    respect to rising or falling edge of the register's input clock.

            |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10  |
            |     |     |     |     |     |     |     |     |     |     |
            +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +-
     Clock  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
     -------+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+
            |     |     |     |     |     |     |     |     |     |     |
     --------------+ +---------+ +----------------------------------------
     Inputs /////// X  (1) A    X ////////////////////////////////////////
     --------------+ +---------+ +----------------------------------------
            |     |     |     |     |     |     |     |     |     |     |
     ---------------+ +---------+ +---------------------------------------
     Next State //// X    B (2)  X ///////////////////////////////////////
     ---------------+ +---------+ +---------------------------------------
            |     |     |     |     |     |     |     |     |     |     |
     --------------------+ +---------+ +----------------------------------
     Current State ////// X    B (3)  X //////////////////////////////////
     --------------------+ +---------+ +----------------------------------

             Figure 5-4: A Timing Diagram Showing Relative Timing

One simple example of such a timing diagram is shown in Figure 5-4, which
shows the effect of the State Register in Figure 5-2, the Moore State Machine.
When you read the numbered paragraphs below, please refer to the places in
Figure 5-4 labeled with the same numbers in parentheses:
 1. We assume the M-bit inputs change from unknown to "A" right after the
    rising edge of Cycle 2.

 2. Assume the Next State Logic is such that, as a result of the inputs being
    "A," the Next State will become "B" regardless of the Current State.
    As long as the Next State Logic can generate the Next State output
    within the cycle time of Clock (this assumption needs to be verified
    with static timing analysis), we no longer need to worry about
    the absolute delay of the Next State Logic.

 3. This is because, as long as Next State becomes "B" one set-up time before
    the rising edge of Cycle 3, the Current State will change to "B" one
    clock-to-q delay AFTER the rising edge of Cycle 3 due to the State
    Register.
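
The same relative timing can also be observed in simulation.  The test bench
below is only a sketch under stated assumptions: the module x_reltime_tb and
its signals are hypothetical, the Next State Logic is replaced by a stand-in
that simply passes the inputs through after 2 time units, and the clock
period (10 time units) and clock-to-q delay (1 time unit) are arbitrary:

    module x_reltime_tb;

    reg                 clk;
    reg  [3:0]          inputs;         // Stand-in for the M-bit inputs
    wire [3:0]          next_state;
    reg  [3:0]          cur_state;

    // Stand-in Next State Logic, faster than one cycle time
    assign #2 next_state = inputs;

    // State Register with a 1 time-unit clock-to-q delay
    always @(posedge clk) begin
        cur_state <= #1 next_state;
    end

    always #5 clk = ~clk;               // Rising edges at t = 5, 15, 25, ...

    initial begin
        $monitor("t=%0t clk=%b inputs=%h next_state=%h cur_state=%h",
                 $time, clk, inputs, next_state, cur_state);
        clk    = 0;
        inputs = 4'hx;
        #16 inputs = 4'hA;              // Right after the rising edge at t=15
        #30 $finish;
    end

    endmodule // x_reltime_tb

With these delays, "next_state" settles well before the rising edge at t=25,
and "cur_state" follows one clock-to-q delay after that edge.  How long the
stand-in logic takes does not matter as long as it fits within the cycle.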

In this simple example, only one register and three signals are shown.
Needless to say, in a real timing diagram, one will have multiple registers
and many more signals.  The basic idea, however, remains the same: show only
the "relative timing," that is, show how the registers affect the timing of
the signals with respect to the clock edge(s) but not the absolute delay.

A corollary of Logic Design Guideline 5-1 is:

    *** Logic Design Guideline 5-2 ***
    The block diagram of the controller should show ALL registers explicitly
    while the random logic can be represented by a simple black box.

By drawing all the registers EXPLICITLY in the block diagram, the designer
will be less likely to make a mistake when he or she attempts to draw a
"relative timing" diagram similar to the one shown in Figure 5-4 (see
footnote below) while thinking about the sequence of events that needs
to be controlled.  Notice that in Figure 5-1, we try to meet Logic Design
Guideline 5-2 by showing the State Register in the black box representing
the Finite State Machine.

    Footnote: Even if the designer does not draw such a timing diagram
    explicitly on paper, he or she may still have to "draw" it implicitly
    in his or her head.

Notice that both Figure 5-2 and Figure 5-3 show finite state machines with
an M-bit input, an N-bit state register, and a P-bit output.  The only
difference is that in Figure 5-2, the Moore Machine, the P-bit output is a
function of the N-bit current state only, while in Figure 5-3, the Mealy
Machine, the P-bit output depends on both the N-bit current state and
a subset (Q bits, where Q is an integer smaller than or equal to M) of the
M-bit inputs.  Depending on the state encoding, the N-bit state register can
represent a maximum of 2**N states, or only N states if one-hot encoding is
used.

    *** Logic Design Guideline 5-3 ***
    If possible, use one-hot encoding for the finite state machine's state
    encoding to simplify the Output Logic as well as the Next State Logic.

One-hot encoding refers to the encoding style where each bit of the State
Register represents one state and the corresponding bit is asserted only
when the finite state machine is at the state represented by that bit.
Consequently, only ONE bit of the N-bit state register will be asserted at
any given time.  My experience is that one-hot encoding can greatly simplify
the logic equations for the Output Logic block (in most cases, reduce to
simple inverters, AND gates, and OR gates) as well as for the Next State
Logic block.  Philosophically, the reason why one-hot encoding can simplify
the output logic is simple: when the finite state machine designer designs
a finite state machine, he or she creates a state for one purpose: the state
indicates the need to set the outputs to some values different than any other
state (if not, there is no need to have a separate state!)  Therefore if
the state information is not one-hot encoded, the Output Logic must first
decode the N-bit state register before it can generate the output.  On the
other hand, when one-hot encoding is used, the need for doing an N-to-2**N
decode is eliminated.  Similarly when one-hot encoding is used, the Next
State Logic does not need to perform the equivalent of the N-to-2**N decode
before deciding what is the next state and once the next state is decided,
it does not need to perform the equivalent of a 2**N-to-N encoding of the
next state.
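
As a small illustration of this point, consider a hypothetical finite state
machine with four states whose output must be asserted in State 1 and in
State 2.  The sketch below (the module x_outlogic and its ports are made up
for this illustration) shows the Output Logic for both encodings side by side:

    module x_outlogic (out_binary, out_onehot, state_binary, state_onehot);

    output              out_binary, out_onehot;
    input [1:0]         state_binary;   // 4 states, binary encoded (N = 2)
    input [3:0]         state_onehot;   // Same 4 states, one-hot encoded

    // Binary encoding: each state of interest must first be decoded
    // from ALL bits of the state register
    assign out_binary = (state_binary == 2'd1) | (state_binary == 2'd2);

    // One-hot encoding: the state bits are used directly, one OR gate
    assign out_onehot = state_onehot[1] | state_onehot[2];

    endmodule // x_outlogic

With only four states the difference is small, but the decode in front of
the binary version grows with the number of states while the one-hot version
stays a simple OR of the relevant state bits.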

One drawback of one-hot encoding is that for a finite state machine with a
large number of states (i.e. N is a big number in Figure 5-2 and Figure 5-3),
the State Register can be very wide.  A wide register, however, is usually
not that bad a problem.  In any case, in order to keep the design easy to
understand and debug, one may want to avoid using "one BIG and complex"
finite state machine anyway:

    *** Logic Design Guideline 5-4 ***
    Instead of designing a controller with a giant and complex finite state
    machine at its core, it may be easier to break the controller into
    multiple smaller controllers, each with a smaller and simpler finite
    state machine at its core.

In both Figure 5-2 and Figure 5-3, it is possible to integrate the Output
Logic block and the Next State Logic block into one single random logic
block.  However, in order to keep the logic design easy to understand:

    *** Logic Design Guideline 5-5 ***
    For finite state machine design, keep the Next State Logic block
    separate from the Output Logic block.

As I will show you later in Verilog Coding Guidelines 5-3 and 5-4, the Verilog
code that models the finite state machine is also easier to read and understand
if the Next State Logic block is kept separate from the Output Logic block.

One final word on the Mealy Machine shown in Figure 5-3.  The Output Logic's
inputs are shown to come from both the Current State and the Inputs.  In order
to simplify the Output Logic block, it is "logically" equivalent to use some
of the Next State bits (i.e. outputs of the Next State Logic prior to the
State Register) as inputs to the Output Logic block.  This is shown in
Figure 5-5.  This, however, should be done with extreme care.

    *** Logic Design Guideline 5-6 ***
    In a Mealy Machine design, it is possible to use the Next State Logic
    block's outputs as inputs to the Output Logic block.  This must be done
    with caution since the total delay of the two logic blocks may become
    the critical path of the controller.

           +---------------------------+
           |     +-------+       +---+ |            +--------+
           |  N  | Next  | Next  |S  | |  Current   |        |
           +--/-->       | State |t R| |   State    | Output |
                 | State +--/-+-->a e+-+----/------->        +--/--> Outputs
  Inputs --+--/-->       |  N |  |t g|      N       |        |  P
           |  M  | Logic |    |  |e  |          +---> Logic  |
           |     +-------+    |  +-^-+          |   |        |
           |                  |    |   (R <= M) | +->        |
           |                  |  Clock    R     | | |        |
           |                  +------------/----+ | +--------+
           |  Q (Q <= M)                          |
           +--/-----------------------------------+

           Figure 5-5: An Alternate Form of the Mealy State Machine
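
A minimal sketch of this alternate form is shown below for a hypothetical
two-state, one-hot machine (the module x_mealy_alt, its states, and its
signals are made up for this illustration; the State Register is left out to
keep the sketch short).  The Output Logic reuses a Next State bit instead of
decoding the input "start" a second time:

    module x_mealy_alt (out, next_state, cur_state, start);

    output              out;
    output [1:0]        next_state;
    input [1:0]         cur_state;      // One-hot: bit 0 = IDLE, bit 1 = RUN
    input               start;

    // Next State Logic: IDLE goes to RUN on start, RUN returns to IDLE
    assign next_state[0] = (cur_state[0] & ~start) | cur_state[1];
    assign next_state[1] =  cur_state[0] &  start;

    // Output Logic: asserted while in RUN or about to enter RUN.
    // Logically the same as (cur_state[0] & start) | cur_state[1].
    assign out = next_state[1] | cur_state[1];

    endmodule // x_mealy_alt

Note that the path from "start" to "out" now goes through the Next State
Logic first and the Output Logic second, which is exactly the combined delay
that Logic Design Guideline 5-6 warns about.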


Enclosed in Appendix C are two Verilog files illustrating the various 
controller design guidelines:

    trans_defs.v    defines all the symbolic values that apply to all
                    Transport Layer files, see Reference [11].

    dtrans_txctl.v  models the controller for the Device Dongle's Transport
                    Layer Transmit Engine, see Reference [12].

First, let's take a look at some interesting observations from trans_defs.v.

    *** Verilog Coding Guideline 5-1 ***
    If one-hot encoding is used for the finite state machine (see Logic
    Design Guideline 5-3), define a symbolic value for each bit position
    as well as a symbolic value for the binary value when that bit position
    is set. This makes the Verilog code much easier to read and understand.

For example, here are some lines from the trans_defs.v file attached
in Appendix C (the reader can read the entire definition in Appendix C):

    /*
     * Define the state values and bit position for Device's Transmit Finite
     * State machine (FSM in dtrans_txctl).  This FSM implements the "transmit"
     * states described in Section 8.7 (PP. 197-205) of SATA Spec, 1.0.
     */
    `define num_dttxfsm     15
    `define B_DTTXIDLE       0
    `define B_DTCHKTYP       1
    `define B_DTREGFIS       2      // Spec's DT_RegHDFIS
    `define B_DTPIOSTUP      3      // Spec's DT_PIOSTUPFIS
        :       :            :
    `define B_DTBISSTA      14

    // Device Dongle's TX FSM State Values
    `define DTTXIDLE        15'h0001
    `define DTCHKTYP        15'h0002
    `define DTREGFIS        15'h0004        // Spec's DT_RegHDFIS
    `define DTPIOSTUP       15'h0008        // Spec's DT_PIOSTUPFIS
        :       :            :
    `define DTBISSTA        15'h4000

Notice that I have used "define" to create these symbolic values:

    *** Verilog Coding Guideline 5-2 ***
    One common convention used by many Verilog code writers is to use
    "define" for constant values such as:

    	`define DTTXIDLE        15'h0001
        
    while "parameter" is used ONLY for things that can change, such as the
    width of registers, muxes, etc. (see also Section 2).  The width "n" of
    the register v_reg below is one such parameter:

	/***************************************************************
	 * Simple N-bit register with a 1 time-unit clock-to-q time
	 ***************************************************************/
	module v_reg( q, c, d );

	    parameter   n = 1;

	    input   [n-1:0] d;
	    input           c;
	    output  [n-1:0] q;

	    reg     [n-1:0] state;

	    assign  #(1) q = state;

	    always @(posedge c) begin
		state  = d;
	    end

	endmodule // v_reg
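
A short usage sketch, assuming the v_reg module shown above is compiled
together with it (the module x_two_regs and its signal names are
hypothetical), shows how the parameter "n" is overridden at instantiation
while the `define values stay the same everywhere:

    module x_two_regs (q8, q1, d8, d1, clk);

    output [7:0]        q8;
    output              q1;
    input [7:0]         d8;
    input               d1;
    input               clk;

    v_reg #(8) byte_ff (q8, clk, d8);   // n is overridden to 8
    v_reg      bit_ff  (q1, clk, d1);   // n keeps its default value of 1

    endmodule // x_two_regs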

Next, let's look at the file dtrans_txctl.v.  The main module of this file
consists of the following sections clearly labeled by comments:

    module dtrans_txctl (
        // Outputs
        tp_acksendreg,
            :
        senddata,

        // Inputs
        at_sendreg,
            :
        tptx_reset);

        /*
         * Next State Logic and the State Register for the finite state machine
         */
        // Next State Logic
	dtrans_txfsm dtrans_txfsm ( ...

        // State Register
	v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

        /*
         * Counter and its MUX tree to select the count limit
         * for the generation of the expire signal
         */

        /*
         * Output Logic for generating output signals
         */

    endmodule // dtrans_txctl

This leads to the following Logic Design and Verilog Coding guidelines.

    *** Verilog Coding Guideline 5-3 ***
    Use an explicit State Register and separate the Next State Logic
    from this explicit register.

For example in dtrans_txctl.v, we have:

    /*
     * Next State Logic and the State Register for the finite state machine
     */
    // Next State Logic
    dtrans_txfsm dtrans_txfsm (
        // Outputs
        .next_state (next_state),

        // Inputs
        .cur_state (cur_state),
        .at_sendreg (at_sendreg),       .at_senddmaa (at_senddmaa),
             :
        .txtimeout (txtimeout),         .expire (expire),
        .tptx_reset (tptx_reset));

    // State Register
    v_reg #(`num_dttxfsm) state_ff (cur_state, txclk4x, next_state);

The Next State Logic here is implemented in the separate "dtrans_txfsm"
module in dtrans_txctl.v.  The module "dtrans_txfsm" has only one output,
the "next_state" vector, and contains only one thing: a "Case Statement"
enclosed in an "always" block:

    *** Verilog Coding Guideline 5-4 ***
    The Next State Logic, with only ONE output (the "next_state" vector),
    can be implemented easily with a Verilog Case statement.

    /*************************************************************************
     * Module dtrans_txfsm: Random logic for the transmit finite state machine
     ************************************************************************/
    module dtrans_txfsm (
        // Outputs
        next_state,

        // Inputs
        cur_state,
        at_sendreg,
        at_senddmaa,
          :
        expire,
        tptx_reset);
            :
            :
        always @(cur_state or at_sendreg or at_senddmaa ...

	    /*** List ALL Inputs of this module ***/

    		txtimeout or expire or tptx_reset) begin

            if (tptx_reset) begin
                next_state = `DTTXIDLE;
            end
            else begin
                case (cur_state)
                `DTTXIDLE:
		    if (~r2t_rxempty) begin
		        /*
 		         * Give the receive engine higher priority
		         */
		        next_state = `DTCHKTYP;
		    end
                     :
		end
                     :

		`DTBISSTA:
		    if (~lk_txfsmidle & ~txtimeout) begin
		        next_state = `DTBISSTA;
		    end
		     :

		default: begin	// We should never be here
		    next_state = `DTWAITTXID;
		    $display (
		    "*** Warning: Undefined HTP RX State, cur_state = %b ***",
		    cur_state);
		    end
	        endcase
	    end // End else (tptx_reset == 0)

        end // End always

    endmodule // dtrans_txfsm

Notice that the module "dtrans_txfsm" has ONLY one output, "next_state."  This
is a very desirable feature when we use the Verilog "Case Statement" because
one thing we have to be careful about when we use the "Case Statement" is that
every output MUST have a defined value for each branch of the Case statement.
Otherwise, the synthesis tool will generate a latch to keep the old value,
which in most cases is NOT what the logic designer intends.  Having
only one output (the "next_state") for the Next State Logic makes this rule
easy to follow, and is one reason why Logic Design Guideline 5-5 encourages
you to separate the Next State Logic block from the Output Logic block.
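
The hazard is easy to reproduce.  In the sketch below (the module
x_latch_demo and its signals are hypothetical and are not taken from
dtrans_txctl.v), a second output "other_out" has been added to the Case
statement, and one branch forgets to assign it:

    module x_latch_demo (next_state, other_out, cur_state, go);

    output [1:0]        next_state;
    output              other_out;
    input [1:0]         cur_state;
    input               go;

    reg [1:0]           next_state;
    reg                 other_out;

    always @(cur_state or go) begin
        case (cur_state)
        2'b01: begin
            next_state = go ? 2'b10 : 2'b01;
            other_out  = 1'b0;
        end
        2'b10: begin
            next_state = 2'b01;
            // other_out is NOT assigned in this branch, so it must
            // keep its old value here: a latch is inferred for it
        end
        default: begin
            next_state = 2'b01;
            other_out  = 1'b0;
        end
        endcase
    end

    endmodule // x_latch_demo

One common way to avoid the latch is to assign every output a default value
at the top of the "always" block, before the Case statement.  With a single
output such as "next_state," there is only one signal to keep track of.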

In many finite state machine designs, the number of states can be reduced
and the Next State Logic can therefore be simplified if one takes advantage
of the fact that the state machine wants to stay at a certain state for
"N cycles" (where N is a fixed integer >= 1), then go to the next state and
stay there for another "M cycles" (M is another integer >= 1 but != N)
before moving on to another state.  One example of this behavior is the DRAM
controller, where the controller will enter the "Row Address Active" state
for a few cycles, then go to the "Column Address Active" state for a few
cycles, before moving on to the "Precharge" state, etc.

    *** Logic Design Guideline 5-7 ***
    A finite state machine containing states whose transitions to their next
    states are governed only by the number of cycles it has to wait can be
    simplified by building a multiplexer tree to select the number of cycles
    a counter must count before generating an "expire" signal to trigger
    the state transition.

Logic Design Guideline 5-7 is illustrated by the following Verilog code in
dtrans_txctl.v.  In a nutshell:

 1. We start the counter (count_enable = 1) when the current state is one of:
    DTREGFIS, DTPIOSTUP, DTXMITBIS, or DTDMASTUP.  Since we are using one-hot
    encoding, we are in one of these states when the corresponding bit in
    the cur_state register: cur_state[`B_DTREGFIS], cur_state[`B_DTPIOSTUP],
    cur_state[`B_DTXMITBIS], or cur_state[`B_DTDMASTUP] is set.

    assign count_enable = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
        cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP];

    v_countN #(`log_maxfis) expire_count (
        .count_out (wcount),
        .count_enable (count_enable),
        .clk (txclk4x),
        .reset (tptx_reset | expire));

 2. Based on the current state, the multiplexer tree is used to select the
    number of cycles the counter must count (count_limit) before the state
    machine is triggered to move to the next state.

    /*
     * Counter and its MUX tree to select the count limit
     * for the generation of the expire signal
     */
    v_mux2e #(`log_maxfis) regpio_mux (num_regpio,
        cur_state[`B_DTPIOSTUP], `NDFISREGm1, `NDFISPIOSm1);
    v_mux2e #(`log_maxfis) dmabis_mux (num_dmabis,
        cur_state[`B_DTXMITBIS], `NBFISDMASm1, `NBFISBISTAm1);
    v_mux2e #(`log_maxfis) cntlmt_mux (count_limit,
        (cur_state[`B_DTXMITBIS] | cur_state[`B_DTDMASTUP]),
        num_regpio, num_dmabis);

    The number of cycles the counter needs to count for each state is
    defined in trans_defs.v:

    `define NDFISREGm1      3'd4    // Device-to-Host (D) Register (REG)
    `define NDFISPIOSm1     3'd4    // Device-to-Host (D) PIO Setup (PIOS)
    `define NBFISDMASm1     3'd6    // Bidirectional (B) DMA Setup (DMAS)
    `define NBFISBISTAm1    3'd2    // Bidirectional (B) BIST Activate (BISTA)

 3. Finally, a 3-bit comparator is used to generate the "expire" signal,
    which is used as an input to the Next State Logic, to trigger the state
    transition when the counter reaches the "count_limit" selected by the
    MUX tree in Step 2.

    v_comparator #(`log_maxfis) expire_cmp (count_full, wcount, count_limit);
    assign expire = count_full & count_enable;
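
The three steps above can be summarized by the following behavioral sketch.
It is NOT the code in dtrans_txctl.v: the module name expire_demo, the 2-bit
"wait_sel" input, and the behavioral counter with a synchronous clear are
assumptions made for illustration only.  The real design uses the library
elements v_countN, v_mux2e, and v_comparator, whose internals are not shown
in this section.

    module expire_demo (
        // Output ports
        expire,

        // Input ports
        wait_sel,
        count_enable,
        clk,
        reset);

        output          expire;
        input  [1:0]    wait_sel;       // Hypothetical: selects the timed state
        input           count_enable;
        input           clk;
        input           reset;

        reg    [2:0]    count_limit;
        reg    [2:0]    wcount;

        // Step 2: select the count limit (values copied from the
        // trans_defs.v excerpt above)
        always @(wait_sel) begin
            case (wait_sel)
                2'd0:    count_limit = 3'd4;    // Register
                2'd1:    count_limit = 3'd4;    // PIO Setup
                2'd2:    count_limit = 3'd6;    // DMA Setup
                default: count_limit = 3'd2;    // BIST Activate
            endcase
        end

        // Step 1: counter, cleared by reset or by expire
        // (a synchronous clear is assumed here)
        always @(posedge clk) begin
            if (reset | expire)
                wcount <= #1 3'd0;
            else if (count_enable)
                wcount <= #1 wcount + 3'd1;
        end

        // Step 3: comparator generates the expire signal
        assign expire = (wcount == count_limit) & count_enable;

    endmodule

Under these assumptions, a limit of 3'd4 lets the counter step through
0, 1, 2, 3, 4, so expire goes high during the fifth cycle.  That would be
consistent with reading the "m1" suffix of the symbolic names as "minus 1",
although this section does not say so explicitly.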

The last part of the Verilog code in dtrans_txctl.v:

    /*
     * Random logic for generating output signals 
     */ 
    assign tp_acksendreg  = cur_state[`B_DTREGFIS];
              :
    assign tp_acksenddata = cur_state[`B_DTDATAFIS]; 

    assign tp_sendndfis = cur_state[`B_DTREGFIS] | cur_state[`B_DTPIOSTUP] |
        cur_state[`B_DTDMASTUP] | cur_state[`B_DTDMAACT] |
        cur_state[`B_DTXMITBIS]; 

shows how the output logic and the "glue logic" (see Item 3 of Figure 5-1)
can be implemented with simple "assign" statements.

    *** Verilog Coding Guideline 5-5 ***
    With the more complex Next State Logic already taken care of by the
    "Case Statement" (see Verilog Coding Guideline 5-4) and with the
    help of one-hot encoding for the state machine, the Output Logic
    can usually be implemented easily with simple assign statements.


6. Miscellaneous Verilog Coding Guidelines
-----------------------------------------------------------------------------
If you look at the Verilog files in Appendix A, Appendix B, and Appendix C,
you will notice that all the Verilog files have a very similar format.

    *** Verilog Coding Guideline 6-1 ***
    In order to keep the Verilog files easy to read and easy to understand
    for every member of the design team, adopt a standard format and use
    the same format for all Verilog files.

    For example, the link_txdp.v file in Appendix B follows this format:

    module module_name (
        // Bi-directional ports	(if any)
	bi_port1,		//*** First list the inout ports (if any)
	bi_port2,		//*** List one port per line

	// Output ports
	o_port3,		//*** Then list the output ports
	o_port4,

	// Input ports
	i_port5);		//*** Finally, list the input ports

	/*
 	 * Declare all bi-directional ports 
         */
	inout 		bi_port1;	//*** Declare one port per line
	inout 		bi_port2;

	/*
 	 * Declare all output ports
         */
	output		o_port3;
	output		o_port4;
	
	/*
 	 * Declare all input ports
         */
	input		i_port5;

	/*
         * After all ports are declared, declare all the wires
         */
	wire		wire1;		//** Declare one wire per line
	wire		wire2;

	/*
	 * Declare all registers (if any)
	 */
	reg		reg1;		//** Declare one register per line
	reg		reg2;

	/*
	 * Core of the Verilog code
	 */

    endmodule

Notice that in the link_txdp.v file in Appendix B, when the module
"l_scramble" is instantiated, explicit connection (example:
.reset (lktx_reset)) is used.

    l_scramble scrambler (
        .scr_out (scr_out),             .scr_in (32'hc2d2768d),
        .scr_init (txscr_init),         .scr_run (txscr_run),
        .clk (txclk4x),                 .reset (lktx_reset));

    *** Verilog Coding Guideline 6-2 ***
    In order to avoid confusion about which wire is connected to which port,
    use explicit connection (example: .port_name (wire)) when a module
    is instantiated.
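
To see the kind of confusion this guideline avoids, compare connection by
position with explicit connection in the sketch below.  The modules
demo_mux2 and demo_top are hypothetical and exist only for this example.

    module demo_mux2 (
        out,
        sel,
        in0,
        in1);

        output          out;
        input           sel;
        input           in0;
        input           in1;

        assign out = sel ? in1 : in0;

    endmodule

    module demo_top (
        y_pos,
        y_named,
        s,
        a,
        b);

        output          y_pos;
        output          y_named;
        input           s;
        input           a;
        input           b;

        // Connection by position: "a" and "s" are swapped by mistake,
        // so "a" drives the select port.  The code is still legal Verilog.
        demo_mux2 mux_pos (y_pos, a, s, b);

        // Explicit connection: the order does not matter and each
        // wire-to-port pairing can be checked at a glance.
        demo_mux2 mux_named (
            .in1 (b),           .in0 (a),
            .sel (s),           .out (y_named));

    endmodule

Because every signal in mux_pos is one bit wide, nothing in the language
flags the swap; the error only shows up as wrong behavior in simulation.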

The l_scramble module is defined in the file link_library.v, which is
also included in Appendix B.  Notice the detailed comment in this module:

    /*                                   Priority:
     *          scram   scr_out          -------------------------------
     *              |   |                reset (asynchronous):   highest
     *          +---v---v---+            scr_init (synchronous): middle
     * scr_run-->\S 1   0  /             scr_run (synchronous):  lowest
     *            +---+---+  scr_in
     *                |       |
     *            +---v-------v---+
     *             \  0       1 S/<--scr_init (higher priority than scr_run)
     *              +-----+-----+
     *                    |
     *                    v
     *                  lastmux
     */

    *** Verilog Coding Guideline 6-3 ***
    In order to keep the Verilog code easy to understand for everyone
    (including yourself :-), use detailed comments.  More importantly,
    put in the comments as you do the coding because if you do not put
    in the comments now, it is unlikely you will put them in later.

Finally, one may notice the absence of "timescale" statements in all
of the files that model the high level modules (Appendix A), the datapath
(Appendix B), and the controller (Appendix C).  The reason is that there
is no need to have any timescale statements in the Verilog code if
Verilog Coding Guideline 2-2 is followed:

    *** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero
    clock-to-q time.  All combinational logic (example: mux) has zero delay.

More specifically, as shown in Section 2, the v_reg and v_latch each have a
"1 time unit" clock-to-q delay.  This clock-to-q delay is the ONLY delay
we have in our Verilog code.  Consequently, our Verilog code will work
no matter what time scale this time unit is set to (i.e. it can be set to
1ps, 1ns, 1ms, ... etc.).  The only time we need a timescale statement
is when we want to run a simulation on our Verilog model.

    *** Verilog Coding Guideline 6-4 ***
    Ideally, there should not be any "timescale" directive in any of the
    Verilog files that model the hardware (because they are not needed if
    we follow Verilog Coding Guideline 2-2).  Consequently, there should
    be ONE and only ONE timescale directive in any Verilog simulation run,
    and that timescale directive should be placed at the beginning of the
    test bench file (see Reference [13]).
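
A minimal sketch of this arrangement is shown below.  The file names, the
module names test_demo and demo_reg, and the chosen time scale are all
hypothetical; demo_reg is only a stand-in for a storage element with a
"1 time unit" clock-to-q delay and is NOT the v_reg from Section 2.  The
sketch assumes the test bench file is compiled first, so that the modules
compiled after it inherit its time scale.

    /*
     * File 1 (compiled first): test_demo.v, the test bench
     */
    `timescale 1ns / 1ps        // The ONLY timescale directive in the run

    module test_demo;

        reg             clk;
        reg             d;
        wire            q;

        demo_reg dut (
            .q (q),
            .d (d),
            .clk (clk));

        initial begin
            clk = 0;
            d = 1;
            #5 clk = 1;                 // Rising edge of the clock
            #2 $display ("q = %b", q);  // q has taken the value of d by now
            $finish;
        end

    endmodule

    /*
     * File 2 (compiled second): demo_reg.v, models the hardware
     * and contains no timescale directive
     */
    module demo_reg (
        q,
        d,
        clk);

        output          q;
        input           d;
        input           clk;

        reg             q;

        always @(posedge clk) begin
            q <= #1 d;                  // 1 time unit clock-to-q delay
        end

    endmodule

Changing "1ns" in the test bench changes what the "#1" in demo_reg means in
absolute time, but the hardware model itself does not need to be edited.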

7. Summary of Logic Design and Verilog Coding Guidelines
-----------------------------------------------------------------------------
Below is a summary of all the logic design guidelines:

    *** Logic Design Guideline 2-1 (MOST IMPORTANT) ***
    The design MUST be as simple as possible and easy to understand!

    *** Logic Design Guideline 3-1 ***
    Use a hierarchical strategy that breaks the design into modules
    that consist of datapaths and controllers.  More specifically:

    1. Divide the problem into multiple modules with clean and well
       defined interfaces.

    2. For each module:
        a. Design the datapath that can process the data for that module.
        b. Design the controller to control the datapath and produce control
           outputs (if any) to other adjacent modules.

    *** Logic Design Guideline 3-2 ***
    Keep different clock domains separate and have an explicit
    synchronization module for signals that cross the clock domain.

    *** Logic Design Guideline 4-1 ***
    The best way to study the effect of the datapath's pipeline registers is
    to draw a timing diagram showing each register's effect on its outputs
    with respect to rising or falling edge of the register's input clock.

    *** Logic Design Guideline 4-2 ***
    The block diagram of the datapath should show ALL registers,
    including the implicit register of the Sequential Datapath Element.

    *** Logic Design Guideline 4-3 ***
    While designing the Sequential Datapath Elements, separate each element
    into two parts: (1) the combinational logic, and (2) the register.

    *** Logic Design Guideline 5-1 ***
    The best way to decide when and where to use pipeline register or
    registers to stage the controller inputs and outputs is to draw a
    timing diagram showing each register's effect on its outputs with
    respect to rising or falling edge of the register's input clock.

    *** Logic Design Guideline 5-2 ***
    The block diagram of the controller should show ALL registers explicitly
    while the random logic can be represented by a simple black box.

    *** Logic Design Guideline 5-3 ***
    If possible, use one-hot encoding for the finite state machine's state
    encoding to simplify the Output Logic as well as the Next State Logic.

    *** Logic Design Guideline 5-4 ***
    Instead of designing a controller with a giant and complex finite state
    machine at its core, it may be easier to break the controller into
    multiple smaller controllers, each with a smaller and simpler finite
    state machine at its core.

    *** Logic Design Guideline 5-5 ***
    For finite state machine design, keep the Next State Logic block
    separate from the Output Logic block.

    *** Logic Design Guideline 5-6 ***
    In a Mealy Machine design, it is possible to use the Next State Logic
    block's outputs as inputs to the Output Logic block.  This must be done
    with caution since the total delay of the two logic blocks may become
    the critical path of the controller.

    *** Logic Design Guideline 5-7 ***
    A finite state machine containing states whose transitions to their next
    states are governed only by the number of cycles they have to wait can be
    simplified by building a MUX tree to select the number of cycles a
    counter must count before generating an "expire" signal to trigger
    the state transition.

Below is a summary of all the Verilog coding guidelines:

    *** Verilog Coding Guideline 2-1 ***
    Model all the standard logic elements in a library file to be SHARED
    by ALL engineers in the design team.

    *** Verilog Coding Guideline 2-2 ***
    Only the storage elements (examples: register and latch) have non-zero
    clock-to-q time.  All combinational logic (example: mux) has zero delay.

    *** Verilog Coding Guideline 2-3 ***
    Use explicit registers and latches (example: v_reg and v_latch as
    shown in Section 2) in your Verilog coding.  Do not rely
    on logic synthesis tools to generate latches or registers for you.

    *** Verilog Coding Guideline 3-1 ***
    A separate Verilog file is assigned to the Verilog code for:
     1. Each datapath.  Example: dtrans_txdp.v
     2. Each controller.  Example: dtrans_txctl.v
     3. Each high level module, that is, a module at a hierarchy level
	higher than the datapath and the controller.
	Examples: link_tx.v, link_rx.v, and link.v

    *** Verilog Coding Guideline 3-2 ***
    In order to keep the number of Verilog files under control, one should
    try not to assign a separate Verilog file to any low level module that
    is at a hierarchy level lower than the datapath and the controller.

    *** Verilog Coding Guideline 3-3 ***
    The Verilog code for a high level module, that is, a module at a
    hierarchy level higher than the datapath and the controller (examples:
    module dtrans_tx, module dtrans_rx, and module dtrans), should not
    contain any logic.  It should only show how the lower level modules
    are connected.
 
    *** Verilog Coding Guideline 4-1 ***
    Keep the Verilog coding of the datapath simple and straightforward.
    Leave the fancy coding (IF any) to the datapath elements and place
    such elements in a separate (library) file.

    *** Verilog Coding Guideline 4-2 ***
    The Verilog coding of the datapath elements should make use of the
    standard logic elements (registers, multiplexers, ... etc.) already
    defined in the library discussed in Verilog Coding Guideline 2-1.

    *** Verilog Coding Guideline 4-3 ***
    Define symbolic values (see also Verilog Coding Guideline 5-2) in
    a header file (example: link_defs.v) and include this header file in
    all files that can make use of these symbolic values to make the
    Verilog code easier to maintain and easier to understand.

    *** Verilog Coding Guideline 5-1 ***
    If one-hot encoding is used for the finite state machine (see Logic
    Design Guideline 5-3), define a symbolic value for each bit position
    as well as a symbolic value for the binary value when that bit position
    is set. This makes the Verilog code much easier to read and understand.

    *** Verilog Coding Guideline 5-2 ***
    One common convention used by many Verilog code writers is to use
    "define" for constant values such as:

    	`define DTTXIDLE        15'h0001
        
    while "parameter" is used ONLY for things that can change such as the
    width of registers, muxes ... etc. (see also Section 2).

    *** Verilog Coding Guideline 5-3 ***
    Use an explicit State Register and separate the Next State Logic
    from this explicit register.

    *** Verilog Coding Guideline 5-4 ***
    The Next State Logic, with only ONE output (the "next_state" vector),
    can be implemented easily with a Verilog Case statement.

    *** Verilog Coding Guideline 5-5 ***
    With the more complex Next State Logic already taken care of by the
    "Case Statement" (see Verilog Coding Guideline 5-4) and with the
    help of one-hot encoding for the state machine, the Output Logic
    can usually be implemented easily with simple assign statements.

    *** Verilog Coding Guideline 6-1 ***
    In order to keep the Verilog files easy to read and easy to understand
    for every member of the design team, adopt a standard format and use
    the same format for all Verilog files.

    *** Verilog Coding Guideline 6-2 ***
    In order to avoid confusion about which wire is connected to which port,
    use explicit connection (example: .port_name (wire)) when a module
    is instantiated.

    *** Verilog Coding Guideline 6-3 ***
    In order to keep the Verilog code easy to understand for everyone
    (including yourself :-), use detailed comments.  More importantly,
    put in the comments as you do the coding because if you do not put
    in the comments now, it is unlikely you will put them in later.

    *** Verilog Coding Guideline 6-4 ***
    Ideally, there should not be any "timescale" directive in any of the
    Verilog files that model the hardware (because they are not needed if
    we follow Verilog Coding Guideline 2-2).  Consequently, there should
    be ONE and only ONE timescale directive in any Verilog simulation run,
    and that timescale directive should be placed at the beginning of the
    test bench file (see Reference [13]).

With all these logic design and Verilog coding guidelines, does this mean
there is no room for logic designers to be creative?  Not at all.  Artists
such as movie directors and music composers need to follow many guidelines
and yet nobody can say they are not doing creative work.  They just spend
their creativity on tasks that require creativity and follow the standard
guidelines (such as a movie should be approximately 2 hours long) when
creativity is not needed.  Logic design is the same: be creative on tasks
that truly deserve innovation (such as how to build a datapath that can
process data at half the power) but not on tasks such as how to write a
complex Verilog statement that saves a few lines of Verilog code but that
nobody else can understand.

The ultimate goal for any logic designer is to keep his or her design and
the Verilog code that models the design AS EASY TO UNDERSTAND AS POSSIBLE.
Remember this: the easier it is for other people to understand your design
and your Verilog code, the more people can help you in your work and the
less likely your vacation will be interrupted by late night phone calls
from the coworker covering for you :-)  So make your design easy to
understand :-)

8. References
-----------------------------------------------------------------------------
[1]  Private communications, October 2001.

[2]  For those readers who can access my home directory, the Parallel ATA
     Interface to the disk is modeled by the module dataif in the Verilog file:
	/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v

[3]  For those readers who can access my home directory, the Transport Layer
     is modeled by the module dtrans in the Verilog file:
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v

[4]  For those readers who can access my home directory, the Link Layer
     is modeled by the module link in the Verilog file:
	/home/kong/P2001/Verilog/DeviceDongle/Link/link.v

[5]  For those readers who can access my home directory, the files are in: 
	/home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif.v
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_dp.v
        /home/kong/P2001/Verilog/DeviceDongle/ATAIF/dataif_ctl.v

[6]  For those readers who can access my home directory, the files are in:
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans.v
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_tx.v
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txdp.v
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v

	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rx.v
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxdp.v
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_rxctl.v

[7]  For those readers who can access my home directory, the files are in:
	/home/kong/P2001/Verilog/DeviceDongle/Link/link.v
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_tx.v
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_txctl.v

	/home/kong/P2001/Verilog/DeviceDongle/Link/link_rx.v
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_rxdp.v
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_rxctl.v

[8]  For those readers who can access my home directory, link_txdp.v is in:
	/home/kong/P2001/Verilog/DeviceDongle/Link/link_txdp.v

[9]  For those readers who can access my home directory, link_library.v is in:
	/home/kong/P2001/Verilog/CommonFiles/link_library.v

[10] For those readers who can access my home directory, link_defs.v is in:
	/home/kong/P2001/Verilog/CommonFiles/link_defs.v

     Note: Both the link_library.v (Reference [9] above) and link_defs.v
           are placed in the "CommonFiles" directory because they are used
           by all Link Layer files.

[11] For those readers who can access my home directory, trans_defs.v is in:
	/home/kong/P2001/Verilog/CommonFiles/trans_defs.v 

     Note: The file trans_defs.v is placed in the "CommonFiles" directory
     because it is used by all Transport Layer files.

[12] For those readers who can access my home directory, dtrans_txctl.v is in:
	/home/kong/P2001/Verilog/DeviceDongle/Transport/dtrans_txctl.v

[13] For those readers who can access my home directory, please refer to:
	/home/kong/P2001/Verilog/SATASys/Tests/test_init.v

------------------------ That's all for now folks :-) ------------------------