These are notes taken by Mary Jennings from
the Oct 1, 1993 lecture (videotape) for CS60a.

Distinguished Lecturer Series
October 1, 1993

Speaker:  Professor Richard Karp, University of California at Berkeley
                             and International Computer Science Institute

Topic:  Combinatorial Search Problems

     A combinatorial search problem requires that the solver find an
arrangement of a finite set of objects so as to satisfy certain constraints.
How do we distinguish easy from hard and intrinsically hard from temporarily
hard problems?  A few examples:

Telephone network (Steiner tree) problem:  We are given a (finite) set S of
     points.  This set S is the union of two disjoint subsets A and B;
     i.e., S = A U B, where A and B are both non-empty.  All of the points
     in set A must receive telephone service, so we must connect them by
     establishing telephone lines between them.  However, the points in set B
     are optional in the sense that we may connect some or all of them to
     the system of lines serving set A, but we do not have to connect any
     of them to that system.  This is a challenging problem when the set
     S is large.  However, removing the optional points (the subset B above)
     from S makes the problem easier.

Assignment problem:  We are given n workers to do n jobs, and a cost for each
     worker-job pair.  The problem is to match the workers one-to-one with
     the jobs while minimizing the total cost.
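
     To make the size of the search space concrete, here is a minimal Python
sketch that simply tries every one-to-one matching.  The cost table and the
name cheapest_assignment are hypothetical, chosen only for illustration; the
point is that this naive search examines n! matchings.

```python
from itertools import permutations

def cheapest_assignment(cost):
    """Try all n! matchings; cost[w][j] is the cost of worker w doing job j."""
    n = len(cost)
    best_total, best_jobs = None, None
    for jobs in permutations(range(n)):      # jobs[w] = job given to worker w
        total = sum(cost[w][jobs[w]] for w in range(n))
        if best_total is None or total < best_total:
            best_total, best_jobs = total, jobs
    return best_total, best_jobs

# Hypothetical 3-worker, 3-job cost table.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(cheapest_assignment(cost))   # (5, (1, 0, 2))
```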

     So, a combinatorial search problem requires that the solver find,
from among a very large, structured set of possible solutions, one solution
that satisfies the problem's constraints.  Variants of this type of problem
include decision and optimization problems.  More examples:

Scheduling:  School scheduling, NFL football games, airline flights

Routing networks:  Vehicles, phone calls, (computer) bits, oil, gas

VLSI circuit design:  placement and interconnection of components

Computational biology:  DNA molecules, etc.

Cryptography

     Many combinatorial search problems can be expressed by graphs.  Examples
include:

Eulerian Walk Problem:  Given a graph, determine whether there is a closed
     walk that covers each edge exactly once.  It turns out that such a
     walk exists if and only if the graph is connected and each vertex has
     even degree (i.e., an even number of edges meet at it).
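
     The criterion above can be checked quickly.  The following Python sketch
assumes the graph is given as a list of edges and that every vertex appears
in at least one edge; the function name is made up for this illustration.

```python
from collections import defaultdict

def has_eulerian_circuit(edges):
    """Return True if the graph is connected and every vertex has even degree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return True          # convention chosen here: nothing to cover
    # Parity test: every vertex must have an even number of incident edges.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return False
    # Connectivity test: depth-first search from an arbitrary vertex.
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = [(0, 1), (1, 2)]
print(has_eulerian_circuit(square), has_eulerian_circuit(path))   # True False
```

Both tests take time proportional to the number of edges, which is why this
problem counts as easy.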

Hamiltonian Circuit Problem:  Given a graph, determine whether there is a
     cycle that visits each vertex exactly once.  This problem is hard.
     (All known solution algorithms require exponential time in the worst
     case.)
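
     By way of contrast with the Eulerian case, here is a brute-force Python
sketch for the Hamiltonian Circuit Problem.  It is only one naive approach,
assuming an undirected graph given as a vertex list and an edge list (the
names are hypothetical); it may examine (n-1)! orderings of the vertices.

```python
from itertools import permutations

def hamiltonian_circuit(vertices, edges):
    """Return a list of vertices forming a Hamiltonian circuit, or None."""
    vertices = list(vertices)
    if len(vertices) < 3:
        return None                      # no cycle in a simple graph
    edge_set = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    # Fix the starting vertex and try every ordering of the others.
    for order in permutations(rest):
        cycle = [first, *order]
        n = len(cycle)
        if all(frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
               for i in range(n)):
            return cycle
    return None

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
star = [(0, 1), (0, 2), (0, 3)]
print(hamiltonian_circuit(range(4), square))   # [0, 1, 2, 3]
print(hamiltonian_circuit(range(4), star))     # None
```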

     The Hamiltonian Circuit Problem gives rise to a question:  Are
combinatorial explosions inevitable?  In other words, are such problems
really that hard or do they seem so only because we have not yet found a
more efficient algorithm for solving them?  This question leads to a
discussion of definitions that clarify the boundary between tractable and
intractable problems.

     In 1965, J. Edmonds formulated the following definition:  A problem is
tractable if it can be solved in a number of steps bounded by a polynomial
in the size of the input.  His attempt to formalize this concept was a step
in the right direction, but it was necessary to establish some conventions
before we could apply this definition in a practical and far-reaching way.

These conventions apply only to decision problems:

     Input is encoded as strings of 0's and 1's.
     Each input is either accepted or rejected.
     We denote by P the set of all decision problems solvable in polynomial
     time via an algorithm on a Turing machine.

To define P more formally, we say that a problem L lies in P if there is an
algorithm A such that
     A accepts all strings in L,
     A rejects all strings not in L,
     there is a polynomial f such that, for every string x, A terminates
          in at most f(|x|) steps, where |x| is the length of x.

Polynomial time reducibility:  A decision problem L is reducible to a
decision problem M if there is a polynomial time computable function F,
mapping strings to strings, such that a string x lies in L if and only if
the string F(x) lies in M.

     Note that the definition above yields a classification of problems with
respect to difficulty, because if L is reducible to M and M lies in P,
then L lies in P:  to decide whether x lies in L, compute F(x) and run the
polynomial time algorithm for M on it.  In other words, M is at least as
hard as L.

Satisfiability Problem: [This type of problem comes to us from the realm of
     logic. A proposition is a statement which is either true or false.
     A propositional variable is a symbol (e.g., A) which represents
     such a statement.  In the system of logic under consideration,]
     proposition A is true if and only if ~A (read not A) is false.  Given
     propositional variables A, B, C and their corresponding literals A, ~A,
     B, ~B, C, ~C, etc., we can form disjunctive and conjunctive clauses which
     have values true or false.  For example,

"OR":  A U ~B U ~D U F  means
       A (is true) OR ~B (is true) OR ~D (is true) OR F (is true)
     Such a clause is true whenever any one of the statements A, ~B, ~D, F is
     true.

"AND":  A AND B AND C  (actually written with intersection symbols, but I
                         have none on my keyboard)
     Such a clause is true if and only if all of the statements A, B, and C
     are true.

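In the abstract, a formula built from clauses such as those in the examples
above is satisfiable if there exists an assignment of truth values to its
variables that makes the formula true.  The Satisfiability Problem is to
decide whether a given formula is satisfiable.  An application: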
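
     As a rough illustration, the Python sketch below decides satisfiability
by trying every truth assignment.  It assumes the formula is an AND of "OR"
clauses, with each clause given as a list of literal strings such as "A" or
"~B"; this representation and the function name are assumptions made for the
example.  With k variables it may try 2^k assignments.

```python
from itertools import product

def satisfying_assignment(clauses):
    """Return a dict of truth values satisfying every clause, or None."""
    variables = sorted({lit.lstrip("~") for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A literal "~X" is true when X is False; a literal "X" when X is True.
        if all(any(assignment[lit.lstrip("~")] != lit.startswith("~")
                   for lit in clause)
               for clause in clauses):
            return assignment
    return None

print(satisfying_assignment([["A", "~B", "~D", "F"]]) is not None)   # True
print(satisfying_assignment([["A"], ["~A"]]))                        # None
```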

Three-Coloring Problem:  Given a graph, decide whether vertices can be
     colored with three colors such that no two adjacent vertices are the
     same color.  The Three-Coloring Problem is reducible to the
     Satisfiability Problem.  The variables are R(i) for "vertex i is red",
     B(i) for "vertex i is blue", and G(i) for "vertex i is green".  The
     clauses are
     {for each vertex i,
          R(i) U G(i) U B(i)
          ~R(i) U ~B(i), ~R(i) U ~G(i), ~B(i) U ~G(i)} and
     {for each edge (i,j)
          ~R(i) U ~R(j), ~B(i) U ~B(j), ~G(i) U ~G(j)}.
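
     The clause lists above can be generated mechanically.  The following
Python sketch builds them for a given graph and checks whether a proposed
coloring makes every clause true.  The string encoding of literals (e.g.,
"~R(2)") and both function names are assumptions made for this example.

```python
def three_coloring_clauses(vertices, edges):
    """Build the clauses of the reduction: 4 per vertex and 3 per edge."""
    clauses = []
    for i in vertices:
        r, g, b = f"R({i})", f"G({i})", f"B({i})"
        clauses.append([r, g, b])                 # vertex i gets some color
        clauses.append(["~" + r, "~" + b])        # ... but not two colors
        clauses.append(["~" + r, "~" + g])
        clauses.append(["~" + b, "~" + g])
    for i, j in edges:
        for c in "RBG":                           # endpoints differ in color
            clauses.append([f"~{c}({i})", f"~{c}({j})"])
    return clauses

def coloring_satisfies(clauses, coloring):
    """coloring maps each vertex to 'R', 'G', or 'B'; c(i) is true iff
    vertex i has color c.  Return True if every clause is then true."""
    by_name = {str(v): c for v, c in coloring.items()}
    def literal_true(lit):
        negated = lit.startswith("~")
        name = lit.lstrip("~")                    # e.g. "R(2)"
        color, vertex = name[0], name[2:-1]
        return (by_name[vertex] == color) != negated
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

triangle = three_coloring_clauses([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(len(triangle))                                            # 21
print(coloring_satisfies(triangle, {1: "R", 2: "G", 3: "B"}))   # True
print(coloring_satisfies(triangle, {1: "R", 2: "R", 3: "B"}))   # False
```

The number of clauses grows only linearly with the number of vertices and
edges, so the transformation itself takes polynomial time.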

The fact that Three-Coloring can be reduced to Satisfiability, you will
recall, implies that Satisfiability is at least as hard as Three-Coloring.

     Now we come to another class of problems.  We denote by NP the class
of decision problems that are checkable in polynomial time.  In other words,
given a solution to a problem in NP, we can check it in polynomial time.  More
formally,

A decision problem L lies in NP if and only if there exist a polynomial f
and a polynomial time decision algorithm A such that x lies in L if and only
if there is a witness y of length at most f(|x|) such that A accepts (x,y).

For example,

Composite numbers:  An integer n > 1 is composite if it can be factored into
     the product of two integers each greater than 1.  If an integer is
     composite, a witness is one of its nontrivial factors, [because if
     someone claims to have found a factorization of n, he (she) should be
     able to name the factors.  Suppose he (she) says that m is a factor of
     n, with 1 < m < n.  Then we can verify or disprove his/her claim by
     applying the division algorithm.  If we divide n by m and get a zero
     remainder, then the claim is true.  Otherwise, it is false.]
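
     The check described above amounts to a single division.  A minimal
Python sketch of such a checking algorithm follows; the name verify_composite
is hypothetical, and the pair (n, m) plays the role of (x, y) in the
definition of NP.

```python
def verify_composite(n, m):
    """Accept (n, m) exactly when m is a factor of n with 1 < m < n."""
    return 1 < m < n and n % m == 0

print(verify_composite(91, 7))   # True:  91 = 7 * 13
print(verify_composite(91, 5))   # False: 5 does not divide 91
print(verify_composite(7, 7))    # False: n itself is not a valid witness
```

Note that checking a proposed witness is easy even when finding one is not;
the checker says nothing about how m was found.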

We know that the set P is contained in the set NP.  Whether P = NP, however,
is an open question.  If we ever succeed in proving that P = NP, we shall
also have proven that

     (1)  finding a solution to a decision problem is no harder than checking
          the solution,
     (2)  finding a proof is no harder than checking a proof,
     (3)  many seemingly intractable problems are solvable in polynomial
          time.

     Finally, we come to the class of problems which we call NP-Complete.
A decision problem L is NP-Complete if

     (1)  L is in NP,
     (2)  L is so general that every problem in NP can be reduced to L in
          polynomial time.

Consequently, if L is NP-Complete, then L lies in P if and only if P = NP.

     In 1971, Cook's Theorem established that Satisfiability was NP-Complete.
Therefore, if Satisfiability turns out to be in P, then P will have been
proven equal to NP.

     Karp's List (1972) tells us that the Hamiltonian Circuit Problem,
the Three-Coloring Problem, and the Steiner Tree Problem are all NP-Complete.
(In 1973, Leonid Levin in the U.S.S.R. independently published similar
results.)

     The moral of this story, then, is that if a problem is NP-Complete,
we cannot expect to solve it in polynomial time unless P = NP.  Instead, we
must look for an algorithm which can handle most instances of the problem,
or, in the case of an optimization problem, for an algorithm that gives a
near-optimal solution.


{I took these notes on Professor Karp's lecture.  I hope I have done it justice.
If you find errors, please attribute them to my limitations as note-taker and
not to Professor Karp.  Occasionally, I took the liberty of inserting a brief
comment which I thought might clarify some of the material.  I tried, in each
such case, to enclose my remark in square brackets.  Sometimes it was not
possible to do so, however, without making the reading extremely cumbersome.

                                     ---- Mary Jennings}