Hessian of a Function

Definition

The Hessian of a twice-differentiable function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ at a point $x \in \mathbf{dom}\, f$ is the matrix containing the second derivatives of the function at that point. That is, the Hessian is the matrix with elements given by

 $$ H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x), \;\; 1 \le i, j \le n. $$

The Hessian of $f$ at $x$ is often denoted $\nabla^2 f(x)$.

When the second partial derivatives of $f$ are continuous, a mixed second derivative does not depend on the order in which the derivatives are taken. Hence $H_{ij} = H_{ji}$ for every pair $(i,j)$, and the Hessian is a symmetric matrix.
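The definition can be checked numerically. The sketch below approximates each entry $H_{ij}$ with central finite differences; the function name `numerical_hessian`, the step size `h`, and the test function are illustrative assumptions, not part of the text above.

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x by central finite differences.

    f takes a list of floats and returns a float; x is a list of floats.
    The step h is an assumed default and may need tuning for other functions.
    """
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f at x + si*h*e_i + sj*h*e_j.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            # Four-point stencil for the second partial derivative d^2 f / dx_i dx_j.
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


# Hypothetical test function: f(x) = x_1^2 x_2 + x_2^3.
# Its analytic Hessian is [[2 x_2, 2 x_1], [2 x_1, 6 x_2]],
# which equals [[4, 2], [2, 12]] at the point (1, 2).
H = numerical_hessian(lambda x: x[0] ** 2 * x[1] + x[1] ** 3, [1.0, 2.0])
print(H)
```

The two off-diagonal entries agree up to rounding error, which illustrates the symmetry property.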

Examples

Hessian of a quadratic function

Consider the quadratic function

 $$ q(x) = x_1^2 + 2 x_1 x_2 + 3 x_2^2 + 4 x_1 + 5 x_2 + 6. $$

The Hessian of q at x is given by

 $$ \nabla^2 q(x) = \left( \begin{array}{cc} \frac{\partial^2 q}{\partial x_1^2}(x) & \frac{\partial^2 q}{\partial x_1 \partial x_2}(x) \\ \frac{\partial^2 q}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 q}{\partial x_2^2}(x) \end{array} \right) = \left( \begin{array}{cc} 2 & 2 \\ 2 & 6 \end{array} \right). $$

For quadratic functions, the Hessian is a constant matrix; that is, it does not depend on the point at which it is evaluated.
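As an intermediate step that the text leaves implicit, the first-order partial derivatives of $q$ can be written out. Differentiating each of them once more yields the constant entries 2, 2, 2 and 6 of the Hessian:

```latex
\frac{\partial q}{\partial x_1}(x) = 2 x_1 + 2 x_2 + 4, \qquad
\frac{\partial q}{\partial x_2}(x) = 2 x_1 + 6 x_2 + 5.
```

Since both partial derivatives are affine in $x$, their derivatives are constants, which is why the Hessian of a quadratic function does not depend on $x$.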

Hessian of the log-sum-exp function

Consider the "log-sum-exp" function $\mbox{lse} : \mathbf{R}^2 \rightarrow \mathbf{R}$, with values

 $$ \mbox{lse}(x) := \log( e^{x_1}+e^{x_2} ). $$

The gradient of $\mbox{lse}$ at $x$ is

 $$ \nabla \mbox{lse}(x) = \frac{1}{z_1+z_2} \left( \begin{array}{c} z_1 \\ z_2 \end{array} \right), $$

where $z_i := e^{x_i}$, $i=1,2$. The Hessian is given by

 $$ \nabla^2 \mbox{lse}(x) = \frac{z_1 z_2}{(z_1+z_2)^2} \left( \begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array} \right). $$
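The step from the gradient to the Hessian is not spelled out above. A short sketch for the first row, using the quotient rule together with $\partial z_i / \partial x_i = z_i$ and $\partial z_i / \partial x_j = 0$ for $j \ne i$:

```latex
\frac{\partial}{\partial x_1} \left( \frac{z_1}{z_1+z_2} \right)
  = \frac{z_1 (z_1+z_2) - z_1 \cdot z_1}{(z_1+z_2)^2}
  = \frac{z_1 z_2}{(z_1+z_2)^2},
\qquad
\frac{\partial}{\partial x_2} \left( \frac{z_1}{z_1+z_2} \right)
  = -\frac{z_1 z_2}{(z_1+z_2)^2}.
```

The second row follows by exchanging the roles of the indices 1 and 2.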

More generally, the Hessian of the function $\mbox{lse} : \mathbf{R}^n \rightarrow \mathbf{R}$ with values

 $$ \mbox{lse}(x) = \log \left( \sum_{i=1}^n e^{x_i} \right) $$

is obtained as follows.

  • First, the gradient at a point $x$ is

 $$ \nabla \mbox{lse}(x) = \frac{1}{\sum_{i=1}^n e^{x_i}} \left( \begin{array}{c} e^{x_1} \\ \vdots \\ e^{x_n} \end{array} \right) = \frac{1}{Z} z, $$

where $z = (e^{x_1}, \ldots, e^{x_n})$ and $Z = \sum_{i=1}^n z_i$.

  • Now the Hessian at a point $x$ is obtained by taking derivatives of each component of the gradient. If $g_i(x)$ is the $i$-th component, that is,

 $$ g_i(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} = \frac{z_i}{Z}, $$

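then, since $\partial z_i / \partial x_i = z_i$ and $\partial Z / \partial x_j = z_j$, the quotient rule gives

 $$ \frac{\partial g_i(x)}{\partial x_i} = \frac{z_i}{Z}-\frac{z_i^2}{Z^2}, $$

and, for $j \ne i$:

 $$ \frac{\partial g_i(x)}{\partial x_j} = -\frac{z_i z_j}{Z^2}. $$

More compactly:

 $$ \nabla^2 \mbox{lse}(x) = \frac{1}{Z^2} \left( Z \, \mathbf{diag}(z) - z z^T \right). $$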
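The compact formula translates directly into code. The following is a minimal sketch in plain Python; the name `lse_hessian` is hypothetical, and subtracting the largest entry before exponentiating is a numerical-stability device added here, not something from the derivation. It leaves the result unchanged because the formula is invariant when $z$ is scaled by a positive constant.

```python
import math


def lse_hessian(x):
    """Hessian of lse(x) = log(sum_i exp(x_i)), from (1/Z^2)(Z diag(z) - z z^T)."""
    m = max(x)  # assumed stabilizing shift; it rescales z and Z by the same factor
    z = [math.exp(xi - m) for xi in x]
    Z = sum(z)
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diag_term = Z * z[i] if i == j else 0.0  # entry of Z * diag(z)
            H[i][j] = (diag_term - z[i] * z[j]) / (Z * Z)
    return H


# For n = 2 and x = (0, 0): z = (1, 1) and Z = 2, so the two-dimensional
# formula z_1 z_2 / (z_1 + z_2)^2 * [[1, -1], [-1, 1]] gives
# [[0.25, -0.25], [-0.25, 0.25]].
print(lse_hessian([0.0, 0.0]))
```

Each row of the result sums to zero up to rounding, since $\sum_j (Z \, \mathbf{diag}(z) - z z^T)_{ij} = Z z_i - z_i Z = 0$. This is a convenient sanity check for any implementation.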