The simplest optimization problem is to find the minimum of a
scalar-valued function of a scalar variable
fx
f
x
---the so-called objective
function---and where that minimum is located. Assuming
the function is differentiable, the well-known conditions for
finding the minima---local and global---are
dd
x
fx=0
x
f
x
0
d
2
dx
2
fx>0
x
2
f
x
0
All values of the independent variable x
x satisfying these relations are locations of local
minima.
Without the second condition, solutions to the first could be
either maxima, minima, or inflection points. Solutions to the
first equation are termed the stationary points
of the objective function. To find the
global minimum---that value
(or values) where the function achieves its smallest
value---each candidate extremum must be tested: the objective
function must be evaluated at each stationary point and the
smallest selected. If, however, the objective function can be
shown to be strictly convex, then only one solution of
dd
x
f=0
x
f
0
exists and that solution corresponds to the global
minimum. The function
fx
f
x
is strictly convex if, for any choice
of x1
x1
,
x2
x2
, and the scalar
a a,
fa
x
1
−1
x
2
<af
x
1
−1f
x
2
f
a
x
1
1
a
x
2
a
f
x
1
1
a
f
x
2
. Convex objective functions occur often in practice
and are more easily minimized because of this property.
When the objective function
f·
f
·
depends on a complex variable z
z, subtleties enter the
picture. If the function
fz
f
z
is differentiable, its extremes can be found in the
obvious way: find the derivative, set it equal to zero, and
solve for the locations of the extrema. However, there are many
situations in which this
function is not differentiable. In
contrast to functions of a real variable, non-differentiable
functions of a complex variable occur frequently. The
simplest example is
fz=|z|2
f
z
z
2
. The minimum value of this function obviously
occurs at the origin. To calculate this obvious answer, a
complication arises: the function
fz=z¯
f
z
z
is not analytic with respect to
z z and hence not differentiable.
More generally, the derivative of a function with respect to a
complex-valued variable cannot be evaluated directly when the
function depends on the variable's conjugate. See Churchill [1] for more about the analysis of functions of a complex variable.
This complication can be resolved with either of two methods
tailored for optimization problems. The first is to express
the objective function in terms of the real and imaginary
parts of z z and find the
function's minimum with respect to these two variables. This approach is
unnecessarily tedious but will yield the solution. The
second, more elegant, approach relies on two results from
complex variable theory. First, the quantities
z z and
z¯
z
can be treated as independent variables, each
considered a constant with respect to the other. A variable
and its conjugate are thus viewed as the result of applying an
invertible linear transformation to the variable's real and
imaginary parts. Thus, if the real and imaginary parts can be
considered as independent variables, so can the variable and
its conjugate with the advantage that the mathematics is far
simpler. In this way,
∂|z|2∂
z
=z¯
z
z
2
z
and
∂|z|2∂
z¯
=z
z
z
2
z
. Seemingly, the next step to minimizing the
objective function is to set the derivatives with respect to
each quantity to zero and then solve the resulting pair of
equations. As the following theorem suggests, that solution
is overly complicated.
If the function
fzz¯
f
z
z
is real-valued and analytic with respect to
z z and
z¯
z
, all stationary points can be found by setting
the derivative (in the sense just given) with respect to
either z
z or
z¯
z
to zero [2].
Thus, to find the minimum of
|z|2
z
2
, compute the derivative with respect to either
z z or
z¯
z
. In most cases, the derivative with respect to
z¯
z
is the most convenient choice. Thus,
∂|z|2∂
z¯
=z
z
z
2
z
and the stationary point is
z=0
z
0
. As this objective function is strictly convex, the
objective function's sole stationary point is its global
minimum.
When the objective function depends on a vector-valued
quantity x x, the
evaluation of the function's stationary points is a simple
extension of the scalar-variable case. However, testing
stationary points as possible locations for minima is more
complicated [3]. The gradient of the scalar-valued
function
fx
f
x
of a vector x
x (dimension N N) equals
an N N-dimensional vector where
each component is the partial derivative of
f·
f
·
with respect to each component of x x.
∇xfx=∂fx∂
x
1
⋮∂fx∂
x
N
x
f
x
x
1
f
x
⋮
x
N
f
x
For example, the gradient of
xTAx
x
A
x
is
Ax+ATx
A
x
A
x
. This result is easily derived by expressing the
quadratic form as a double sum (
∑
i∧j
i∧jAi,jxixj
i
j
i
j
A
i
j
x
i
x
j
) and evaluating the partials directly. When
A A is symmetric,
which is often the case, this gradient becomes
2Ax
2
A
x
.
The gradient "points" in the direction of the maximum rate
of increase of the function
f·
f
·
. This fact is often used in numerical optimization
algorithms. The method of steepest descent is an
iterative algorithm where a candidate minimum is augmented by
a quantity proportional to the negative of the objective
function's gradient to yield the next candidate.
∀
α
,α>0:xk−1−α∇xfx
α
α
0
x
k
x
k
1
α
x
f
x
If the objective function is sufficiently "smooth" (there
aren't too many minima and maxima), this approach will yield
the global minimum. Strictly convex functions are certainly
smooth for this method to work.
The gradient of the gradient of
fx
f
x
, denoted by
∇x
2
fx
x
2
f
x
, is a matrix where
jth
jth
column is the gradient of the
jth
jth
component of f f's gradient. This quantity is
known as the Hessian, defined to be the matrix of
all the second partials of
f·
f
·
.
∇x
2
fxi,j=∂2fx∂
xi
∂
xj
x
2
f
x
i
j
x
i
x
j
f
x
The Hessian is always a symmetric matrix.
The minima of the objective function
fx
f
x
occur when
∇xfx=0
x
f
x
0
and
∇
x
2
fx>0
x
2
f
x
0
i.e., positive definite. Thus, for a
stationary point to be a minimum, the Hessian evaluated at
that point must be a positive definite matrix. When the
objective function is strictly convex, this test need not be
performed. For example, the objective function
fx=xTAx
f
x
x
A
x
is convex whenever A A is positive definite and
symmetric.
When the independent vector is complex-valued, the issues
discussed in the scalar case also arise. Because of the
complex-valued quantities involved, how to evaluate the
gradient becomes an issue: is
∇
z
∇
z
or
∇
z
*
∇
z
*
more appropriate?. In contrast to the case of complex
scalars, the choice in the case of complex vectors is unique.
Let
fzz¯
f
z
z
be a real-valued function of the vector-valued
complex variable z
z where the dependence on the variable and its
conjugate is explicit. By treating z z and
z¯
z
as independent variables, the quantity
pointing in the direction of the maximum rate of change
of
fzz¯
f
z
z
is
∇
z¯
fz
z
f
z
[2].
To show this result, consider the variation of
f f given by
δf=∑
i
i∂f∂
z
i
δ
z
i
+∂f∂
z
i
¯
δ
z
i
¯=∇zfTδz+∇
z¯
fTδz¯
δ
f
i
i
z
i
f
δ
z
i
z
i
f
δ
z
i
z
f
δ
z
z
f
δ
z
This quantity is concisely expressed as
δf=2ℜ∇
z¯
fHδz
δ
f
2
z
f
δ
z
. By the Schwarz inequality, the maximum value of
this variation occurs when
δz
δ
z
is in the same direction as
(
∇
z¯
f
z
f
). Thus, the direction corresponding to the largest
change in the quantity
fzz¯
f
z
z
is in the direction of its gradient with respect to
z¯
z
. To implement the method of steepest descent, for
example, the gradient with respect to the conjugate
must be used.
To find the stationary points of a scalar-valued function of a
complex-valued vector, we must solve
∇
z¯
fz=0
z
f
z
0
(1)
For solutions of this equation to be minima, the Hessian
defined to be the matrix of mixed partials given by
∇z∇
z¯
fz
z
z
f
z
must be positive definite. For example, the
required gradient of the objective function
zHAz
z
A
z
is given by
Az
A
z
, implying for positive definite
A A that a stationary point is
z=0
z
0
. The Hessian of the objective function is simply
A A, confirming
that the minimum of a quadratic form is always the origin.