Constrained optimization is the minimization of an
objective function subject to constraints on the possible values
of the independent variable. Constraints can be either
equality constraints or inequality
constraints. Because the scalar-variable case follows
easily from the vector one, only the latter is discussed in
detail here.
The typical constrained optimization problem has the form
minx{fx}
subject to
gx=0
x
f
x
subject to
g
x
0
where
f·
f
·
is the scalar-valued objective function and
g·
g
·
is the vector-valued constraint
function. Strict convexity of the objective function
is not sufficient to guarantee a unique
minimum; in addition, each component of the constraint must be
strictly convex to guarantee that the problem has a unique
solution. Because of the constraint, stationary points of
f·
f
·
alone may not be solutions to the constrained
problem: they may not satisfy the constraints. In fact,
solutions to the constrained problem are often
not stationary points of the objective
function. Consequently, the ad hoc
technique of searching for all stationary points of the
objective function that also satisfy the constraint do not
work.
The classical approach to solving constrained optimization
problems is the method of Lagrange multipliers.
This approach converts the constrained optimization problem
into an unconstrained one, thereby allowing use of the
techniques described in the previous section. The
Lagrangian of a constrained optimization problem
is defined to be the scalar-valued function
Lxλ=fx+λTgx
L
x
λ
f
x
λ
g
x
Essentially, the following theorem states that stationary
points of the Lagrangian are potential
solutions of the constrained optimization problem: as always,
each candidate solution must be tested to determine which
minimizes the objective function.
Let x⋄ x⋄
denote a local solution to
the constrained optimization problem given above where
the gradients
∇x
g
1
x
x
g
1
x
, …,
∇x
g
M
x
x
g
M
x
of the constraint function's components are
linearly independent. There then exists a unique vector
λ⋄ λ⋄
such that
∇xLx⋄λ⋄=0
x
L
x
⋄
λ
⋄
0
Furthermore, the quadratic form
yT∇x2Lx⋄λ⋄y
y
x
2
L
x
⋄
λ
⋄
y
is non-negative for all y y satisfying
∇xgxTy=0
x
g
x
y
0
.
The latter result in the theorem says that the Hessian of the
Lagrangian evaluated at its stationary points is non-negative
definite with respect to all vectors
orthogonal to the gradient of the
constraint. This result generalizes the notion of a positive
definite Hessian in unconstrained problems.
The rather abstract result of the preceding theorem has a
simple geometric interpretation. As shown in Figure 1, the constraint corresponds to a contour in
the x x plane.
A
contour map of the objective function indicates
those values of
x
x for which
fx=c
f
x
c
. In this figure, as
c
c becomes smaller, the contours shrink to a small
circle in the center of the figure. The solution to the
constrained optimization problem occurs when the smallest
value of
cc is chosen for which
the contour just touches the constraint contour. At that
point, the gradient of the objective function and of the
constraint contour are proportional to each other. This
proportionality vector is
λ⋄ λ⋄
, the so-called
Lagrange
multiplier. The Lagrange multiplier's exact value must
be such that the constraint is exactly satisfied. Note that
the constraint can be tangent to the objective function's
contour map for larger values of
c
c. These potential, but erroneous, solutions can be
discarded only by evaluating the objective function.
A typical problem arising in signal processing is to
minimize
xTAx
x
A
x
subject to the linear constraint
cTx=1
c
x
1
.
AA is a positive
definite, symmetric matrix (a correlation matrix) in most
problems. Clearly, the minimum of the objective function
occurs at
x=0
x
0
, but his solution cannot satisfy the constraint.
The constraint
gx=cTx−1
g
x
c
x
1
is a scalar-valued one; hence the theorem of
Lagrange applies as there are no multiple components in the
constraint forcing a check of linear independence. The
Lagrangian is
Lxλ=xTAx+λcTx−1
L
x
λ
x
A
x
λ
c
x
1
Its gradient is
2Ax+λc
2
A
x
λ
c
with a solution
x⋄=-
λ
⋄
A-1c2
x
⋄
λ
⋄
A
c
2
. To find the value of the Lagrange multiplier,
this solution must satisfy the constraint. Imposing the
constraint,
λ
⋄
cTA-1c=-2
λ
⋄
c
A
c
-2
; thus,
λ
⋄
=-2cTA-1c
λ
⋄
-2
c
A
c
and the total solution is
x⋄=A-1ccTA-1c
x
⋄
A
c
c
A
c
When the independent variable is complex-valued, the Lagrange
multiplier technique can be used if care
is taken to make the Lagrangian real. If it is not real, we
cannot use the theorem
that permits computation of stationary points by computing the
gradient with respect to
z¯
z
alone. The Lagrangian may not be real-valued even
when the constraint is real. Once insured real, the gradient
of the Lagrangian with respect to the conjugate of the
independent vector can be evaluated and the minimization
procedure remains as before.
Consider slight variations to the previous example: let the
vector z z be
complex so that the objective function is
zHAz
z
A
z
where A
A is a positive definite, Hermitian matrix and let
the constraint be linear, but vector-valued (
Cz=c
C
z
c
). The Lagrangian is formed from the objective
function and the real part of the usual constraint term.
Lzλ=zHAz+λHCz−c+λTC¯z¯−c¯
L
z
λ
z
A
z
λ
C
z
c
λ
C
z
c
For the Lagrange multiplier theorem to hold, the gradients
of each component of the constraint must be linearly
independent. As these gradients are the columns of
C C, their mutual
linear independence means that each constraint vector must
not be expressible as a linear combination of the others.
We shall assume this portion of the problem statement true.
Evaluating the gradient with respect to
z¯
z
, keeping z
z a constant, and setting the result equal to zero
yields
Az⋄+CHλ⋄=0
A
z
⋄
C
λ
⋄
0
The solution is z⋄ z⋄
is
-A-1CHλ⋄
A
C
λ
⋄
. Applying the constraint, we find that
CA-1CHλ⋄=-c
C
A
C
λ
⋄
c
. Solving for the Lagrange multiplier and
substituting the result into the solution, we find that the
solution to the constrained optimization problem is
z⋄=A-1CHCA-1CH-1c
z
⋄
A
C
C
A
C
c
The indicated matrix inverses always exist: A A is assumed invertible and
CA-1CH
C
A
C
is invertible because of the linear independence
of the constraints.
When some of the constraints are inequalities, the Lagrange
multiplier technique can be used, but the solution must be
checked carefully in its details. But first, the optimization
problem with equality and inequality constraints is formulated
as
minx{fx}
subject to
gx=0
and
hx≤0
x
f
x
subject to
g
x
0
and
h
x
0
As before,
f·
f
·
is the scalar-valued objective function and
g·
g
·
is the equality constraint function;
h·
h
·
is the inequality constraint function.
The key result which can be used to find the analytic solution
to this problem is to first form the Lagrangian in the usual
way as
Lxλμ=fx+λTgx+μThx
L
x
λ
μ
f
x
λ
g
x
μ
h
x
. The following theorem is the general statement of
the Lagrange multiplier technique for constrained optimization
problems.
Let x⋄ x⋄
be a local minimum for the
constrained optimization problem. If the gradients of
gg's
components and the gradients of those components of
h·
h
·
for which
h
i
x⋄=0
h
i
x
⋄
0
are linearly independent, then
∇xLx⋄λ⋄μ⋄=0
x
L
x
⋄
λ
⋄
μ
⋄
0
where
μ⋄≥0
μ
⋄
0
and
μi⋄
h
i
x⋄=0
μ
i
⋄
h
i
x
⋄
0
The portion of this result dealing with the inequality
constraint differs substantially from that concerned with the
equality constraint. Either a component of the constraint
equals its maximum value (zero in this case) and the
corresponding component of its Lagrange multiplier is
non-negative (and is usually positive)
or
a component is less than the constraint and its component of
the Lagrange multiplier is zero. This latter result means
that some components of the inequality constraint are not as
stringent as others and these lax ones do not affect the
solution.
The rationale behind this theorem is a technique for
converting the inequality constraint into an equality
constraint:
h
i
x≤0
h
i
x
0
is equivalent to
h
i
x+
s
i
2=0
h
i
x
s
i
2
0
. Since the new term, called a slack
variable, is non-negative, the constraint must be
non-positive. With the inclusion of slack variables, the
equality constraint theorem can be used and the above theorem
results. To prove the theorem, not only does the gradient
with respect to x x
need to be considered, but also with respect to the vector
s s of slack
variables. The
ith
ith
component of the gradient of the
Lagrangian with respect to s s at the stationary point is
2
μ
i
⋄
s
i
⋄
=0
2
μ
i
⋄
s
i
⋄
0
. If in solving the optimization problem,
s
i
⋄
=0
s
i
⋄
0
, the inequality constraint was in reality an
equality constraint and that component of the constraint
behaves accordingly. As
s
i
=-
h
i
x
s
i
h
i
x
,
s
i
=0
s
i
0
implies that that component of the inequality
constraint must equal zero. On the other hand, if
s
i
≠0
s
i
0
, the corresponding Lagrange multiplier must be zero.
Consider the problem of minimizing a quadratic form subject
to a linear equality constraint and an inequality constraint
on the norm of the linear constraint vector's variation.
minx{xTAx}
subject to
c+δTx=1
and
∥δ∥2≤ε
x
x
A
x
subject to
c
δ
x
1
and
δ
2
ε
This kind of problem arises in robust estimation. One seeks
a solution where one of the "knowns" of the problem,
c c in this case,
is, in reality, only approximately specified. The
independent variables are x x and
δ
δ. The Lagrangian for this problem is
Lxδλμ=xTAx+λc+δTx−1+μ∥δ∥2−ε
L
x
δ
λ
μ
x
A
x
λ
c
δ
x
1
μ
δ
2
ε
Evaluating the gradients with respect to the independent
variables yields
2Ax⋄+
λ
⋄
c+δ⋄=0
2
A
x
⋄
λ
⋄
c
δ
⋄
0
λ
⋄
x⋄+2
μ
⋄
δ⋄=0
λ
⋄
x
⋄
2
μ
⋄
δ
⋄
0
The latter equation is key. Recall that either
μ
⋄
=0
μ
⋄
0
or the inequality constraint is satisfied with
equality. If μ
⋄ μ
⋄ is zero, that
implies that x⋄
x ⋄ must be zero which will not allow the equality
constraint to be satisfied. The inescapable conclusion is
that
∥δ⋄∥2=ε
δ
⋄
2
ε
and that δ⋄
δ ⋄ is parallel to x⋄
x ⋄ :
δ⋄=-
λ
⋄
2
μ
⋄
x⋄
δ
⋄
λ
⋄
2
μ
⋄
x
⋄
. Using the first equation, x⋄ x ⋄
is found to be
x⋄=-
λ
⋄
2A−
λ
⋄
24μ⋄I-1c
x
⋄
λ
⋄
2
A
λ
⋄
2
4
μ
⋄
I
c
Imposing the constraints on this solution results in a pair
of equations for the Lagrange multipliers.
1/4
λ
⋄
2
μ
⋄
2cTA−1/4
λ
⋄
2
μ
⋄
I-2c=ε
14
λ
⋄
2
μ
⋄
2
c
A
14
λ
⋄
2
μ
⋄
I
-2
c
ε
cTA−1/4
λ
⋄
2
μ
⋄
I-1c=-2
λ
⋄
−2ε
μ
⋄
λ
⋄
2
c
A
14
λ
⋄
2
μ
⋄
I
c
2
λ
⋄
2
ε
μ
⋄
λ
⋄
2
Multiple solutions are possible and each must be checked.
The rather complicated completion of this example is left to
the (numerically oriented) reader.