Skip to content Skip to navigation

Connexions

You are here: Home » Content » Optimization Theory

Navigation

Recently Viewed

This feature requires Javascript to be enabled.
 

Optimization Theory

Module by: Don Johnson. E-mail the author

Optimization theory is the study of the extremal values of a function: its minima and maxima. Topics in this theory range from conditions for the existence of a unique extremal value to methods---both analytic and numeric---for finding the extremal values and for what values of the independent variables the function attains its extremes. In this book, minimizing an error criterion is an essential step toward deriving optimal signal processing algorithms. An appendix summarizing the key results of optimization theory is essential to understand optimal algorithms.

Unconstrained Optimization

The simplest optimization problem is to find the minimum of a scalar-valued function of a scalar variable fx f x ---the so-called objective function---and where that minimum is located. Assuming the function is differentiable, the well-known conditions for finding the minima---local and global---are1 dd x fx=0 x f x 0 d 2 dx 2 fx>0 x 2 f x 0 All values of the independent variable x x satisfying these relations are locations of local minima.

Without the second condition, solutions to the first could be either maxima, minima, or inflection points. Solutions to the first equation are termed the stationary points of the objective function. To find the global minimum---that value (or values) where the function achieves its smallest value---each candidate extremum must be tested: the objective function must be evaluated at each stationary point and the smallest selected. If, however, the objective function can be shown to be strictly convex, then only one solution of dd x f=0 x f 0 exists and that solution corresponds to the global minimum. The function fx f x is strictly convex if, for any choice of x1 x1 , x2 x2 , and the scalar a a, fa x 1 1 x 2 <af x 1 1f x 2 f a x 1 1 a x 2 a f x 1 1 a f x 2 . Convex objective functions occur often in practice and are more easily minimized because of this property.

When the objective function f· f · depends on a complex variable z z, subtleties enter the picture. If the function fz f z is differentiable, its extremes can be found in the obvious way: find the derivative, set it equal to zero, and solve for the locations of the extrema. However, there are many situations in which this function is not differentiable. In contrast to functions of a real variable, non-differentiable functions of a complex variable occur frequently. The simplest example is fz=|z|2 f z z 2 . The minimum value of this function obviously occurs at the origin. To calculate this obvious answer, a complication arises: the function fz=z¯ f z z is not analytic with respect to z z and hence not differentiable. More generally, the derivative of a function with respect to a complex-valued variable cannot be evaluated directly when the function depends on the variable's conjugate. See Churchill [1] for more about the analysis of functions of a complex variable.

This complication can be resolved with either of two methods tailored for optimization problems. The first is to express the objective function in terms of the real and imaginary parts of z z and find the function's minimum with respect to these two variables.2 This approach is unnecessarily tedious but will yield the solution. The second, more elegant, approach relies on two results from complex variable theory. First, the quantities z z and z¯ z can be treated as independent variables, each considered a constant with respect to the other. A variable and its conjugate are thus viewed as the result of applying an invertible linear transformation to the variable's real and imaginary parts. Thus, if the real and imaginary parts can be considered as independent variables, so can the variable and its conjugate with the advantage that the mathematics is far simpler. In this way, |z|2 z =z¯ z z 2 z and |z|2 z¯ =z z z 2 z . Seemingly, the next step to minimizing the objective function is to set the derivatives with respect to each quantity to zero and then solve the resulting pair of equations. As the following theorem suggests, that solution is overly complicated.

Theorem 1

If the function fzz¯ f z z is real-valued and analytic with respect to z z and z¯ z , all stationary points can be found by setting the derivative (in the sense just given) with respect to either z z or z¯ z to zero [2].

Thus, to find the minimum of |z|2 z 2 , compute the derivative with respect to either z z or z¯ z . In most cases, the derivative with respect to z¯ z is the most convenient choice.3 Thus, |z|2 z¯ =z z z 2 z and the stationary point is z=0 z 0 . As this objective function is strictly convex, the objective function's sole stationary point is its global minimum.

When the objective function depends on a vector-valued quantity x x, the evaluation of the function's stationary points is a simple extension of the scalar-variable case. However, testing stationary points as possible locations for minima is more complicated [3]. The gradient of the scalar-valued function fx f x of a vector x x (dimension N N) equals an N N-dimensional vector where each component is the partial derivative of f· f · with respect to each component of x x. xfx=fx x 1 fx x N x f x x 1 f x x N f x For example, the gradient of xTAx x A x is Ax+ATx A x A x . This result is easily derived by expressing the quadratic form as a double sum ( ij ijAi,jxixj i j i j A i j x i x j ) and evaluating the partials directly. When A A is symmetric, which is often the case, this gradient becomes 2Ax 2 A x .

The gradient "points" in the direction of the maximum rate of increase of the function f· f · . This fact is often used in numerical optimization algorithms. The method of steepest descent is an iterative algorithm where a candidate minimum is augmented by a quantity proportional to the negative of the objective function's gradient to yield the next candidate. α ,α>0:xk1αxfx α α 0 x k x k 1 α x f x If the objective function is sufficiently "smooth" (there aren't too many minima and maxima), this approach will yield the global minimum. Strictly convex functions are certainly smooth for this method to work.

The gradient of the gradient of fx f x , denoted by x 2 fx x 2 f x , is a matrix where jth jth column is the gradient of the jth jth component of f f's gradient. This quantity is known as the Hessian, defined to be the matrix of all the second partials of f· f · . x 2 fxi,j=2fx xi xj x 2 f x i j x i x j f x The Hessian is always a symmetric matrix.

The minima of the objective function fx f x occur when xfx=0 x f x 0 and x 2 fx>0 x 2 f x 0 i.e., positive definite. Thus, for a stationary point to be a minimum, the Hessian evaluated at that point must be a positive definite matrix. When the objective function is strictly convex, this test need not be performed. For example, the objective function fx=xTAx f x x A x is convex whenever A A is positive definite and symmetric.4

When the independent vector is complex-valued, the issues discussed in the scalar case also arise. Because of the complex-valued quantities involved, how to evaluate the gradient becomes an issue: is z z or z * z * more appropriate?. In contrast to the case of complex scalars, the choice in the case of complex vectors is unique.

Theorem 2

Let fzz¯ f z z be a real-valued function of the vector-valued complex variable z z where the dependence on the variable and its conjugate is explicit. By treating z z and z¯ z as independent variables, the quantity pointing in the direction of the maximum rate of change of fzz¯ f z z is z¯ fz z f z [2].

To show this result, consider the variation of f f given by δf= i if z i δ z i +f z i ¯ δ z i ¯=zfTδz+ z¯ fTδz¯ δ f i i z i f δ z i z i f δ z i z f δ z z f δ z This quantity is concisely expressed as δf=2 z¯ fHδz δ f 2 z f δ z . By the Schwarz inequality, the maximum value of this variation occurs when δz δ z is in the same direction as ( z¯ f z f ). Thus, the direction corresponding to the largest change in the quantity fzz¯ f z z is in the direction of its gradient with respect to z¯ z . To implement the method of steepest descent, for example, the gradient with respect to the conjugate must be used.

To find the stationary points of a scalar-valued function of a complex-valued vector, we must solve

z¯ fz=0 z f z 0 (1)
For solutions of this equation to be minima, the Hessian defined to be the matrix of mixed partials given by z z¯ fz z z f z must be positive definite. For example, the required gradient of the objective function zHAz z A z is given by Az A z , implying for positive definite A A that a stationary point is z=0 z 0 . The Hessian of the objective function is simply A A, confirming that the minimum of a quadratic form is always the origin.

Footnotes

  1. The maximum of a function is found by finding the minimum of its negative.
  2. The multi-variate minimization problem is discussed in a few paragraphs.
  3. Why should this be? In the next few examples, try both and see which you feel is "easier".
  4. Note that the Hessian of xTAx x A x is 2A 2 A .

References

  1. R.V. Churchill and J.W. Brown. (1989). Complex Variables and Applications. McGraw-Hill.
  2. D.H. Brandwood. (1983). A complex gradient operator and its application in adaptive array theory. IEE Proc., Pts. F and H, 130, 11-16.
  3. D.G. Luenberger. (1969). Optimization by Vector Space Methods. Wiley.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks