Appendix: Optimization Theory

Module by: Ricardo Vargas (from the OpenStax CNX collection "Iterative Design of l_p Digital Filters")

Optimization theory is the branch of applied mathematics that considers a mathematical expression and seeks a set of parameters that either maximizes or minimizes it. Being an applied discipline, its problems usually arise from real-life situations in areas such as science, engineering and finance (among many others). This section presents some basic concepts for completeness and is not meant to replace a treatise on the subject. The reader is encouraged to consult further references for more information.

Solution of linear weighted least squares problems

Consider the quadratic problem

\min_h \; \| d - C h \|_2
(1)

which can be written as

\min_h \; (d - C h)^T (d - C h)
(2)

where the square root has been omitted since it does not alter the minimizer. This problem is strictly convex, so its unique (and thus global) solution is found at the point where the partial derivatives with respect to the optimization variable are equal to zero. That is,

\nabla_h \, (d - C h)^T (d - C h) = \nabla_h \left[ d^T d - 2 d^T C h + h^T C^T C h \right] = -2 C^T d + 2 C^T C h = 0 \quad \Longrightarrow \quad C^T C h = C^T d
(3)

The solution of Equation 3 is given by

h = \left( C^T C \right)^{-1} C^T d
(4)

where the term $(C^T C)^{-1} C^T$ is referred to in [2], [1] as the Moore-Penrose pseudoinverse of $C$.

In the case of a weighted version of Equation 1,

\min_h \; \| W (d - C h) \|_2^2 = \sum_k w_k^2 \, \left| d_k - C_k h \right|^2
(5)

where $C_k$ is the $k$-th row of $C$, one can write Equation 5 as

\min_h \; \left[ W (d - C h) \right]^T \left[ W (d - C h) \right]
(6)

where $W = \mathrm{diag}(w)$ contains the weighting vector $w$. The solution is therefore given by

h = \left( C^T W^T W C \right)^{-1} C^T W^T W d
(7)
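As a concrete illustration, the weighted solution of Equation 7 can be sketched in a few lines of numpy. The matrix C, target d and weights w below are made-up example data (not from the text), and solving the normal equations directly is shown only for clarity; in practice one would prefer a least-squares solver, as the cross-check at the end suggests:

```python
import numpy as np

# Sketch of Equation 7: minimize ||W(d - Ch)||_2^2 via the weighted
# normal equations h = (C^T W^T W C)^{-1} C^T W^T W d.
# C, d and w below are arbitrary illustrative data.
rng = np.random.default_rng(0)
C = rng.standard_normal((20, 5))       # (L+1) x (M+1) design matrix
d = rng.standard_normal(20)            # desired response samples
w = np.linspace(1.0, 2.0, 20)          # positive weighting vector

W = np.diag(w)                         # W = diag(w)
A = C.T @ W.T @ W @ C                  # C^T W^T W C
b = C.T @ W.T @ W @ d                  # C^T W^T W d
h = np.linalg.solve(A, b)              # solve instead of forming the inverse

# Cross-check against the equivalent scaled least-squares problem
h_ref, *_ = np.linalg.lstsq(W @ C, W @ d, rcond=None)
print(np.allclose(h, h_ref))           # True
```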

Newton's method and the approximation of linear systems in an $l_p$ sense

Newton's method and $l_p$ linear phase systems

Consider the problem

\min_a \; g(a) = \| A(\omega; a) - D(\omega) \|_p
(8)

for $a \in \mathbb{R}^{M+1}$. Problem Equation 8 is equivalent to the better posed problem

\min_a \; f(a) = g(a)^p = \| A(\omega; a) - D(\omega) \|_p^p = \sum_{i=0}^{L} \left| C_i a - D_i \right|^p
(9)

where $D_i = D(\omega_i)$ with $\omega_i \in [0, \pi]$, $C_i = [C_{i,0}, \ldots, C_{i,M}]$, and

C = \begin{bmatrix} C_0 \\ \vdots \\ C_L \end{bmatrix}
(10)

The $ij$-th element of $C$ is given by $C_{i,j} = \cos \omega_i (M - j)$, where $0 \le i \le L$ and $0 \le j \le M$. From Equation 9 we have that

\nabla f(a) = \begin{bmatrix} \frac{\partial}{\partial a_0} f(a) \\ \vdots \\ \frac{\partial}{\partial a_M} f(a) \end{bmatrix}
(11)

where $a_j$ is the $j$-th element of $a \in \mathbb{R}^{M+1}$ and

\frac{\partial}{\partial a_j} f(a) = \frac{\partial}{\partial a_j} \sum_{i=0}^{L} \left| C_i a - D_i \right|^p = \sum_{i=0}^{L} \frac{\partial}{\partial a_j} \left| C_i a - D_i \right|^p = p \sum_{i=0}^{L} \left| C_i a - D_i \right|^{p-1} \cdot \frac{\partial}{\partial a_j} \left| C_i a - D_i \right|
(12)

Now,

\frac{\partial}{\partial a_j} \left| C_i a - D_i \right| = \mathrm{sign}\left( C_i a - D_i \right) \cdot \frac{\partial}{\partial a_j} \left( C_i a - D_i \right) = C_{i,j} \, \mathrm{sign}\left( C_i a - D_i \right)
(13)

where the sign function is defined by¹

\mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}
(15)

Therefore the gradient of $f(a)$ is given by

\nabla f(a) = p \begin{bmatrix} \sum_{i=0}^{L} C_{i,0} \left| C_i a - D_i \right|^{p-1} \mathrm{sign}\left( C_i a - D_i \right) \\ \vdots \\ \sum_{i=0}^{L} C_{i,M} \left| C_i a - D_i \right|^{p-1} \mathrm{sign}\left( C_i a - D_i \right) \end{bmatrix}
(16)

The Hessian of $f(a)$ is the matrix $\nabla^2 f(a)$ whose $jm$-th element ($0 \le j, m \le M$) is given by

\nabla^2_{j,m} f(a) = \frac{\partial^2}{\partial a_j \partial a_m} f(a) = \frac{\partial}{\partial a_m} \left[ \frac{\partial}{\partial a_j} f(a) \right] = \sum_{i=0}^{L} p \, C_{i,j} \frac{\partial}{\partial a_m} \left[ \left| C_i a - D_i \right|^{p-1} \mathrm{sign}\left( C_i a - D_i \right) \right] = \sum_{i=0}^{L} \alpha \, \frac{\partial}{\partial a_m} \left[ b(a) \, d(a) \right]
(17)

where the substitutions $\alpha = p \, C_{i,j}$, $b(a) = \left| C_i a - D_i \right|^{p-1}$ and $d(a) = \mathrm{sign}(C_i a - D_i)$ have been made for the sake of simplicity. We have

\frac{\partial}{\partial a_m} b(a) = \frac{\partial}{\partial a_m} \left| C_i a - D_i \right|^{p-1} = (p-1) \, C_{i,m} \left| C_i a - D_i \right|^{p-2} \mathrm{sign}\left( C_i a - D_i \right) \qquad \frac{\partial}{\partial a_m} d(a) = \frac{\partial}{\partial a_m} \mathrm{sign}\left( C_i a - D_i \right) = 0
(18)

Note that the partial derivative of $d(a)$ is not defined at $C_i a - D_i = 0$. Therefore

\frac{\partial}{\partial a_m} \left[ b(a) \, d(a) \right] = b(a) \frac{\partial}{\partial a_m} d(a) + d(a) \frac{\partial}{\partial a_m} b(a) = (p-1) \, C_{i,m} \left| C_i a - D_i \right|^{p-2} \mathrm{sign}^2\left( C_i a - D_i \right)
(19)

Note that $\mathrm{sign}^2(C_i a - D_i) = 1$ for all $C_i a - D_i \ne 0$. Then

\nabla^2_{j,m} f(a) = p (p-1) \sum_{i=0}^{L} C_{i,j} \, C_{i,m} \left| C_i a - D_i \right|^{p-2}
(20)

except at $C_i a - D_i = 0$, where it is not defined.

Based on Equation 16 and Equation 20, one can apply Newton's method to problem Equation 8 as follows,

  • Given $a_0 \in \mathbb{R}^{M+1}$, $D \in \mathbb{R}^{L+1}$, $C \in \mathbb{R}^{(L+1) \times (M+1)}$
  • For $i = 0, 1, \ldots$
    1. Find $\nabla f(a_i)$.
    2. Find $\nabla^2 f(a_i)$.
    3. Solve $\nabla^2 f(a_i) \, s = -\nabla f(a_i)$ for $s$.
    4. Let $a_+ = a_i + s$.
    5. Check for convergence and iterate if necessary.
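The steps above can be sketched generically in Python. The names (`newton`, `grad`, `hess`) are illustrative choices, not from the text, and the sketch deliberately omits the safeguards (line search, Hessian regularization) a robust implementation would add:

```python
import numpy as np

def newton(a0, grad, hess, tol=1e-10, max_iter=50):
    """Plain Newton iteration: solve hess(a) s = -grad(a), then a <- a + s.

    `grad` and `hess` are caller-supplied callables returning the gradient
    vector and Hessian matrix of the objective at a point.
    """
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        g = grad(a)
        s = np.linalg.solve(hess(a), -g)   # step 3: Newton step
        a = a + s                          # step 4: update
        if np.linalg.norm(s) < tol:        # step 5: convergence check
            break
    return a

# Example: minimize the strictly convex quadratic f(a) = ||a - 1||^2,
# with gradient 2(a - 1) and Hessian 2I; Newton converges in one step.
a_star = newton(np.zeros(3), lambda a: 2 * (a - 1), lambda a: 2 * np.eye(3))
print(a_star)   # [1. 1. 1.]
```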

Note that for problem Equation 8 the gradient of $f(a)$ can be written as

\nabla f(a) = p \, C^T y
(21)

where

y = \left| C a_i - D \right|^{p-1} \mathrm{sign}\left( C a_i - D \right) = \left| C a_i - D \right|^{p-2} \left( C a_i - D \right)
(22)

Also,

\nabla^2_{j,m} f(a) = p (p-1) \, C_j^T Z \, C_m
(23)

where

Z = \mathrm{diag}\left( \left| C a_i - D \right|^{p-2} \right)
(24)

and

C_j = \begin{bmatrix} C_{0,j} \\ \vdots \\ C_{L,j} \end{bmatrix}
(25)

Therefore

\nabla^2 f(a) = (p^2 - p) \, C^T Z C
(26)

From Equation 26, the Hessian $\nabla^2 f(a)$ can be expressed as

\nabla^2 f(a) = (p^2 - p) \, C^T W^T W C
(27)

where

W = \mathrm{diag}\left( \left| C a_i - D \right|^{\frac{p-2}{2}} \right)
(28)

The matrix $C \in \mathbb{R}^{(L+1) \times (M+1)}$ is given by

C = \begin{bmatrix}
\cos M\omega_0 & \cos (M-1)\omega_0 & \cdots & \cos (M-j)\omega_0 & \cdots & \cos \omega_0 & 1 \\
\cos M\omega_1 & \cos (M-1)\omega_1 & \cdots & \cos (M-j)\omega_1 & \cdots & \cos \omega_1 & 1 \\
\vdots & & & & & & \vdots \\
\cos M\omega_i & \cos (M-1)\omega_i & \cdots & \cos (M-j)\omega_i & \cdots & \cos \omega_i & 1 \\
\vdots & & & & & & \vdots \\
\cos M\omega_{L-1} & \cos (M-1)\omega_{L-1} & \cdots & \cos (M-j)\omega_{L-1} & \cdots & \cos \omega_{L-1} & 1 \\
\cos M\omega_L & \cos (M-1)\omega_L & \cdots & \cos (M-j)\omega_L & \cdots & \cos \omega_L & 1
\end{bmatrix}
(29)

The matrix $H = \nabla^2 f(a)$ is positive definite for $p > 1$. To see this, write $H = K^T K$ where $K = \sqrt{p^2 - p} \, W C$ (valid since $p^2 - p > 0$ for $p > 1$). Let $z \in \mathbb{R}^{M+1}$, $z \ne 0$. Then

z^T H z = z^T K^T K z = \| K z \|_2^2 > 0
(30)

unless $z \in N(K)$. But since $W$ is diagonal and $C$ has full column rank, $N(K) = \{0\}$. Thus $z^T H z \ge 0$, with equality only if $z = 0$, and so $H$ is positive definite.
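Collecting Equations 21, 24, 26 and 29, the whole linear-phase procedure can be sketched in numpy. The desired response D (an ideal lowpass sampled on the frequency grid) and the sizes M, L and the exponent p are arbitrary illustrative choices, not from the text, and the plain Newton loop carries no safeguards:

```python
import numpy as np

# Sketch of the l_p linear-phase Newton iteration (Equations 21-29).
# M, L, p and the desired response D are arbitrary example choices.
M, L, p = 6, 40, 4
w_grid = np.linspace(0, np.pi, L + 1)              # omega_0 ... omega_L
# Equation 29: C[i, j] = cos(omega_i * (M - j)), 0 <= j <= M
C = np.cos(np.outer(w_grid, M - np.arange(M + 1)))
D = np.where(w_grid < np.pi / 2, 1.0, 0.0)         # ideal lowpass samples

a = np.linalg.lstsq(C, D, rcond=None)[0]           # l_2 solution as a_0
for _ in range(30):
    r = C @ a - D                                  # residual C a - D
    grad = p * C.T @ (np.abs(r)**(p - 2) * r)      # Equations 21-22
    Z = np.diag(np.abs(r)**(p - 2))                # Equation 24
    H = (p**2 - p) * C.T @ Z @ C                   # Equation 26
    s = np.linalg.solve(H, -grad)                  # Newton step
    a = a + s
    if np.linalg.norm(s) < 1e-10:                  # convergence check
        break
# a now holds the l_p (here p = 4) linear-phase coefficients
```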

Newton's method and $l_p$ complex linear systems

Consider the problem

\min_x \; e(x) = \| A x - b \|_p^p
(31)

where $A \in \mathbb{C}^{m \times n}$, $x \in \mathbb{R}^n$ and $b \in \mathbb{C}^m$. One can write Equation 31 in terms of the real and imaginary parts of $A$ and $b$,

e(x) = \sum_{i=1}^{m} \left| A_i x - b_i \right|^p = \sum_{i=1}^{m} \left| \mathrm{Re}\{ A_i x - b_i \} + j \, \mathrm{Im}\{ A_i x - b_i \} \right|^p = \sum_{i=1}^{m} \left| (R_i x - \alpha_i) + j (Z_i x - \gamma_i) \right|^p = \sum_{i=1}^{m} \left[ (R_i x - \alpha_i)^2 + (Z_i x - \gamma_i)^2 \right]^{p/2} = \sum_{i=1}^{m} g_i(x)^{p/2}
(32)

where $A = R + jZ$ and $b = \alpha + j\gamma$. The gradient $\nabla e(x)$ is the vector whose $k$-th element is given by

\frac{\partial}{\partial x_k} e(x) = \frac{p}{2} \sum_{i=1}^{m} \left[ \frac{\partial}{\partial x_k} g_i(x) \right] g_i(x)^{\frac{p-2}{2}} = \frac{p}{2} \, q_k(x) \, \hat{g}(x)
(33)

where $\hat{g}(x)$ is the column vector whose $i$-th element is $g_i(x)^{\frac{p-2}{2}}$, and $q_k(x)$ is the row vector whose $i$-th element is

q_{k,i}(x) = \frac{\partial}{\partial x_k} g_i(x) = 2 (R_i x - \alpha_i) R_{ik} + 2 (Z_i x - \gamma_i) Z_{ik} = 2 R_{ik} R_i x + 2 Z_{ik} Z_i x - \left[ 2 \alpha_i R_{ik} + 2 \gamma_i Z_{ik} \right]
(34)

Therefore one can express the gradient of $e(x)$ as $\nabla e(x) = \frac{p}{2} Q \hat{g}$, where $Q = [q_{k,i}]$ as above. Note that one can also write the gradient in vector form as follows

\nabla e(x) = p \left[ R^T \mathrm{diag}(R x - \alpha) + Z^T \mathrm{diag}(Z x - \gamma) \right] \left[ (R x - \alpha)^2 + (Z x - \gamma)^2 \right]^{\frac{p-2}{2}}
(35)

The Hessian $H(x)$ is the matrix of second derivatives whose $kl$-th entry is given by

H_{k,l}(x) = \frac{\partial^2}{\partial x_k \partial x_l} e(x) = \frac{\partial}{\partial x_l} \left[ \frac{p}{2} \sum_{i=1}^{m} q_{k,i}(x) \, g_i(x)^{\frac{p-2}{2}} \right] = \frac{p}{2} \sum_{i=1}^{m} \left[ q_{k,i}(x) \frac{\partial}{\partial x_l} g_i(x)^{\frac{p-2}{2}} + g_i(x)^{\frac{p-2}{2}} \frac{\partial}{\partial x_l} q_{k,i}(x) \right]
(36)

Now,

\frac{\partial}{\partial x_l} g_i(x)^{\frac{p-2}{2}} = \frac{p-2}{2} \left[ \frac{\partial}{\partial x_l} g_i(x) \right] g_i(x)^{\frac{p-4}{2}} = \frac{p-2}{2} \, q_{l,i}(x) \, g_i(x)^{\frac{p-4}{2}} \qquad \frac{\partial}{\partial x_l} q_{k,i}(x) = 2 R_{ik} R_{il} + 2 Z_{ik} Z_{il}
(37)

Substituting Equation 37 into Equation 36 we obtain

H_{k,l}(x) = \frac{p(p-2)}{4} \sum_{i=1}^{m} q_{k,i}(x) \, q_{l,i}(x) \, g_i(x)^{\frac{p-4}{2}} + p \sum_{i=1}^{m} \left( R_{ik} R_{il} + Z_{ik} Z_{il} \right) g_i(x)^{\frac{p-2}{2}}
(38)

Note that $H(x)$ can be written in matrix form as

H(x) = \frac{p(p-2)}{4} \, Q \, \mathrm{diag}\left( g(x)^{\frac{p-4}{2}} \right) Q^T + p \left[ R^T \mathrm{diag}\left( g(x)^{\frac{p-2}{2}} \right) R + Z^T \mathrm{diag}\left( g(x)^{\frac{p-2}{2}} \right) Z \right]
(39)

Therefore, to solve Equation 31 one can use Newton's method as follows: given an initial point $x_0$, each iteration produces a new estimate $x_+$ according to

H(x_c) \, s = -\nabla e(x_c) \qquad x_+ = x_c + s
(40)

where $H(x_c)$ and $\nabla e(x_c)$ are the Hessian and gradient of $e(x)$ defined above, evaluated at the current point $x_c$. Since the $p$-norm is convex for $1 < p < \infty$, problem Equation 31 is a convex problem. Therefore Newton's method converges to the global minimizer $x^\star$ as long as $H(x_c)$ is not ill-conditioned.
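A minimal numpy sketch of this complex $l_p$ iteration, using Equations 35, 39 and 40, follows. The complex data A and b are randomly generated examples (not from the text), Zm stands for the imaginary part of A (to avoid clashing with the diagonal matrix Z of the previous section), and no ill-conditioning safeguards are included:

```python
import numpy as np

# Sketch of Newton's method for min_x ||Ax - b||_p^p with complex A, b
# (Equations 35, 39 and 40). A and b below are arbitrary example data.
rng = np.random.default_rng(1)
m, n, p = 30, 4, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
R, Zm = A.real, A.imag                      # A = R + jZ
alpha, gamma = b.real, b.imag               # b = alpha + j*gamma

x = np.linalg.lstsq(A, b, rcond=None)[0].real   # real l_2-based start
for _ in range(30):
    u, v = R @ x - alpha, Zm @ x - gamma
    g = u**2 + v**2                             # g_i(x), Equation 32
    # Equation 35: gradient
    grad = p * (R.T @ (u * g**((p - 2) / 2)) + Zm.T @ (v * g**((p - 2) / 2)))
    # Equation 39: Hessian, with Q[k, i] = q_{k,i} = 2(u_i R_ik + v_i Z_ik)
    Q = 2 * (R.T * u + Zm.T * v)
    H = (p * (p - 2) / 4) * Q @ np.diag(g**((p - 4) / 2)) @ Q.T \
        + p * (R.T @ np.diag(g**((p - 2) / 2)) @ R
               + Zm.T @ np.diag(g**((p - 2) / 2)) @ Zm)
    s = np.linalg.solve(H, -grad)               # Equation 40: Newton step
    x = x + s
    if np.linalg.norm(s) < 1e-10:
        break
# x now approximates the l_p (here p = 4) minimizer
```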

Footnotes

  1. Note that
\lim_{u(a) \to 0^+} \frac{\partial}{\partial a_j} |u(a)|^p = \lim_{u(a) \to 0^-} \frac{\partial}{\partial a_j} |u(a)|^p = 0
    (14)

References

  1. Strang, G. (1998). Introduction to Linear Algebra (2nd ed.). Wellesley-Cambridge Press.
  2. Trefethen, L. N., & Bau, D., III. (1997). Numerical Linear Algebra. SIAM.
