# Least squares design of IIR filters

Module by: Ricardo Vargas

(Reference) introduced the IIR least squares design problem, as illustrated in (Reference). Such a problem cannot be solved in the same manner as in the FIR case; more sophisticated methods must therefore be employed. As will be discussed later in (Reference), some tradeoffs are desirable for $l_p$ optimization. As in the case of FIR design, when designing $l_p$ IIR filters one must use $l_2$ methods as internal steps over which one iterates while moving between different values of $p$. Clearly this internal iteration must not be too demanding computationally, since an outer $l_p$ loop will invoke it repeatedly (this process is further illustrated in (Reference)). With this issue in mind, one needs to select an $l_2$ algorithm that remains accurate within reasonable error bounds while also being computationally efficient.

This section begins by summarizing some of the traditional approaches that have been employed for $l_2$ rational approximation, both within and outside filter design applications. Among the several existing traditional nonlinear optimization approaches, the Davidon-Fletcher-Powell (DFP) and Gauss-Newton methods have often been used and remain relatively well understood in the filter design community. A brief introduction to both methods is presented in Section 1, and their caveats are briefly explored.

An alternative to attacking a complex nonlinear problem like (Reference) with general nonlinear optimization tools is linearization: an attempt to "linearize" a nonlinear problem and to solve it using linear optimization tools. Such efforts have been applied to similar problems in different areas of statistics and systems analysis and design. Section 2 introduces the notion of an equation error, a linear expression related to the actual solution error that one is interested in minimizing in $l_2$ design. The equation error formulation is nonetheless important for a number of filter design methods (including the ones presented in this work) such as Levy's method, one of the earliest and most relevant frequency-domain linearization approaches. Section 4 presents a frequency-domain equation error algorithm based on the methods by Prony and Padé. This algorithm illustrates the usefulness of the equation error formulation, as it is fundamental to the implementation of the methods proposed later in this work (in (Reference)).

An important class of linearization methods falls under the name of iterative prefiltering algorithms, presented in Section 8. The Sanathanan-Koerner (SK) algorithm and the Steiglitz-McBride (SMB) method are well-known and commonly used examples in this category, and their strengths and weaknesses are explored. Another recent development in this area is the method by Jackson, also presented in this section. Finally, Soewito's quasilinearization (the method of choice for least squares IIR approximation in this work) is presented in Section 13.

One way to address (Reference) is to attempt to solve it with general nonlinear optimization tools. One of the most typical approaches in nonlinear optimization is to apply either Newton's method or a Newton-based algorithm. One assumption of Newton's method is that the optimization function resembles a quadratic function near the solution (refer to Appendix (Reference) for more information). In order to update a current estimate, Newton's method requires first and second order information through the use of gradient and Hessian matrices. A quasi-Newton method is one that estimates the second order information from gradients (by generalizing the secant method to multiple dimensions).

One of the most commonly used quasi-Newton methods in IIR filter design is the Davidon-Fletcher-Powell (DFP) method [10]. In 1970 K. Steiglitz [33] used the DFP method to solve an IIR magnitude approximation to a desired real frequency response. For stability concerns he used a cascade form of the IIR filter given in (Reference) through

$$H(z) = \alpha \prod_{r=1}^{M} \frac{1 + a_r z^{-1} + b_r z^{-2}}{1 + c_r z^{-1} + d_r z^{-2}}$$
(1)

Therefore he considered the following problem,

$$\min_{a_r,\, b_r,\, c_r,\, d_r} \; \sum_{\omega_k} \left| \alpha \prod_{r=1}^{M} \frac{1 + a_r e^{-j\omega_k} + b_r e^{-2j\omega_k}}{1 + c_r e^{-j\omega_k} + d_r e^{-2j\omega_k}} - D(\omega_k) \right|^2$$
(2)

His method is a direct application of the DFP algorithm to problem Equation 2.
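
The cascade objective of Equation 2 is straightforward to evaluate numerically. The sketch below (illustrative function and variable names, not Steiglitz's code) evaluates the cascade form of Equation 1 on a frequency grid and computes the squared-error objective that a DFP iteration would minimize:

```python
import numpy as np

def cascade_response(alpha, sections, w):
    """Frequency response of the cascade form in Equation 1, evaluated at
    z = e^{jw}; each section is a tuple (a_r, b_r, c_r, d_r)."""
    z1 = np.exp(-1j * w)      # z^{-1}
    z2 = z1 * z1              # z^{-2}
    H = np.full(w.shape, alpha, dtype=complex)
    for a, b, c, d in sections:
        H *= (1 + a * z1 + b * z2) / (1 + c * z1 + d * z2)
    return H

def objective(alpha, sections, w, D):
    """Squared-error objective of Equation 2 on the grid w."""
    return np.sum(np.abs(cascade_response(alpha, sections, w) - D) ** 2)

w = np.linspace(0, np.pi, 32)
sections = [(0.2, 0.1, -0.5, 0.25)]
D = cascade_response(1.0, sections, w)   # data generated by the model itself
assert np.isclose(objective(1.0, sections, w, D), 0.0)
```

A gradient-based routine (DFP or otherwise) would then minimize `objective` over the section coefficients and the gain.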

In 1972 Andrew Deczky [6] employed the DFP algorithm to solve a complex IIR least-$p$ approximation to a desired frequency response. Like Steiglitz, Deczky chose the cascaded IIR structure of Equation 1, mainly for stability reasons but also because he claims that for this structure it is simpler to derive the first order information required for the DFP method.

The MATLAB Signal Processing Toolbox includes a function called `invfreqz`, originally written by J. Smith and J. Little [34]. `invfreqz` uses the algorithm by Levy (see Section 3) as an initial step and then applies an iterative algorithm based on the damped Gauss-Newton method [7] to minimize the solution error $\varepsilon_s$ according to the least-squares error criterion. This method performs a line search after every iteration to find the optimal direction for the next step. `invfreqz` evaluates the roots of $A(z)$ after each iteration to verify that the poles of $H(z)$ lie inside the unit circle; otherwise it converts each offending pole into its reciprocal. This approach guarantees a stable filter.
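
The stabilization step just described — reflecting any pole outside the unit circle to its reciprocal, which preserves the magnitude response up to a gain factor — can be sketched as follows (a minimal illustration of the idea, not the actual `invfreqz` source):

```python
import numpy as np

def stabilize(a):
    """Reflect every root of A(z) that lies outside the unit circle to its
    reciprocal 1/conj(z0), yielding a stable denominator whose magnitude
    response differs only by a constant gain factor."""
    roots = np.roots(a)
    outside = np.abs(roots) > 1.0
    roots[outside] = 1.0 / np.conj(roots[outside])   # reciprocal reflection
    return np.real_if_close(np.poly(roots))          # rebuild monic polynomial

# unstable denominator with roots at 2 and 0.5
a = np.poly([2.0, 0.5])
a_stable = stabilize(a)
assert np.all(np.abs(np.roots(a_stable)) <= 1.0)
```

Reflecting $z_0 \mapsto 1/\bar{z}_0$ leaves $|A(e^{j\omega})|$ unchanged up to the factor $|z_0|$, which is why the magnitude approximation is not disturbed.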

Among other Newton-based approaches, Spanos and Mingori [29] use a Newton algorithm combined with the Levenberg-Marquardt technique to improve the algorithm's convergence properties. Their idea is to express the denominator function $A(\omega)$ as a sum of second-order rational polynomials. Thus $H(\omega)$ can be written as

$$H(\omega) = \sum_{r=1}^{L-1} \frac{b_r + j\omega \beta_r}{a_r + j\omega \beta_r - \omega^2} + d$$
(3)

Their global descent approach is similar to the one presented in [31]. Like any Newton-based method, this approach suffers under a poor initial guess, and convergence (when it occurs) is not guaranteed to be to a local minimum. When the method does converge, however, convergence is quadratic.

Kumaresan's method [16] considers a three-step approach. It is not clear whether his method attempts to minimize the equation error or the solution error. He uses divided differences [13] to reformulate the solution error in terms of the coefficients $a_k$. Using Lagrange multiplier theory, he defines

$$E = y^T C^T \left[ C C^T \right]^{-1} C y$$
(4)

where $y = [H_0 \; H_1 \; \cdots \; H_{L-1}]^T$ contains the frequency samples and $C$ is a composition matrix containing the frequency divided differences and the coefficients $a_k$ (a more detailed derivation can be found in [17]). Equation 4 is iterated until convergence of the coefficient vector $\hat{a}$ is reached. This vector is used as the initial guess in the second step, involving a Newton-Raphson search for the optimal $\hat{a}$ that minimizes $E_2$. Finally the vector $\hat{b}$ is found by solving a linear system of equations.

## Equation error linearization methods

General-purpose optimization tools typically prove effective in finding a solution. However, in the context of IIR filter design they often take a rather large number of iterations, generate large matrices, or require complicated steps such as solving or estimating (and often inverting) vectors and matrices of first and second order information [30]. Using gradient-based tools for nonlinear problems like (Reference) certainly seems like a suboptimal approach. Also, typical Newton-based methods tend to converge quickly (quadratically), yet they make assumptions about radii of convergence and initial proximity to the solution (otherwise performance suffers). In the context of filter design one should wonder whether better performance could be achieved by exploiting characteristics of the problem. This section introduces the concept of linearization, an alternative to general optimization methods that has proven successful in the context of rational approximation. The main idea behind linearization approaches is to transform a complex nonlinear problem into a sequence of linear ones, an idea that parallels the approach followed in our development of IRLS $l_p$ optimization.

A common notion used in this work (as well as in some of the literature related to linearization and filter design) is that there are two different error measures that authors often refer to. It is important to recognize the differences between them as one browses through the literature. Typically one would be interested in minimizing the $l_2$ error given by:

$$\varepsilon = \left\| E(\omega) \right\|_2^2 = \left\| D(\omega) - \frac{B(\omega)}{A(\omega)} \right\|_2^2$$
(5)

This quantity is often referred to as the solution error (denoted by $\varepsilon_s$); we refer to the function $E(\omega)$ in Equation 5 as the solution error function, denoted by $E_s(\omega)$. In linearization algorithms the following measure also often arises,

$$\varepsilon = \left\| E(\omega) \right\|_2^2 = \left\| A(\omega) D(\omega) - B(\omega) \right\|_2^2$$
(6)

This measure is often referred to as the equation error $\varepsilon_e$; we denote the function $E(\omega)$ in Equation 6 as the equation error function $E_e(\omega)$. Keeping the notation previously introduced, it can be seen that the two errors are related, one being a weighted version of the other,

$$E_e(\omega) = A(\omega)\, E_s(\omega)$$
(7)
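
The weighting relation in Equation 7 is easy to verify numerically. The sketch below uses an arbitrary, purely illustrative desired response and filter polynomials (none of these values come from the text):

```python
import numpy as np

# hypothetical desired response and filter polynomials, chosen only to
# illustrate the identity E_e(w) = A(w) * E_s(w)
w = np.linspace(0, np.pi, 64)
b = np.array([0.5, 0.3])                  # numerator coefficients (assumed)
a = np.array([1.0, -0.4])                 # denominator coefficients (assumed)
z = np.exp(-1j * w)
B = np.polyval(b[::-1], z)                # B(w) = b0 + b1 e^{-jw}
A = np.polyval(a[::-1], z)                # A(w) = 1 - 0.4 e^{-jw}
D = np.exp(-2j * w) * (w < 1.5)           # some desired response

E_s = D - B / A                           # solution error, Equation 5
E_e = A * D - B                           # equation error, Equation 6
assert np.allclose(E_e, A * E_s)          # weighting relation, Equation 7
```

Minimizing $E_e$ is thus equivalent to minimizing $E_s$ under the frequency-dependent weight $A(\omega)$, which is the central caveat of all equation error methods.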

### Levy's method

E. C. Levy [19] considered in 1959 the following problem in the context of analog systems (electrical networks, to be more precise): define

$$H(j\omega) = \frac{B_0 + B_1 (j\omega) + B_2 (j\omega)^2 + \cdots}{A_0 + A_1 (j\omega) + A_2 (j\omega)^2 + \cdots} = \frac{B(\omega)}{A(\omega)}$$
(8)

Given $L$ samples of a desired complex-valued function $D(j\omega_k) = R(\omega_k) + jI(\omega_k)$ (where $R, I$ are both real functions of $\omega$), Levy defines

$$E(\omega) = D(j\omega) - H(j\omega) = D(j\omega) - \frac{B(\omega)}{A(\omega)}$$
(9)

or

$$\varepsilon = \sum_{k=0}^{L} |E(\omega_k)|^2 = \sum_{k=0}^{L} \left| A(\omega_k) D(j\omega_k) - B(\omega_k) \right|^2$$
(10)

Observing the linear structure (in the coefficients $A_k, B_k$) of Equation 10, Levy proposed minimizing the quantity $\varepsilon$. He realized that this measure (what we would denote as the equation error) was indeed a weighted version of the actual solution error that one might be interested in; in fact, the denominator function $A(\omega)$ became the weighting function.

Levy's proposed method for minimizing Equation 10 begins by writing $\varepsilon$ as follows,

$$\varepsilon = \sum_{k=0}^{L} \left[ \left( R_k \sigma_k - \omega_k \tau_k I_k - \alpha_k \right)^2 + \left( \omega_k \tau_k R_k + \sigma_k I_k - \omega_k \beta_k \right)^2 \right]$$
(11)

by recognizing that Equation 8 can be reformulated in terms of its real and imaginary parts,

$$H(j\omega) = \frac{\left( B_0 - B_2 \omega^2 + B_4 \omega^4 - \cdots \right) + j\omega \left( B_1 - B_3 \omega^2 + B_5 \omega^4 - \cdots \right)}{\left( A_0 - A_2 \omega^2 + A_4 \omega^4 - \cdots \right) + j\omega \left( A_1 - A_3 \omega^2 + A_5 \omega^4 - \cdots \right)} = \frac{\alpha + j\omega\beta}{\sigma + j\omega\tau}$$
(12)

and performing appropriate manipulations. Note that the optimal set of coefficients $A_k$, $B_k$ must satisfy

$$\frac{\partial \varepsilon}{\partial A_0} = \frac{\partial \varepsilon}{\partial A_1} = \cdots = \frac{\partial \varepsilon}{\partial B_0} = \cdots = 0$$
(13)

The conditions introduced above generate a linear system in the filter coefficients. Levy derives the system

$$C x = y$$
(14)

where

$$C = \begin{bmatrix}
\lambda_0 & 0 & -\lambda_2 & 0 & \lambda_4 & \cdots & T_1 & S_2 & -T_3 & -S_4 & T_5 & \cdots \\
0 & \lambda_2 & 0 & -\lambda_4 & 0 & \cdots & -S_2 & T_3 & S_4 & -T_5 & -S_6 & \cdots \\
\lambda_2 & 0 & -\lambda_4 & 0 & \lambda_6 & \cdots & T_3 & S_4 & -T_5 & -S_6 & T_7 & \cdots \\
\vdots & & & & & & \vdots & & & & & \\
T_1 & -S_2 & -T_3 & S_4 & T_5 & \cdots & U_2 & 0 & -U_4 & 0 & U_6 & \cdots \\
S_2 & T_3 & -S_4 & -T_5 & S_6 & \cdots & 0 & U_4 & 0 & -U_6 & 0 & \cdots \\
T_3 & -S_4 & -T_5 & S_6 & T_7 & \cdots & U_4 & 0 & -U_6 & 0 & U_8 & \cdots \\
\vdots & & & & & & \vdots & & & & &
\end{bmatrix}$$
(15)

and

$$x = \begin{bmatrix} B_0 & B_1 & B_2 & \cdots & A_1 & A_2 & \cdots \end{bmatrix}^T \qquad y = \begin{bmatrix} S_0 & T_1 & S_2 & T_3 & \cdots & 0 & U_2 & 0 & U_4 & \cdots \end{bmatrix}^T$$
(16)

with

$$\lambda_h = \sum_{l=0}^{L-1} \omega_l^h \qquad S_h = \sum_{l=0}^{L-1} \omega_l^h R_l \qquad T_h = \sum_{l=0}^{L-1} \omega_l^h I_l \qquad U_h = \sum_{l=0}^{L-1} \omega_l^h \left( R_l^2 + I_l^2 \right)$$
(17)

Solving for the vector $x$ in Equation 14 gives the desired coefficients (note the trivial assumption that $A_0 = 1$). It is important to remember that although Levy's algorithm leads to a linear system of equations in the coefficients, his approach is indeed an equation error method. MATLAB's `invfreqz` function uses an adaptation of Levy's algorithm for its least-squares equation error solution.

## Prony-based equation error linearization

A number of algorithms that consider the approximation of functions in a least-squared sense using rational functions relate to Prony's method. This section summarizes these methods especially in the context of filter design.

### Prony's method

The first method considered in this section is due to Gaspard Riche, Baron de Prony, a Lyonnais mathematician and physicist who, in 1795, proposed to model the expansion properties of different gases by sums of damped exponentials. His method [5] approximates a sampled function $f(n)$ (where $f(n) = 0$ for $n < 0$) with a sum of $N$ exponentials,

$$f(n) = \sum_{k=1}^{N} c_k e^{s_k n} = \sum_{k=1}^{N} c_k \lambda_k^n$$
(18)

where $\lambda_k = e^{s_k}$. The objective is to determine the $N$ parameters $c_k$ and the $N$ parameters $s_k$ in Equation 18 given $2N$ samples of $f(n)$.

It is possible to express Equation 18 in matrix form as follows,

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_N \\ \vdots & \vdots & & \vdots \\ \lambda_1^{N-1} & \lambda_2^{N-1} & \cdots & \lambda_N^{N-1} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{bmatrix}$$
(19)

System Equation 19 has a Vandermonde structure with $N$ equations but $2N$ unknowns (both $c_k$ and $\lambda_k$ are unknown), and thus it cannot be solved directly. The major contribution of Prony's work was to recognize that $f(n)$ as given in Equation 18 is indeed the solution of a homogeneous order-$N$ Linear Constant Coefficient Difference Equation (LCCDE) [22] given by

$$\sum_{p=0}^{N} a_p f(m-p) = 0$$
(20)

with $a_0 = 1$. Since $f(n)$ is known for $0 \le n \le 2N-1$, we can extend Equation 20 into an $N \times N$ system of the form

$$\begin{bmatrix} f(N-1) & f(N-2) & \cdots & f(0) \\ f(N) & f(N-1) & \cdots & f(1) \\ \vdots & \vdots & & \vdots \\ f(2N-2) & f(2N-3) & \cdots & f(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} -f(N) \\ -f(N+1) \\ \vdots \\ -f(2N-1) \end{bmatrix}$$
(21)

which we can solve for the coefficients $a_p$. These coefficients are then used in the characteristic equation [8] of Equation 20,

$$\lambda^N + a_1 \lambda^{N-1} + \cdots + a_{N-1} \lambda + a_N = 0$$
(22)

The $N$ roots $\lambda_k$ of Equation 22 are called the characteristic roots of Equation 20. From the $\lambda_k$ we can find the parameters $s_k$ using $s_k = \ln \lambda_k$. Finally, it is now possible to solve Equation 19 for the parameters $c_k$.

The method described above is an adequate representation of Prony's original method [5]. More detailed analyses are presented in [15], [2], [35], [3] and [20]. Prony's method is an adequate algorithm for interpolating $2N$ data samples with $N$ exponentials. Yet it is not a filter design algorithm as it stands. Its connection with IIR filter design, however, exists and will be discussed in the following sections.
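
Prony's three steps — the linear-prediction solve of Equation 21, the characteristic roots of Equation 22, and the Vandermonde solve of Equation 19 — can be sketched as follows (illustrative code, not a numerically robust implementation):

```python
import numpy as np

def prony(f, N):
    """Prony's method: fit the 2N samples f(0),...,f(2N-1) with
    f(n) = sum_k c_k * lam_k**n.  Returns (c, lam)."""
    f = np.asarray(f, dtype=float)
    # step 1: linear-prediction system of Equation 21 for a_1..a_N
    F = np.array([[f[N - 1 + m - p] for p in range(N)] for m in range(N)])
    a = np.linalg.solve(F, -f[N:2 * N])
    # step 2: characteristic roots of Equation 22
    lam = np.roots(np.concatenate(([1.0], a)))
    # step 3: Vandermonde system of Equation 19 for c_k
    V = np.vander(lam, N, increasing=True).T      # row n holds lam_k**n
    c = np.linalg.solve(V.astype(complex), f[:N].astype(complex))
    return c, lam

# f(n) = 3*(0.9)**n + (0.5)**n is recovered exactly from 2N = 4 samples
n = np.arange(4)
f = 3 * 0.9 ** n + 0.5 ** n
c, lam = prony(f, 2)
assert np.allclose(np.sort(lam.real), [0.5, 0.9])
```

As the text notes, this interpolates $2N$ samples exactly; with noisy or longer data the first step is usually replaced by a least-squares solve.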

The work by Prony served as inspiration to Henri Padé, a French mathematician who in 1892 published a work [25] discussing the problem of rational approximation. His objective was to approximate a function that could be represented by a power series expansion using a rational function of two polynomials.

Assume that a function $f(x)$ can be represented with a power series expansion of the form

$$f(x) = \sum_{k=0}^{\infty} c_k x^k$$
(23)

Padé's idea was to approximate $f(x)$ using the function

$$\hat{f}(x) = \frac{B(x)}{A(x)}$$
(24)

where

$$B(x) = \sum_{k=0}^{M} b_k x^k$$
(25)

and

$$A(x) = 1 + \sum_{k=1}^{N} a_k x^k$$
(26)

The objective is to determine the coefficients $a_k$ and $b_k$ so that the first $M+N+1$ terms of the residual

$$r(x) = A(x) f(x) - B(x)$$
(27)

disappear (i.e. the first $N+M$ derivatives of $f(x)$ and $\hat{f}(x)$ are equal [12]). That is [4],

$$r(x) = A(x) \sum_{k=0}^{\infty} c_k x^k - B(x) = x^{M+N+1} \sum_{k=0}^{\infty} d_k x^k$$
(28)

To do this, consider $A(x) f(x) = B(x)$ [35]

$$\left( 1 + a_1 x + \cdots + a_N x^N \right) \cdot \left( c_0 + c_1 x + \cdots + c_i x^i + \cdots \right) = b_0 + b_1 x + \cdots + b_M x^M$$
(29)

By equating the terms with same exponent up to order M+N+1M+N+1, we obtain two sets of equations,

$$\begin{aligned} c_0 &= b_0 \\ a_1 c_0 + c_1 &= b_1 \\ a_2 c_0 + a_1 c_1 + c_2 &= b_2 \\ a_3 c_0 + a_2 c_1 + a_1 c_2 + c_3 &= b_3 \\ &\;\vdots \\ a_N c_{M-N} + a_{N-1} c_{M-N+1} + \cdots + c_M &= b_M \end{aligned}$$
(30)
$$\begin{aligned} a_N c_{M-N+1} + a_{N-1} c_{M-N+2} + \cdots + c_{M+1} &= 0 \\ a_N c_{M-N+2} + a_{N-1} c_{M-N+3} + \cdots + c_{M+2} &= 0 \\ &\;\vdots \\ a_N c_M + a_{N-1} c_{M+1} + \cdots + c_{M+N} &= 0 \end{aligned}$$
(31)

Equation 31 represents an $N \times N$ system that can be solved for the coefficients $a_k$ given $c(n)$ for $0 \le n \le N+M$. These values can then be used in Equation 30 to solve for the coefficients $b_k$. The result is a system whose impulse response matches the first $N+M+1$ values of $f(n)$.
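
The two triangular systems in Equations 30 and 31 translate directly into code. The sketch below (illustrative names) recovers the classical $[1/1]$ Padé approximant of $e^x$, namely $(1 + x/2)/(1 - x/2)$:

```python
import numpy as np

def pade(c, M, N):
    """Pade approximant B(x)/A(x) (deg B = M, deg A = N, A(0) = 1) matching
    the first M+N+1 coefficients c_k of the power series sum c_k x^k."""
    c = np.asarray(c, dtype=float)
    cc = lambda k: c[k] if k >= 0 else 0.0      # c_k = 0 for k < 0
    # Equation 31: sum_j c_{M+i-j} a_j = -c_{M+i},  i = 1..N
    T = np.array([[cc(M + i - j) for j in range(1, N + 1)]
                  for i in range(1, N + 1)])
    a = np.linalg.solve(T, -c[M + 1:M + N + 1])
    A = np.concatenate(([1.0], a))
    # Equation 30: b_m = sum_{j=0}^{min(m,N)} a_j c_{m-j}, with a_0 = 1
    B = np.array([sum(A[j] * cc(m - j) for j in range(min(m, N) + 1))
                  for m in range(M + 1)])
    return B, A

# first three series coefficients of exp(x): 1, 1, 1/2
c = np.array([1.0, 1.0, 0.5])
B, A = pade(c, 1, 1)
assert np.allclose(B, [1.0, 0.5]) and np.allclose(A, [1.0, -0.5])
```

The resulting rational function matches the first $M+N+1 = 3$ series terms, exactly as the text states.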

### Prony-based filter design methods

Both the original methods by Prony and Padé were meant to interpolate data from applications that have little in common with filter design. What is relevant to this work is their use of rational functions of polynomials as models for data, and the linearization process they both employ.

When designing FIR filters, a common approach is to take $L$ samples of the desired frequency response $D(\omega)$ and calculate the inverse DFT of the samples. This design approach is known as frequency sampling. It has been shown [26] that by designing a length-$L$ filter $h(n)$ via the frequency sampling method and symmetrically truncating $h(n)$ to $N$ values ($N \le L$) it is possible to obtain a least-squares optimal length-$N$ filter $h_N(n)$. This method cannot, however, be extended completely to the IIR problem. This section presents an extension based on the methods by Prony and Padé, and illustrates the shortcomings of its application.

Consider the frequency response defined in (Reference). One can choose $L$ equally spaced samples of $H(\omega)$ to obtain

$$H(\omega_k) = H_k = \frac{B_k}{A_k} \quad \text{for } k = 0, 1, \ldots, L-1$$
(32)

where $A_k$ and $B_k$ represent the length-$L$ DFTs of the filter coefficients $a_n$ and $b_n$ respectively. The division in Equation 32 is done point-by-point over the $L$ values of $A_k$ and $B_k$. The objective is to use the relationship described in Equation 32 to calculate $a_n$ and $b_n$.

One can express Equation 32 as $B_k = H_k A_k$. This operation represents the length-$L$ circular convolution $b(n) = h(n) \circledast_L a(n)$ defined as follows [24]

$$b(n) = h(n) \circledast_L a(n) = \sum_{m=0}^{L-1} h\left( ((n-m))_L \right) a(m), \quad 0 \le n \le L-1$$
(33)

where $h(n)$ is the length-$L$ inverse DFT of $H_k$ and the operator $((\cdot))_L$ denotes modulo $L$. Let

$$\hat{a} = \begin{bmatrix} 1 & a_1 & \cdots & a_N & 0 & \cdots & 0 \end{bmatrix}^T \quad \text{and} \quad \hat{b} = \begin{bmatrix} b_0 & b_1 & \cdots & b_M & 0 & \cdots & 0 \end{bmatrix}^T$$
(34)

Therefore Equation 33 can be posed as a matrix operation [26] of the form

$$\hat{H} \hat{a} = \hat{b}$$
(35)

where

$$\hat{H} = \begin{bmatrix} h_0 & h_{L-1} & h_{L-2} & \cdots & h_1 \\ h_1 & h_0 & h_{L-1} & \cdots & h_2 \\ \vdots & \vdots & \vdots & & \vdots \\ h_{L-1} & h_{L-2} & h_{L-3} & \cdots & h_0 \end{bmatrix}$$
(36)

is an $L \times L$ matrix. From Equation 34 it is clear that the $L-(N+1)$ rightmost columns of $\hat{H}$ can be discarded (since the last $L-(N+1)$ values of $\hat{a}$ in Equation 34 are equal to 0). Therefore Equation 35 can be rewritten as

$$\begin{bmatrix}
h_0 & h_{L-1} & \cdots & h_{L-N} \\
h_1 & h_0 & \cdots & h_{L-N+1} \\
\vdots & \vdots & & \vdots \\
h_M & h_{M-1} & \cdots & h_{((L-N+M))_L} \\
h_{M+1} & h_M & \cdots & h_{((L-N+M+1))_L} \\
\vdots & \vdots & & \vdots \\
h_{L-2} & h_{L-3} & \cdots & h_{L-N-2} \\
h_{L-1} & h_{L-2} & \cdots & h_{L-N-1}
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}
=
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_M \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
(37)

or in matrix notation

$$H \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} \quad \text{or} \quad H \tilde{a} = \hat{b}$$
(38)

where $a$ and $b$ correspond to the length-$N$ and length-$(M+1)$ filter coefficient vectors respectively, and $H$ contains the first $N+1$ columns of $\hat{H}$. It is possible to uncouple the calculation of $a$ and $b$ in Equation 38 by further partitioning $H$ as follows,

$$H = \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}$$
(39)

Therefore

$$\begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \tilde{a} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$
(40)

with

$$\tilde{a} = \begin{bmatrix} 1 \\ a \end{bmatrix}$$
(41)

as defined in Equation 38. This formulation allows one to uncouple the calculations for $a$ and $b$ using two systems,

$$\begin{aligned} H_1 \tilde{a} &= b \\ H_2 \tilde{a} &= 0 \end{aligned}$$
(42)

Note that the last equation can be expressed as

$$\hat{H}_2 a = -\hat{h}_2$$
(43)

where $H_2 = [\hat{h}_2 \;\; \hat{H}_2]$ (that is, $\hat{h}_2$ is the first column of $H_2$ and $\hat{H}_2$ contains its remaining $N$ columns).

From Equation 43 one can conclude that if $L = N+M+1$ and if $\hat{H}_2$ and $H_1$ are nonsingular, then they can be inverted to solve for the filter coefficient vector $a$ in Equation 43 and then solve for $b$ using $H_1 \tilde{a} = b$.

The algorithm described above is an interpolation method rather than an approximation one. If $L > N+M+1$ and $\hat{H}_2$ has full column rank, then Equation 43 is an overdetermined linear system for which no exact solution exists; therefore an approximation must be found. From Equation 32 we can define the solution error function $E_s(\omega_k)$ as

$$E_s(\omega_k) = \frac{B(\omega_k)}{A(\omega_k)} - H(\omega_k)$$
(44)

Using this notation, the design objective is to solve the nonlinear problem

$$\min_{a,\, b} \left\| E_s(\omega_k) \right\|_2^2$$
(45)

Consider the system in Equation 38. If $H_2$ is overdetermined, one can define an approximation problem by introducing an error vector $e$,

$$\hat{b} = H \tilde{a} - e$$
(46)

where

$$e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$
(47)

Again, it is possible to uncouple Equation 46 as follows,

$$\begin{aligned} b &= H_1 \tilde{a} - e_1 \\ e_2 &= \hat{h}_2 + \hat{H}_2 a \end{aligned}$$
(48)

One can minimize the least-squares error norm $\|e\|_2^2$ of the overdetermined system (Reference) by solving the normal equations [7]

$$\hat{H}_2^T \hat{h}_2 = -\hat{H}_2^T \hat{H}_2 a$$
(49)

so that

$$a = -\left[ \hat{H}_2^T \hat{H}_2 \right]^{-1} \hat{H}_2^T \hat{h}_2$$
(50)

and use this result in Equation 48

$$b = H_1 \tilde{a}$$
(51)

Equation 48 represents the following time-domain operation,

$$\varepsilon(n) = b(n) - h(n) \circledast_L \hat{a}(n), \quad 0 \le n \le M$$
(52)

(where $\circledast_L$ denotes length-$L$ circular convolution) and can be interpreted in the frequency domain as follows,

$$E_e(\omega_k) = B(\omega_k) - H(\omega_k) A(\omega_k)$$
(53)

Equation 53 is a weighted version of Equation 44, as follows

$$E_e(\omega_k) = A(\omega_k) E_s(\omega_k)$$
(54)

Therefore the algorithm presented above finds the filter coefficient vectors $a$ and $b$ that minimize the equation error $E_e$ in Equation 53 in the least-squares sense. Unfortunately, this error is not what one may want to optimize, since it is a weighted version of the solution error $E_s$.
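
The complete procedure of Equations 32 through 51 can be sketched as follows; for numerical robustness the sketch solves Equation 43 with a least-squares routine, which is algebraically equivalent to the normal equations of Equations 49-50 (all function and variable names are illustrative):

```python
import numpy as np

def eqerror_design(Hk, N, M):
    """Equation-error IIR design from L frequency samples H_k: take
    h = IDFT(H_k), form the first N+1 columns of its circulant matrix
    (Equation 37), solve Equation 43 in the least-squares sense
    (equivalent to Equation 50), then recover b = H_1 a~ (Equation 51)."""
    L = len(Hk)
    h = np.real(np.fft.ifft(Hk))                    # length-L inverse DFT
    Hc = np.array([[h[(n - m) % L] for m in range(N + 1)]
                   for n in range(L)])
    H1, H2 = Hc[:M + 1], Hc[M + 1:]                 # partition of Equation 39
    h2, H2hat = H2[:, 0], H2[:, 1:]                 # H2 = [h2  H2hat]
    a, *_ = np.linalg.lstsq(H2hat, -h2, rcond=None)
    atilde = np.concatenate(([1.0], a))
    b = H1 @ atilde                                 # Equation 51
    return b, atilde

# samples drawn from an exact first-order IIR filter are recovered
b_true, a_true = np.array([1.0, 0.5]), np.array([1.0, -0.3])
L = 16
z = np.exp(-2j * np.pi * np.arange(L) / L)          # e^{-j w_k}
Hk = (b_true[0] + b_true[1] * z) / (a_true[0] + a_true[1] * z)
b, a = eqerror_design(Hk, N=1, M=1)
assert np.allclose(a, a_true) and np.allclose(b, b_true)
```

When the samples do not come from an exact degree-$(M,N)$ filter, the solve returns the equation-error minimizer, with the $A(\omega_k)$ weighting bias discussed above.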

## Iterative prefiltering linearization methods

Section 2 introduced the equation error formulation and several algorithms that minimize it. In a general sense, however, one is more interested in minimizing the solution error problem from (Reference). This section presents several algorithms that attempt to minimize the solution error formulation from (Reference) by prefiltering the desired response $D(\omega)$ in Equation 6 with $A(\omega)$. A new set of coefficients $\{a_n, b_n\}$ is then found with an equation error formulation and the prefiltering step is repeated, hence defining an iterative procedure.

### Sanathanan-Koerner (SK) method

The method by Levy presented in Section 3 suggests a relatively easy-to-implement approach to the problem of rational approximation. While interesting in itself, the equation error $\varepsilon_e$ does not really represent what in principle one would like to minimize. A natural extension to Levy's method is the one proposed [27] by C. K. Sanathanan and J. Koerner in 1963. The algorithm iteratively prefilters the equation error formulation of Levy with an estimate of $A(\omega)$. The SK method considers the solution error function $E_s$ defined by

$$E_s(\omega) = D(\omega) - \frac{B(\omega)}{A(\omega)} = \frac{1}{A(\omega)} \left[ A(\omega) D(\omega) - B(\omega) \right] = \frac{1}{A(\omega)} E_e(\omega)$$
(55)

Then the solution error problem can be written as

$$\min_{a_k,\, b_k} \; \varepsilon_s$$
(56)

where

$$\varepsilon_s = \sum_{k=0}^{L} |E_s(\omega_k)|^2 = \sum_{k=0}^{L} \frac{1}{|A(\omega_k)|^2} |E_e(\omega_k)|^2 = \sum_{k=0}^{L} W(\omega_k) |E_e(\omega_k)|^2$$
(57)

Note that given $A(\omega)$, one can obtain an estimate for $B(\omega)$ by minimizing $E_e$ as Levy did. This is only an estimate, though, because one would need to know the optimal value of $A(\omega)$ to truly optimize for $B(\omega)$. The idea behind this method is that by solving iteratively for $A(\omega)$ and $B(\omega)$ the algorithm will eventually converge to the solution of the desired solution error problem defined by Equation 56. Since $A(\omega)$ is not known from the beginning, it must be initialized with a reasonable value (such as $A(\omega_k) = 1$).

To solve Equation 56 Sanathanan and Koerner defined the same linear system from Equation 14, with the same matrix and vector definitions. However, the scalar terms used in the matrix and vectors reflect the presence of the weighting function $W(\omega)$ in $\varepsilon_s$ as follows,

$$\lambda_h = \sum_{l=0}^{L-1} \omega_l^h W(\omega_l) \qquad S_h = \sum_{l=0}^{L-1} \omega_l^h R_l W(\omega_l) \qquad T_h = \sum_{l=0}^{L-1} \omega_l^h I_l W(\omega_l) \qquad U_h = \sum_{l=0}^{L-1} \omega_l^h \left( R_l^2 + I_l^2 \right) W(\omega_l)$$
(58)

Then, given an initial definition of $A(\omega)$, at the $p$-th iteration one sets

$$W(\omega_k) = \frac{1}{\left| A_{p-1}(\omega_k) \right|^2}$$
(59)

and solves Equation 14 using $\{\lambda, S, T, U\}$ as defined above until a convergence criterion is reached. Clearly, solving Equation 56 using Equation 57 is equivalent to solving a sequence of weighted least squares problems in which the weighting function consists of the estimated values of $A(\omega)$ from the previous iteration. This method is similar to a time-domain method proposed by Steiglitz and McBride [28], presented later in this chapter.
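
The SK iteration thus reduces to repeatedly solving a weighted linear least-squares problem, with the weight of Equation 59 updated from the previous denominator estimate. A sketch under the same $j\omega$-domain conventions as Levy's formulation (illustrative names, and a generic least-squares solve in place of the explicit system of Equation 14):

```python
import numpy as np

def sk_fit(w, D, M, N, iters=10):
    """Sanathanan-Koerner iteration: repeatedly minimize
    sum_k W_k |A(jw_k) D_k - B(jw_k)|^2 with W_k = 1/|A_prev(jw_k)|^2
    (Equation 59), starting from A = 1."""
    jw = 1j * w
    # equation-error residual is D + cols @ x, linear in the coefficients
    cols = np.column_stack([-(jw ** m) for m in range(M + 1)] +
                           [D * jw ** n for n in range(1, N + 1)])
    A_prev = np.ones_like(D)
    for _ in range(iters):
        s = 1.0 / np.abs(A_prev)                  # sqrt of the weight W_k
        Mw, rw = cols * s[:, None], -D * s
        x, *_ = np.linalg.lstsq(np.vstack((Mw.real, Mw.imag)),
                                np.concatenate((rw.real, rw.imag)),
                                rcond=None)
        a = np.concatenate(([1.0], x[M + 1:]))
        A_prev = np.polyval(a[::-1], jw)          # new denominator estimate
    return x[:M + 1], a

# data from an exact rational function is a fixed point of the iteration
w = np.linspace(0.1, 5, 40)
b_t, a_t = np.array([1.0, 2.0]), np.array([1.0, 0.7])
D = np.polyval(b_t[::-1], 1j * w) / np.polyval(a_t[::-1], 1j * w)
b, a = sk_fit(w, D, M=1, N=1)
assert np.allclose(b, b_t) and np.allclose(a, a_t)
```

Each pass is exactly Levy's linear solve with reweighted sums, mirroring the redefined $\{\lambda, S, T, U\}$ of Equation 58.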

### Method of Sid-Ahmed, Chottera and Jullien

The methods by Levy and by Sanathanan and Koerner arose from an analog analysis problem formulation and therefore cannot be used directly to design digital filters. However, these two methods present important ideas that can be translated to the context of filter design. In 1978 M. Sid-Ahmed, A. Chottera and G. Jullien followed up on these two important works and adapted [21] the matrix and vectors used by Levy to the design of IIR digital filters, given samples of a desired frequency response. Consider the frequency response $H(\omega)$ defined in (Reference). In parallel with Levy's development, the corresponding equation error can be written as

$$\varepsilon_e = \sum_{k=0}^{L} \left| (R_k + j I_k) \left( 1 + \sum_{c=1}^{N} a_c e^{-j\omega_k c} \right) - \sum_{c=0}^{M} b_c e^{-j\omega_k c} \right|^2$$
(60)

One can follow a differentiation step similar to Levy's by setting

$$\frac{\partial \varepsilon_e}{\partial a_1} = \frac{\partial \varepsilon_e}{\partial a_2} = \cdots = \frac{\partial \varepsilon_e}{\partial b_0} = \cdots = 0$$
(61)

with $\varepsilon_e$ as defined in Equation 60. Doing so results in a linear system of the form

$$C x = y$$
(62)

where the vectors xx and yy are given by

$$x = \begin{bmatrix} b_0 & \cdots & b_M & a_1 & \cdots & a_N \end{bmatrix}^T \qquad y = \begin{bmatrix} \varphi_0 - r_0 & \cdots & \varphi_M - r_M & -\beta_1 & \cdots & -\beta_N \end{bmatrix}^T$$
(63)

The matrix CC has a special structure given by

$$C = \begin{bmatrix} \Psi & \Phi \\ \Phi^T & \Upsilon \end{bmatrix}$$
(64)

where $\Psi$ and $\Upsilon$ are symmetric Toeplitz matrices of order $M+1$ and $N$ respectively, whose first rows are given by

$$\Psi_{1m} = \eta_{m-1} \quad \text{for } m = 1, \ldots, M+1 \qquad \Upsilon_{1m} = \beta_{m-1} \quad \text{for } m = 1, \ldots, N$$
(65)

Matrix $\Phi$ has order $(M+1) \times N$ and has the property that the elements on a given diagonal are identical (i.e. $\Phi_{i,j} = \Phi_{i+1,j+1}$). Its entries are given by

$$\Phi_{1m} = \varphi_m + r_m \quad \text{for } m = 1, \ldots, N \qquad \Phi_{m1} = \varphi_{m-2} - r_{m-2} \quad \text{for } m = 2, \ldots, M+1$$
(66)

The parameters $\{\eta, \varphi, r, \beta\}$ are given by

$$\begin{aligned} \eta_i &= \sum_{k=0}^{L} \cos i\omega_k & \text{for } 0 \le i \le M \\ \beta_i &= \sum_{k=0}^{L} |D(\omega_k)|^2 \cos i\omega_k & \text{for } 0 \le i \le N-1 \\ \varphi_i &= \sum_{k=0}^{L} R_k \cos i\omega_k & \text{for } 0 \le i \le \max(N, M-1) \\ r_i &= \sum_{k=0}^{L} I_k \sin i\omega_k & \text{for } 0 \le i \le \max(N, M-1) \end{aligned}$$
(67)

The rest of the algorithm works the same way as Levy's. For a solution error approach, one must weight each of the parameters mentioned above with the factor from Equation 59 as in the SK method.

There are two important details worth mentioning at this point. On one hand, the methods discussed up to this point (Levy, SK and Sid-Ahmed et al.) do not put any limitation on the spacing of the frequency samples; one can sample as finely or as coarsely as desired in the frequency domain. On the other hand, there is no way to decouple the solution of the numerator and denominator vectors. In other words, from Equation 16 and Equation 63 one can see that the linear systems that solve for the vector $x$ solve for all the variables in it. This is more of an issue for the iterative methods (SK and Sid-Ahmed), since at each iteration one solves for all the variables, yet for the purposes of updating one only needs to keep the denominator variables (they get used in the weighting function); the numerator variables are never used within an iteration. In contrast, Burrus' Prony-based method presented in Section 4 decouples the numerator and denominator computations into two separate linear systems. One only needs to compute the denominator variables until convergence is reached, and only then does it become necessary to compute the numerator variables. Therefore most of the iterations solve a smaller linear system than the methods discussed up to this point.

### Steiglitz-McBride iterative algorithm

In 1965 K. Steiglitz and L. McBride presented an algorithm [28], [18] that has become quite popular in statistics and engineering applications. The Steiglitz-McBride method (SMB) considers the problem of deriving a transfer function for an analog or digital system from its input and output data; in essence it is a time-domain method. It is mentioned in this work for completeness, as it closely relates to the methods by Levy, SK and Sid-Ahmed, yet it is far better known and understood.

The derivation of the SMB method follows closely that of SK. In the Z-domain, the transfer function of a digital system is defined by

$$H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_N z^{-N}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}}$$
(68)

Furthermore

$$Y(z) = H(z)X(z) = \frac{B(z)}{A(z)} X(z)$$
(69)

Steiglitz and McBride define the following problem,

$$\min \varepsilon_s = \sum_i |E_i(z)|^2 = \frac{1}{2\pi j} \oint \left| X(z)\frac{B(z)}{A(z)} - D(z) \right|^2 \frac{dz}{z}$$
(70)

where $X(z) = \sum_j x_j z^{-j}$ and $D(z) = \sum_j d_j z^{-j}$ represent the z-transforms of the input and desired signals respectively. Equation 70 is the familiar nonlinear solution error function expressed in the Z-domain. Steiglitz and McBride recognized the complexity of this function and proposed to solve Equation 70 iteratively using a simpler problem defined by

$$\min \varepsilon_e = \sum_i |E_i(z)|^2 = \frac{1}{2\pi j} \oint \left| X(z)B(z) - D(z)A(z) \right|^2 \frac{dz}{z}$$
(71)

This linearized error function is the familiar equation error in the Z-domain. Steiglitz and McBride proposed a two-mode iterative approach. The SMB Mode 1 iteration is similar to the SK method, in that at the $k$-th iteration a linearized error criterion based on Equation 71 is used,

$$E_k(z) = \frac{B_k(z)}{A_{k-1}(z)} X(z) - \frac{A_k(z)}{A_{k-1}(z)} D(z) = W_k(z) \left[ B_k(z) X(z) - A_k(z) D(z) \right]$$
(72)

where

$$W_k(z) = \frac{1}{A_{k-1}(z)}$$
(73)

Their derivation4 leads to the familiar linear system

$$C x = y$$
(74)

with the following vector definitions

$$x = [b_0 \; \cdots \; b_N \;\; a_1 \; \cdots \; a_N]^T \qquad q_j = [x_j \; \cdots \; x_{j-N} \;\; d_{j-1} \; \cdots \; d_{j-N}]^T$$
(75)

The vector $q_j$ is referred to as the input-output vector. Then

$$C = \sum_j q_j q_j^T \qquad y = \sum_j d_j q_j$$
(76)

SMB Mode 2 is an attempt to reduce the error further once Mode 1 produces an estimate close enough to the actual solution. The idea behind Mode 2 is to consider the solution error defined by Equation 70 and set its partial derivatives with respect to the coefficients to zero. Steiglitz and McBride showed [28], [18] that this can be attained by defining a new vector

$$r_j = [x_j \; \cdots \; x_{j-N} \;\; y_{j-1} \; \cdots \; y_{j-N}]^T$$
(77)

Then

$$C = \sum_j r_j q_j^T \qquad y = \sum_j d_j r_j$$
(78)

The main difference between Mode 1 and Mode 2 is that Mode 1 uses the desired values to compute its vectors and matrices, whereas Mode 2 uses the actual output values from the filter. The rationale is that at the beginning the output function y(t) is not accurate, so the desired function provides better data for the computations. On the other hand, Mode 1 does not really solve the desired problem. Once Mode 1 is deemed to have reached the vicinity of the solution, one can use true partial derivatives to compute the gradient and find the actual solution; this is what Mode 2 does.
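
The Mode-1 iteration above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the helper names (`allpole_filter`, `smb_mode1`) are hypothetical, the prefiltering assumes zero initial conditions, and the denominator samples in $q_j$ are entered with a negative sign so that $q_j^T h \approx d_j$ (one of several equivalent sign conventions).

```python
import numpy as np

def allpole_filter(sig, a):
    """Filter sig through 1/A(z), A(z) = 1 + a[0] z^-1 + ... + a[N-1] z^-N,
    by direct recursion with zero initial conditions."""
    out = np.zeros(len(sig))
    for n in range(len(sig)):
        acc = sig[n]
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * out[n - k]
        out[n] = acc
    return out

def smb_mode1(x, d, N, n_iter=5):
    """Time-domain SMB Mode-1 sketch: prefilter x and d by 1/A_{k-1}(z),
    stack the input-output vectors q_j and solve the least-squares system."""
    a = np.zeros(N)                          # start from A(z) = 1
    for _ in range(n_iter):
        xf = allpole_filter(x, a)            # W_k(z) X(z)
        df = allpole_filter(d, a)            # W_k(z) D(z)
        Q = np.array([np.concatenate((xf[j::-1][:N + 1],     # x_j .. x_{j-N}
                                      -df[j - 1::-1][:N]))   # -d_{j-1} .. -d_{j-N}
                      for j in range(N, len(x))])
        h, *_ = np.linalg.lstsq(Q, df[N:], rcond=None)
        b, a = h[:N + 1], h[N + 1:]
    return b, a
```

For noiseless data generated by a filter of the assumed order, the very first pass already solves the equation error exactly, and subsequent passes leave the estimate unchanged.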

It has been claimed that under certain conditions the Steiglitz-McBride algorithm converges; however, no guarantee of global convergence exists. A more thorough discussion of the Steiglitz-McBride algorithm and its relationship to other parameter estimation algorithms (such as the Iterative Quadratic Maximum Likelihood algorithm, or IQML) can be found in [9], [32], [23].

### Jackson's method

The following is a recent approach (from 2008) by Leland Jackson [14] based in the frequency domain. Consider vectors $a \in \mathbb{R}^N$ and $b \in \mathbb{R}^M$ such that

$$H(\omega) = \frac{B(\omega)}{A(\omega)}$$
(79)

where $H(\omega), B(\omega), A(\omega)$ are the Fourier transforms of $h$, $b$ and $a$ respectively. For a discrete frequency set one can describe the Fourier transform vectors $B = W_b b$ and $A = W_a a$ (where $W_b, W_a$ correspond to the discrete Fourier kernels for $b, a$ respectively). Define

$$H_a(\omega_k) = \frac{1}{A(\omega_k)}$$
(80)

In vector notation, let $D_a = \mathrm{diag}(H_a) = \mathrm{diag}(1/A)$. Then

$$H(\omega) = \frac{B(\omega)}{A(\omega)} = H_a(\omega) B(\omega) \quad \Longrightarrow \quad H = D_a B$$
(81)

Let $H_d(\omega)$ be the desired complex frequency response. Define $D_d = \mathrm{diag}(H_d)$. Then one wants to solve

$$\min E^* E = \| E \|_2^2$$
(82)

where $E = H - H_d$. From Equation 81 one can write $H = H_d + E$ as

$$H = D_a B = D_a W_b b$$
(83)

Therefore

$$H_d = H - E = D_a W_b b - E$$
(84)

Solving Equation 84 for $b$ in the least-squares sense one gets

$$b = (D_a W_b)^{\dagger} H_d$$
(85)

Also,

$$H_d = D_d \hat{I} = D_d D_a A = D_a D_d A = D_a D_d W_a a$$
(86)

where $\hat{I}$ is a unit column vector. Therefore

$$H - E = H_d = D_a D_d W_a a$$
(87)

From Equation 84 we get

$$D_a W_b b - E = D_a D_d W_a a$$
(88)

or

$$D_a D_d W_a a + E = D_a W_b b$$
(89)

which in a least squares sense results in

$$a = (D_a D_d W_a)^{\dagger} (D_a W_b b)$$
(90)

From Equation 85 one gets

$$a = (D_a D_d W_a)^{\dagger} \left( D_a W_b \left[ (D_a W_b)^{\dagger} H_d \right] \right)$$
(91)

As a summary, at the $i$-th iteration one can write Equation 85 and Equation 90 as follows,

$$b_i = \left( \mathrm{diag}(1/A_{i-1}) \, W_b \right)^{\dagger} H_d$$
$$a_i = \left( \mathrm{diag}(1/A_{i-1}) \, \mathrm{diag}(H_d) \, W_a \right)^{\dagger} \left( \mathrm{diag}(1/A_{i-1}) \, W_b \, b_i \right)$$
(92)
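
A minimal NumPy sketch of this iteration follows. It is illustrative, not Jackson's reference implementation: the function name `jackson_iir`, the starting denominator $A_0(\omega) = 1$, and the optional `a0` argument are assumptions, and the pseudoinverses of Equation 92 are applied through real-valued least-squares solves (stacking real and imaginary parts) so that the coefficient vectors stay real.

```python
import numpy as np

def jackson_iir(Hd, w, M, N, n_iter=20, a0=None):
    """Jackson's frequency-domain iteration (Equation 92): alternately solve
    for b and a in the least-squares sense, reweighting by 1/A_{i-1}."""
    Wb = np.exp(-1j * np.outer(w, np.arange(M)))   # discrete Fourier kernel for b
    Wa = np.exp(-1j * np.outer(w, np.arange(N)))   # discrete Fourier kernel for a

    def lstsq_real(K, y):
        # apply the pseudoinverse with real-valued unknowns:
        # stack real and imaginary parts into one real system
        sol, *_ = np.linalg.lstsq(np.vstack((K.real, K.imag)),
                                  np.concatenate((y.real, y.imag)),
                                  rcond=None)
        return sol

    a = np.array(a0, dtype=float) if a0 is not None else np.r_[1.0, np.zeros(N - 1)]
    for _ in range(n_iter):
        Da = np.diag(1.0 / (Wa @ a))                          # diag(1/A_{i-1})
        b = lstsq_real(Da @ Wb, Hd)                           # b_i in Equation 92
        a = lstsq_real(Da @ np.diag(Hd) @ Wa, Da @ (Wb @ b))  # a_i in Equation 92
    return b, a
```

If $H_d$ is exactly representable with the chosen orders, the pair $(b, a)$ that reproduces it is a fixed point of the two solves.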

## Soewito's quasilinearization method

Consider the equation error residual function

$$\begin{aligned}
e(\omega_k) &= B(\omega_k) - D(\omega_k) \cdot A(\omega_k) \\
&= \sum_{n=0}^{M} b_n e^{-j\omega_k n} - D(\omega_k) \left( 1 + \sum_{n=1}^{N} a_n e^{-j\omega_k n} \right) \\
&= b_0 + b_1 e^{-j\omega_k} + \cdots + b_M e^{-j\omega_k M} - D_k a_1 e^{-j\omega_k} - \cdots - D_k a_N e^{-j\omega_k N} - D_k
\end{aligned}$$
(93)

with $D_k = D(\omega_k)$. The last equation indicates that one can represent the equation error in matrix form as follows,

$$e = F h - D$$
(94)

where

$$F = \begin{bmatrix}
1 & e^{-j\omega_0} & \cdots & e^{-j\omega_0 M} & -D_0 e^{-j\omega_0} & \cdots & -D_0 e^{-j\omega_0 N} \\
\vdots & & & & & & \vdots \\
1 & e^{-j\omega_{L-1}} & \cdots & e^{-j\omega_{L-1} M} & -D_{L-1} e^{-j\omega_{L-1}} & \cdots & -D_{L-1} e^{-j\omega_{L-1} N}
\end{bmatrix}$$
(95)

and

$$h = [b_0 \; b_1 \; \cdots \; b_M \;\; a_1 \; \cdots \; a_N]^T \quad \text{and} \quad D = [D_0 \; \cdots \; D_{L-1}]^T$$
(96)

Consider now the solution error residual function

$$s(\omega_k) = H(\omega_k) - D(\omega_k) = \frac{B(\omega_k)}{A(\omega_k)} - D(\omega_k) = \frac{1}{A(\omega_k)} \left[ B(\omega_k) - D(\omega_k) \cdot A(\omega_k) \right] = W(\omega_k) \, e(\omega_k)$$
(97)

Therefore one can write the solution error in matrix form as follows

$$s = W (F h - D)$$
(98)

where $W$ is a diagonal matrix with $1/A(\omega)$ on its diagonal. From Equation 98 the least-squares solution error $\varepsilon_s = s^* s$ can be minimized by

$$h = (F^* W^2 F)^{-1} F^* W^2 D$$
(99)

From Equation 99 an iteration5 could be defined as follows

$$h_{i+1} = (F^* W_i^2 F)^{-1} F^* W_i^2 D$$
(100)

by setting the weights $W$ in Equation 98 using $A_i(\omega)$, the Fourier transform of the current solution for $a$ (that is, $W_i = \mathrm{diag}(1/A_i(\omega))$).
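
Equation 100 translates almost line for line into code. The sketch below is a hypothetical helper (name, frequency grid and orders are illustrative): it builds $F$ once from Equation 95, then at each pass recomputes the weights from the current denominator and solves the weighted normal equations. Coefficients are kept complex throughout and come out (numerically) real for a real-coefficient target.

```python
import numpy as np

def eqerror_weighted(D, w, M, N, n_iter=10):
    """SMB Mode-1 in the frequency domain (Equation 100):
    h_{i+1} = (F* W_i^2 F)^{-1} F* W_i^2 D, with W_i = diag(1/A_i(omega))."""
    Eb = np.exp(-1j * np.outer(w, np.arange(M + 1)))     # 1, e^{-jw}, ..., e^{-jwM}
    Ea = np.exp(-1j * np.outer(w, np.arange(1, N + 1)))  # e^{-jw}, ..., e^{-jwN}
    F = np.hstack((Eb, -D[:, None] * Ea))                # Equation 95
    h = np.zeros(M + 1 + N, dtype=complex)
    for _ in range(n_iter):
        A = 1 + Ea @ h[M + 1:]                           # current denominator A_i
        W2 = 1.0 / np.abs(A) ** 2                        # squared weights
        FW = F.conj().T * W2                             # F* W_i^2
        h = np.linalg.solve(FW @ F, FW @ D)              # Equation 100
    return h[:M + 1], h[M + 1:]
```

Note that the first pass ($A_0 = 1$, so $W_0 = I$) is precisely the unweighted equation error solution of Equation 99.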

A more formal approach to minimizing $\varepsilon_s$ consists of using a gradient method (such approaches are often referred to as Newton-like methods). First one needs to compute the Jacobian matrix $J$ of $s$, where the $pq$-th term of $J$ is given by $J_{pq} = \partial s_p / \partial h_q$, with $s$ as defined in Equation 98. Note that the $p$-th element of $s$ is given by

$$s_p = H_p - D_p = \frac{B_p}{A_p} - D_p$$
(101)

For simplicity one can consider these reduced-form expressions for the independent components of $h$,

$$\frac{\partial s_p}{\partial b_q} = \frac{1}{A_p} \frac{\partial}{\partial b_q} \sum_{n=0}^{M} b_n e^{-j\omega_p n} = W_p e^{-j\omega_p q}$$
$$\frac{\partial s_p}{\partial a_q} = B_p \frac{\partial}{\partial a_q} \left( \frac{1}{A_p} \right) = -\frac{B_p}{A_p^2} \frac{\partial}{\partial a_q} \left( 1 + \sum_{n=1}^{N} a_n e^{-j\omega_p n} \right) = -\frac{1}{A_p} \cdot \frac{B_p}{A_p} \cdot e^{-j\omega_p q} = -W_p H_p e^{-j\omega_p q}$$
(102)

Therefore one can express the Jacobian $J$ as follows,

$$J = W G$$
(103)

where

$$G = \begin{bmatrix}
1 & e^{-j\omega_0} & \cdots & e^{-j\omega_0 M} & -H_0 e^{-j\omega_0} & \cdots & -H_0 e^{-j\omega_0 N} \\
\vdots & & & & & & \vdots \\
1 & e^{-j\omega_{L-1}} & \cdots & e^{-j\omega_{L-1} M} & -H_{L-1} e^{-j\omega_{L-1}} & \cdots & -H_{L-1} e^{-j\omega_{L-1} N}
\end{bmatrix}$$
(104)

Consider the solution error least-squares problem given by

$$\min_h f(h) = s^T s$$
(105)

where $s$ is the solution error residual vector as defined in Equation 98 and depends on $h$. It can be shown [7] that the gradient of the squared error $f(h)$ (namely $\nabla f$) is given by

$$\nabla f = J^* s$$
(106)

A necessary condition for a vector $h$ to be a local minimizer of $f(h)$ is that the gradient $\nabla f$ be zero at that vector. With this in mind, combining Equation 98 and Equation 103 into Equation 106 gives

$$\nabla f = G^* W^2 (F h - D) = 0$$
(107)

Solving the system Equation 107 gives

$$h = (G^* W^2 F)^{-1} G^* W^2 D$$
(108)

An iteration can be defined as follows6

$$h_{i+1} = (G_i^* W_i^2 F)^{-1} G_i^* W_i^2 D$$
(109)

where matrices WW and GG reflect their dependency on current values of aa and bb.

Atmadji Soewito [30] extended the quasilinearization method of Bellman and Kalaba [1] to the design of IIR filters. To understand his method, consider the first-order Taylor expansion of $H_{i+1}(z)$ near $H_i(z)$, given by

$$\begin{aligned}
H_{i+1}(z) &= H_i(z) + \frac{[B_{i+1}(z) - B_i(z)] A_i(z) - [A_{i+1}(z) - A_i(z)] B_i(z)}{A_i^2(z)} \\
&= H_i(z) + \frac{B_{i+1}(z) - B_i(z)}{A_i(z)} - \frac{B_i(z) [A_{i+1}(z) - A_i(z)]}{A_i^2(z)}
\end{aligned}$$
(110)

Using the last result in the solution error residual function $s(\omega)$ and simplifying leads to

$$\begin{aligned}
s(\omega) &= \frac{B_{i+1}(\omega)}{A_i(\omega)} - \frac{H_i(\omega) A_{i+1}(\omega)}{A_i(\omega)} + \frac{B_i(\omega)}{A_i(\omega)} - D(\omega) \\
&= \frac{1}{A_i(\omega)} \left[ B_{i+1}(\omega) - H_i(\omega) A_{i+1}(\omega) + B_i(\omega) - A_i(\omega) D(\omega) \right]
\end{aligned}$$
(111)

Equation 111 can be expressed (dropping the argument $\omega$ for simplicity) as

$$s = W \left\{ \left[ B_{i+1} - H_i (A_{i+1} - 1) \right] - H_i + \left[ B_i - D (A_i - 1) \right] - D \right\}$$
(112)

One can recognize the two terms in brackets as $G h_{i+1}$ and $F h_i$ respectively. Therefore Equation 112 can be represented in matrix notation as follows,

$$s = W \left[ G h_{i+1} - (D + H_i - F h_i) \right]$$
(113)

with $H = [H_0, H_1, \ldots, H_{L-1}]^T$. Therefore one can minimize $s^T s$ from Equation 113 with

$$h_{i+1} = (G_i^* W_i^2 G_i)^{-1} G_i^* W_i^2 (D + H_i - F h_i)$$
(114)

since all the terms inside the parentheses in Equation 114 are constant at the $(i+1)$-th iteration. In a sense, Equation 114 is similar to Equation 109, except that the desired function is updated from iteration to iteration.
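
A single update of Equation 114 can be sketched as follows. The helper name is hypothetical, and the matrices $F$, $G$, the response $H_i$ and the weights $W_i = \mathrm{diag}(1/A_i(\omega))$ are rebuilt from the current $h$ exactly as defined in Equation 95 and Equation 104:

```python
import numpy as np

def quasilin_step(h, D, w, M, N):
    """One Soewito quasilinearization update (Equation 114):
    h_{i+1} = (G* W^2 G)^{-1} G* W^2 (D + H_i - F h_i)."""
    Eb = np.exp(-1j * np.outer(w, np.arange(M + 1)))     # numerator kernel
    Ea = np.exp(-1j * np.outer(w, np.arange(1, N + 1)))  # denominator kernel
    B = Eb @ h[:M + 1]
    A = 1 + Ea @ h[M + 1:]
    H = B / A                                   # current response H_i
    W2 = 1.0 / np.abs(A) ** 2                   # squared weights
    F = np.hstack((Eb, -D[:, None] * Ea))       # Equation 95
    G = np.hstack((Eb, -H[:, None] * Ea))       # Equation 104
    GW = G.conj().T * W2                        # G* W^2
    return np.linalg.solve(GW @ G, GW @ (D + H - F @ h))
```

Started from the equation error solution of Equation 99 (for instance, a least-squares solve of $F h = D$), an exactly representable $D$ is left fixed by the update.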

It is important to note that any of the three algorithms can be modified to solve a weighted $l_2$ IIR approximation with a weighting function $W(\omega)$ by defining

$$V(\omega) = \frac{W(\omega)}{A(\omega)}$$
(115)

Taking Equation 115 into account, the following is a summary of the three different updates discussed so far:

$$\begin{aligned}
\text{SMB Frequency Mode-1:} \quad & h_{i+1} = (F^* V_i^2 F)^{-1} F^* V_i^2 D \\
\text{SMB Frequency Mode-2:} \quad & h_{i+1} = (G_i^* V_i^2 F)^{-1} G_i^* V_i^2 D \\
\text{Soewito's quasilinearization:} \quad & h_{i+1} = (G_i^* V_i^2 G_i)^{-1} G_i^* V_i^2 (D + H_i - F h_i)
\end{aligned}$$
(116)

## Footnotes

1. For consistency with the rest of this document, notation has been modified from the author's original paper whenever deemed necessary.
2. For further details on the algebraic manipulations involved, the reader should refer to [19].
3. In practice one should not invert the matrices $H_1$ and $\hat{H}_2$ but use a more robust and efficient algorithm. See [11] for details.
4. For more details the reader should refer to [28], [18].
5. Soewito refers to this expression as the Steiglitz-McBride Mode-1 in frequency domain.
6. Soewito refers to this expression as the Steiglitz-McBride Mode-2 in frequency domain. Compare with the Mode-1 expression and note the use of $G_i$ instead of $F$.

## References

1. Bellman, R. E. and Kalaba, R. E. (1965). Quasilinearization and Nonlinear Boundary-Value Problems. New York: American Elsevier.
2. Blaricum, Michael L. Van. (1978, May). A Review of Prony's Method Techniques for Parameter Estimation. In Air Force Statistical Estimation Workshop. (pp. 125-135).
3. Barrodale, I. and Olesky, D. D. (1981). Exponential Approximation using Prony's Method. In The Numerical Solution of Nonlinear Problems. (pp. 258-269). New York: Oxford University Press.
4. (1976). Lecture Notes in Physics: Padé Approximants Method and its Applications to Mechanics. Springer-Verlag.
5. de Prony, Baron (Gaspard Clair Francois Marie Riche). (1795). Essai Experimental et Analytique: Sur les lois de la Dilatabilite des fluides elastiques et sur celles de la Force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à differentes temperatures. Journal de l'Ecole Polytechnique (Paris), 1(2), 24-76.
6. Deczky, Andrew. (1972, October). Synthesis of Recursive Digital Filters Using the Minimum p-Error Criterion. IEEE Transactions on Audio and Electroacoustics, AU-20(4), 257-263.
7. Dennis, J. E. and Schnabel, R. B. (1996). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia, PA: SIAM.
8. Elaydi, Saber N. (1996). Undergraduate texts in mathematics: An Introduction to Difference Equations. New York: Springer.
9. Fan, H. and Doroslovački, Miloš. (1993, February). On "Global Convergence" of Steiglitz-McBride Adaptive Algorithm. IEEE Transactions on Circuits and Systems II, 40(2), 73-87.
10. Fletcher, R. and Powell, M. J. D. (1963). A Rapidly Convergent Descent Method for Minimization. Computer Journal, 6(2), 163-168.
11. Golub, Gene H. and Loan, Charles F. Van. (1996). Matrix Computations. Johns Hopkins University Press.
12. Gerald, Curtis F. and Wheatley, Patrick O. (1984). Applied Numerical Analysis. Reading, MA: Addison-Wesley.
13. Hildebrand, F. B. (1974). Introduction to Numerical Analysis. McGraw-Hill.
14. Jackson, Leland B. (2008). Frequency-Domain Steiglitz-McBride Method for Least-Squares IIR Filter Design, ARMA Modeling, and Periodogram Smoothing. IEEE Signal Processing Letters, 15, 49-52.
15. Tuttle, David F., Jr. (1971). On Fluids, Networks, and Engineering Education. In Aspects of Network and System Theory. (pp. 591-612). Holt, Rinehart and Winston, Inc.
16. Kumaresan, R. and Burrus, C. S. (1991). Fitting a Pole-Zero Filter Model to Arbitrary Frequency Response Samples. Proc. ASILOMAR, 1649-1652.
17. Kumaresan, R. (1990, November). Identification of Rational Transfer Functions from Frequency Response Samples. IEEE Transactions on Aerospace and Electronic Systems, 26(6), 925-934.
18. McBride, L. E., Schaefgen, H. W. and Steiglitz, K. (1966, December). Time-Domain Approximation by Iterative Methods. IEEE Transactions on Circuit Theory, CT-13(4), 381-387.
19. Levy, E. C. (1959, May). Complex-Curve Fitting. IRE Transactions on Automatic Control, AC-4(1), 37-43.
20. Marple, S. L., Jr. (1987). Signal Processing Series: Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall.
21. M. A. Sid-Ahmed, A. Chottera and Jullien, G. A. (1978, October). Computational Techniques for Least-Square Design of Recursive Digital Filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-26(5), 477-480.
22. Mickens, Ronald E. (1987). Difference equations. Van Nostrand Reinhold.
23. McClellan, J. H. and Lee, D. (1991, February). Exact Equivalence of the Steiglitz-McBride Iteration and IQML. IEEE Transactions on Signal Processing, 39(2), 509-512.
24. Oppenheim, A. V. and Schafer, R. W. (1989). Discrete-Time Signal Processing. Englewood Cliffs, N.J.: Prentice-Hall.
25. Padé, Henri Eugène. (1892). Sur la Représentation Approchée d'une Fonction par des Fractions Rationnelles. Annales Scientifiques de l'École Normale Supérieure (Paris), 9(3), 1-98.
26. Parks, T. W. and Burrus, C. S. (1987). Digital Filter Design. John Wiley and Sons.
27. Sanathanan, C. K. and Koerner, J. (1963, January). Transfer Function Synthesis as a Ratio of Two Complex Polynomials. IEEE Transactions on Automatic Control, AC-8, 56-58.
28. Steiglitz, K. and McBride, L. E. (1965, October). A Technique for the Identification of Linear Systems. IEEE Transactions on Automatic Control, AC-10, 461-464.
29. Spanos, J. T. and Mingori, D. L. (1993, January). Newton Algorithm for Fitting Transfer Functions to Frequency Measurements. Journal of Guidance, Control and Dynamics, 16(1), 34-39.
30. Soewito, Atmadji W. (1990, December). Least Square Digital Filter Design in the Frequency Domain. Ph. D. Thesis. Rice University.
31. Sorensen, D. C. (1982). Newton's Method with a Model Trust Region Modification. SIAM Journal of Numerical Analysis, 16, 409-426.
32. Stoica, P. and Söderström, T. (1981, June). The Steiglitz-McBride Identification Algorithm Revisited-Convergence Analysis and Accuracy Aspects. IEEE Transactions on Automatic Control, AC-26(3), 712-717.
33. Steiglitz, Kenneth. (1970, June). Computer-Aided Design of Recursive Digital Filters. IEEE Transactions on Audio and Electroacoustics, AU-18(2), 123-129.
34. T. P. Krauss, L. Shure et al. (1994). 2. Signal Processing Toolbox User's Guide. (pp. 143-145). The MathWorks.
35. Weiss, L. and McDonough, R. N. (1963, April). Prony's method, Z-transforms, and Padé approximation. SIAM Review, 5(2), 145-149.
