
# Implementing Kronecker Products Efficiently

Module by: Ivan Selesnick, C. Sidney Burrus

Summary: This collection of modules is from a Rice University ECE Department technical report written around September 1994. It grew out of the doctoral and postdoctoral research of Ivan Selesnick working with Prof. C. Sidney Burrus at Rice. Earlier reports on this work were published in the ICASSP and ISCAS conference proceedings in 1992-94, and a fairly complete report was published in the IEEE Transactions on Signal Processing in January 1996.

## Implementing Kronecker Products Efficiently

In the algorithm described above we encountered expressions of the form $A_1 \otimes A_2 \otimes \cdots \otimes A_n$, which we denote by $\bigotimes_{i=1}^{n} A_i$. To calculate the product $\left( \bigotimes_{i=1}^{n} A_i \right) x$ it is computationally advantageous to factor $\bigotimes_i A_i$ into terms of the form $I \otimes A_i \otimes I$ [1]. Then each term represents a set of copies of $A_i$. First, recall the following property of Kronecker products

$$ (A \otimes B)(C \otimes D) = (AC) \otimes (BD) . $$
(1)
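The mixed-product property in Equation 1 is easy to check numerically. The following standalone Python sketch (not part of the original report; the matrices are arbitrary small examples) verifies it with integer matrices stored as lists of lists:

```python
import random

def matmul(A, B):
    # plain triple-loop matrix product for matrices stored as lists of lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    # Kronecker product: (A kron B)[i*p+s][j*q+t] = A[i][j] * B[s][t]
    return [[A[i][j] * B[s][t]
             for j in range(len(A[0])) for t in range(len(B[0]))]
            for i in range(len(A)) for s in range(len(B))]

def rand_matrix(m, n):
    return [[random.randint(-3, 3) for _ in range(n)] for _ in range(m)]

random.seed(0)
A, C = rand_matrix(2, 3), rand_matrix(3, 2)   # A and C are conformable
B, D = rand_matrix(3, 2), rand_matrix(2, 4)   # B and D are conformable
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
print(lhs == rhs)   # True
```

Integer entries keep the comparison exact, so the two sides agree element by element.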

This property can be used to factor $\bigotimes_i A_i$ in the following way. Let the number of rows and columns of $A_i$ be denoted by $r_i$ and $c_i$ respectively. Then

$$ A_1 \otimes A_2 = (A_1 I_{c_1}) \otimes (I_{r_2} A_2) = (A_1 \otimes I_{r_2})(I_{c_1} \otimes A_2) . $$
(2)

But we can also write

$$ A_1 \otimes A_2 = (I_{r_1} A_1) \otimes (A_2 I_{c_2}) = (I_{r_1} \otimes A_2)(A_1 \otimes I_{c_2}) . $$
(3)

Note that in factorization Equation 2, copies of $A_2$ are applied to the data vector $x$ first, followed by copies of $A_1$. On the other hand, in factorization Equation 3, copies of $A_1$ are applied to the data vector $x$ first, followed by copies of $A_2$. These two factorizations are distinguished by the order in which $A_1$ and $A_2$ are applied.
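Both factorizations can be verified directly. Here is a small standalone Python check (the matrices are arbitrary examples with $r_1 = 2$, $c_1 = 3$, $r_2 = 3$, $c_2 = 2$; `kron` and `matmul` are minimal list-of-lists helpers, not library routines):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    return [[A[i][j] * B[s][t]
             for j in range(len(A[0])) for t in range(len(B[0]))]
            for i in range(len(A)) for s in range(len(B))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A1 = [[1, 2, 0], [3, -1, 2]]       # r1 = 2, c1 = 3
A2 = [[2, 1], [0, 1], [1, -1]]     # r2 = 3, c2 = 2
r1, c1, r2, c2 = 2, 3, 3, 2

target = kron(A1, A2)
# Equation 2: copies of A2 act on x first, then copies of A1
eq2 = matmul(kron(A1, identity(r2)), kron(identity(c1), A2))
# Equation 3: copies of A1 act on x first, then copies of A2
eq3 = matmul(kron(identity(r1), A2), kron(A1, identity(c2)))
print(eq2 == target and eq3 == target)   # True
```

Both products come out equal to $A_1 \otimes A_2$, even though the two factorizations apply the matrices in opposite orders.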

Let's compare the computational complexity of factorizations Equation 2 and Equation 3. Notice that Equation 2 consists of $r_2$ copies of $A_1$ and $c_1$ copies of $A_2$; therefore Equation 2 has a computational cost of $r_2 Q_1 + c_1 Q_2$, where $Q_i$ is the computational cost of applying $A_i$. On the other hand, the computational cost of Equation 3 is $c_2 Q_1 + r_1 Q_2$. That is, factorizations Equation 2 and Equation 3 in general have different computational costs when the $A_i$ are not square. Further, observe that Equation 2 is the more efficient factorization exactly when

$$ r_2 Q_1 + c_1 Q_2 < c_2 Q_1 + r_1 Q_2 $$
(4)

or equivalently, when

$$ \frac{r_1 - c_1}{Q_1} > \frac{r_2 - c_2}{Q_2} . $$
(5)

Consequently, in the more efficient factorization, the operation $A_i$ applied to the data vector $x$ first is the one for which the ratio $(r_i - c_i)/Q_i$ is the more negative. If $r_1 > c_1$ and $r_2 < c_2$ then Equation 4 is always true (each $Q_i$ is positive). Therefore, in the most computationally efficient factorization of $A_1 \otimes A_2$, matrices with fewer rows than columns are always applied to the data vector $x$ before matrices with more rows than columns. If both matrices are square, then their ordering does not affect the computational efficiency, because in that case each ordering has the same computational cost.

We now consider the Kronecker product of more than two matrices. For the Kronecker product $\bigotimes_{i=1}^{n} A_i$ there are $n!$ possible different ways in which to order the operations $A_i$. For example,

$$ \begin{aligned} A_1 \otimes A_2 \otimes A_3 &= (A_1 \otimes I_{r_2 r_3})(I_{c_1} \otimes A_2 \otimes I_{r_3})(I_{c_1 c_2} \otimes A_3) \\ &= (A_1 \otimes I_{r_2 r_3})(I_{c_1 r_2} \otimes A_3)(I_{c_1} \otimes A_2 \otimes I_{c_3}) \\ &= (I_{r_1} \otimes A_2 \otimes I_{r_3})(A_1 \otimes I_{c_2 r_3})(I_{c_1 c_2} \otimes A_3) \\ &= (I_{r_1} \otimes A_2 \otimes I_{r_3})(I_{r_1 c_2} \otimes A_3)(A_1 \otimes I_{c_2 c_3}) \\ &= (I_{r_1 r_2} \otimes A_3)(A_1 \otimes I_{r_2 c_3})(I_{c_1} \otimes A_2 \otimes I_{c_3}) \\ &= (I_{r_1 r_2} \otimes A_3)(I_{r_1} \otimes A_2 \otimes I_{c_3})(A_1 \otimes I_{c_2 c_3}) \end{aligned} $$
(6)

Each factorization of $\bigotimes_i A_i$ can be described by a permutation $g(\cdot)$ of $\{1, \ldots, n\}$ which gives the order in which $A_i$ is applied to the data vector $x$. $A_{g(1)}$ is the first operation applied to the data vector $x$, $A_{g(2)}$ is the second, and so on. For example, the third factorization in Equation 6 is described by the permutation $g(1) = 3$, $g(2) = 1$, $g(3) = 2$. For $n = 3$, the computational cost of each factorization can be written as

$$ C(g) = Q_{g(1)} c_{g(2)} c_{g(3)} + r_{g(1)} Q_{g(2)} c_{g(3)} + r_{g(1)} r_{g(2)} Q_{g(3)} . $$
(7)

In general

$$ C(g) = \sum_{i=1}^{n} \left( \prod_{j=1}^{i-1} r_{g(j)} \right) Q_{g(i)} \left( \prod_{j=i+1}^{n} c_{g(j)} \right) . $$
(8)

Therefore, the most efficient factorization of $\bigotimes_i A_i$ is described by the permutation $g(\cdot)$ that minimizes $C$.
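Equation 8 translates directly into code. The following standalone Python sketch (the sizes and costs below are made-up numbers for illustration only) computes $C(g)$ for a given permutation, using 0-based indices:

```python
def cost(g, r, c, Q):
    # Equation 8, 0-based: sum over i of
    #   (prod_{j<i} r[g[j]]) * Q[g[i]] * (prod_{j>i} c[g[j]])
    total = 0
    for i in range(len(g)):
        left = 1
        for j in range(i):
            left *= r[g[j]]          # rows of matrices already applied
        right = 1
        for j in range(i + 1, len(g)):
            right *= c[g[j]]         # columns of matrices not yet applied
        total += left * Q[g[i]] * right
    return total

# illustrative sizes and costs for three hypothetical matrices
r, c, Q = [2, 3, 4], [3, 3, 2], [6, 9, 8]
print(cost((0, 1, 2), r, c, Q))   # 6*3*2 + 2*9*2 + 2*3*8 = 120
```

Each term charges $Q_{g(i)}$ once for every copy of $A_{g(i)}$, since the number of copies at stage $i$ is the product of the dimensions to its left and right.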

It turns out that for the Kronecker product of more than two matrices, the ordering of operations that describes the most efficient factorization of $\bigotimes_i A_i$ also depends only on the ratios $(r_i - c_i)/Q_i$. To show that this is the case, suppose $u(\cdot)$ is the permutation that minimizes $C$; then $u(\cdot)$ has the property that

$$ \frac{r_{u(k)} - c_{u(k)}}{Q_{u(k)}} \le \frac{r_{u(k+1)} - c_{u(k+1)}}{Q_{u(k+1)}} $$
(9)

for $k = 1, \ldots, n-1$. To support this, note that since $u(\cdot)$ is the permutation that minimizes $C$, we have in particular

$$ C(u) \le C(v) $$
(10)

where $v(\cdot)$ is the permutation defined by the following:

$$ v(i) = \begin{cases} u(i) & i < k \text{ or } i > k+1 \\ u(k+1) & i = k \\ u(k) & i = k+1 . \end{cases} $$
(11)

Because only two terms in Equation 8 are different, we have from Equation 10

$$ \sum_{i=k}^{k+1} \left( \prod_{j=1}^{i-1} r_{u(j)} \right) Q_{u(i)} \left( \prod_{j=i+1}^{n} c_{u(j)} \right) \le \sum_{i=k}^{k+1} \left( \prod_{j=1}^{i-1} r_{v(j)} \right) Q_{v(i)} \left( \prod_{j=i+1}^{n} c_{v(j)} \right) $$
(12)

which, after canceling common terms from each side, gives

$$ Q_{u(k)} c_{u(k+1)} + r_{u(k)} Q_{u(k+1)} \le Q_{v(k)} c_{v(k+1)} + r_{v(k)} Q_{v(k+1)} . $$
(13)

Since $v(k) = u(k+1)$ and $v(k+1) = u(k)$, this becomes

$$ Q_{u(k)} c_{u(k+1)} + r_{u(k)} Q_{u(k+1)} \le Q_{u(k+1)} c_{u(k)} + r_{u(k+1)} Q_{u(k)} $$
(14)

which is equivalent to Equation 9. Therefore, to find the best factorization of $\bigotimes_i A_i$ it is necessary only to compute the ratios $(r_i - c_i)/Q_i$ and to order them in non-decreasing order. The operation $A_i$ whose index appears first in this list is applied to the data vector $x$ first, and so on.
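The claim that sorting by the ratios $(r_i - c_i)/Q_i$ yields a minimizing permutation can be spot-checked by brute force over all $n!$ orderings. A standalone Python sketch (the sizes and costs are made-up illustrative numbers, 0-based indices):

```python
from itertools import permutations

def cost(g, r, c, Q):
    # Equation 8, 0-based indices
    total = 0
    for i in range(len(g)):
        left = 1
        for j in range(i):
            left *= r[g[j]]
        right = 1
        for j in range(i + 1, len(g)):
            right *= c[g[j]]
        total += left * Q[g[i]] * right
    return total

r, c, Q = [4, 2, 3, 5], [2, 5, 3, 4], [10, 12, 9, 20]
n = len(r)

# exhaustive minimum over all n! orderings
brute = min(cost(g, r, c, Q) for g in permutations(range(n)))
# ordering by the ratios (r_i - c_i)/Q_i, non-decreasing
by_ratio = sorted(range(n), key=lambda i: (r[i] - c[i]) / Q[i])
print(cost(by_ratio, r, c, Q) == brute)   # True
```

The ratio-sorted ordering attains the exhaustive minimum, as the adjacent-swap argument above predicts.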

As above, if $r_{u(k+1)} > c_{u(k+1)}$ and $r_{u(k)} < c_{u(k)}$ then Equation 14 is always true. Therefore, in the most computationally efficient factorization of $\bigotimes_i A_i$, all matrices with fewer rows than columns are always applied to the data vector $x$ before any matrices with more rows than columns. If some matrices are square, then their ordering does not affect the computational efficiency as long as they are applied after all matrices with fewer rows than columns and before all matrices with more rows than columns.

Once the permutation $g(\cdot)$ that minimizes $C$ is determined by ordering the ratios $(r_i - c_i)/Q_i$, $\bigotimes_i A_i$ can be written as

$$ \bigotimes_{i=1}^{n} A_i = \prod_{i=n}^{1} I_{a(i)} \otimes A_{g(i)} \otimes I_{b(i)} $$
(15)

where

$$ a(i) = \prod_{k=1}^{g(i)-1} \gamma(i, k) $$
(16)
$$ b(i) = \prod_{k=g(i)+1}^{n} \gamma(i, k) $$
(17)

and where $\gamma(i, k)$ equals $r_k$ when $A_k$ has already been applied to the data vector at an earlier stage, and $c_k$ otherwise:

$$ \gamma(i, k) = \begin{cases} r_k & \text{if } g^{-1}(k) < i \\ c_k & \text{if } g^{-1}(k) > i . \end{cases} $$
(18)
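The bookkeeping of Equations 15-17 can be exercised numerically. In the standalone Python sketch below (0-based indices, small made-up matrices; `factor_dims` is a hypothetical helper, not the authors' code), the identity size at each tensor position is $r_k$ if that position's matrix has already been applied at an earlier stage and $c_k$ otherwise, and the product of the resulting sparse factors reproduces the full Kronecker product:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    return [[A[i][j] * B[s][t]
             for j in range(len(A[0])) for t in range(len(B[0]))]
            for i in range(len(A)) for s in range(len(B))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def factor_dims(g, r, c):
    # g[i] = tensor position (0-based) of the matrix applied at stage i
    pos = {k: i for i, k in enumerate(g)}   # stage at which position k is applied
    dims = []
    for i in range(len(g)):
        a = 1
        for k in range(g[i]):               # positions left of A_{g[i]}
            a *= r[k] if pos[k] < i else c[k]
        b = 1
        for k in range(g[i] + 1, len(g)):   # positions right of A_{g[i]}
            b *= r[k] if pos[k] < i else c[k]
        dims.append((a, b))
    return dims

A = [[[1, 2, 0], [0, 1, 1]],    # 2x3
     [[1, 1], [2, 0]],          # 2x2
     [[3], [1], [2]]]           # 3x1
r, c = [2, 2, 3], [3, 2, 1]
g = [2, 0, 1]                   # apply A_3 first, then A_1, then A_2

target = kron(kron(A[0], A[1]), A[2])
result = identity(c[0] * c[1] * c[2])
for i, (a, b) in enumerate(factor_dims(g, r, c)):
    # Equation 15: stage-i factor I(a) kron A_{g(i)} kron I(b), first stage first
    factor = kron(kron(identity(a), A[g[i]]), identity(b))
    result = matmul(factor, result)
print(result == target)   # True
```

Each sparse factor is conformable with the previous stage's output, and the overall product equals $A_1 \otimes A_2 \otimes A_3$.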

### Some Matlab Code

A Matlab program that computes the permutation describing the computationally most efficient factorization of $\bigotimes_{i=1}^{n} A_i$ is `cgc()`. It also gives the resulting computational cost. It requires the computational cost of each of the matrices $A_i$ and the number of rows and columns of each.

```matlab
function [g,C] = cgc(Q,r,c,n)
% [g,C] = cgc(Q,r,c,n);
% g : permutation that minimizes C
% C : computational cost of Kronecker product of A(1),...,A(n)
% Q : computational cost of each A(i)
% r : rows of each A(i)
% c : columns of each A(i)
% n : number of terms
f = find(Q==0);
Q(f) = eps * ones(size(Q(f)));   % avoid division by zero for cost-free terms
Q = Q(:);
r = r(:);
c = c(:);
[s,g] = sort((r-c)./Q);          % order by the ratios (r-c)/Q
C = 0;
for i = 1:n
   C = C + prod(r(g(1:i-1)))*Q(g(i))*prod(c(g(i+1:n)));
end
C = round(C);
```


The Matlab program `kpi()` implements the Kronecker product $\bigotimes_{i=1}^{n} A_i$.

```matlab
function y = kpi(d,g,r,c,n,x)
% y = kpi(d,g,r,c,n,x);
% Kronecker Product : A(d(1)) kron ... kron A(d(n)), applied to x
% g : permutation of 1,...,n
% r : [r(1),...,r(n)]
% c : [c(1),...,c(n)]
% r(i) : rows of A(d(i))
% c(i) : columns of A(d(i))
% n : number of terms
for i = 1:n
   a = 1;
   for k = 1:(g(i)-1)
      if i > find(g==k)
         a = a * r(k);
      else
         a = a * c(k);
      end
   end
   b = 1;
   for k = (g(i)+1):n
      if i > find(g==k)
         b = b * r(k);
      else
         b = b * c(k);
      end
   end
   % y = (I(a) kron A(d(g(i))) kron I(b)) * x;
   y = IAI(d(g(i)),a,b,x);
   x = y;   % the output of this stage is the input to the next
end
```


where the last line of code calls a function that implements $(I_a \otimes A_{d(g(i))} \otimes I_b) x$. That is, the program `IAI(i,a,b,x)` implements $(I_a \otimes A(i) \otimes I_b) x$.

The Matlab program `IAI` implements $y = (I_m \otimes A \otimes I_n) x$.

```matlab
function y = IAI(A,r,c,m,n,x)
% y = (I(m) kron A kron I(n)) x
% r : number of rows of A
% c : number of columns of A
v = 0:n:n*(r-1);   % output offsets within one block
u = 0:n:n*(c-1);   % input offsets within one block
for i = 0:m-1
   for j = 0:n-1
      y(v+i*r*n+j+1) = A * x(u+i*c*n+j+1);
   end
end
```


It simply uses two loops to implement the $mn$ copies of $A$. Each copy of $A$ is applied to a different subset of the elements of $x$.
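The same two-loop indexing can be sketched in standalone Python (a hypothetical `IAI_apply`, mirroring but not reproducing the authors' routine) and checked against an explicitly formed Kronecker product:

```python
def kron(A, B):
    return [[A[i][j] * B[s][t]
             for j in range(len(A[0])) for t in range(len(B[0]))]
            for i in range(len(A)) for s in range(len(B))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def IAI_apply(A, r, c, m, n, x):
    # y = (I(m) kron A kron I(n)) x, computed as m*n small matrix-vector
    # products; same index arithmetic as the Matlab IAI routine above
    y = [0] * (m * r * n)
    for i in range(m):
        for j in range(n):
            seg = [x[i * c * n + k * n + j] for k in range(c)]   # strided gather
            for p in range(r):
                y[i * r * n + p * n + j] = sum(A[p][k] * seg[k] for k in range(c))
    return y

A = [[1, 2, 0], [0, 1, 1]]          # r = 2, c = 3
m, n = 2, 2
x = list(range(1, 3 * m * n + 1))   # length m*c*n = 12
M = kron(kron(identity(m), A), identity(n))
ref = [sum(M[p][q] * x[q] for q in range(len(x))) for p in range(len(M))]
print(IAI_apply(A, 2, 3, m, n, x) == ref)   # True
```

Forming the full matrix $I_m \otimes A \otimes I_n$ costs $O((mrn)(mcn))$ storage, while the loop version touches only the $mn$ small blocks, which is the point of the factored implementation.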

### Vector/Parallel Interpretation

The command $I \otimes A \otimes I$, where $\otimes$ is the Kronecker (or Tensor) product, can be interpreted as a vector/parallel command [2], [3]. In these references the implementation of these commands is discussed in detail, and the authors have found the tensor product to be "an extremely useful tool for matching algorithms to computer architectures" [2].

The expression $I \otimes A$ can easily be seen to represent a parallel command:

$$ I \otimes A = \begin{bmatrix} A \\ & A \\ & & \ddots \\ & & & A \end{bmatrix} . $$
(19)

Each block along the diagonal acts on a non-overlapping section of the data vector, so the sections can be processed in parallel. Since each section represents exactly the same operation, this form is amenable to implementation on a computer with a parallel architecture. The expression $A \otimes I$ can similarly be seen to represent a vector command; see [2].

It should also be noted that by employing "stride" permutations, the command $(I \otimes A \otimes I) x$ can be replaced by either $(I \otimes A) x$ or $(A \otimes I) x$ [2], [3]. It is only necessary to permute the input and output. It is also the case that these stride permutations are natural loading and storing commands for some architectures.
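As an illustration of the stride-permutation remark, here is a standalone Python sketch (the permutation indexing is worked out from the entry pattern of $A \otimes I_m$, not taken from [2]): $(A \otimes I_m) x$ is computed by stride-permuting the input, applying the block-diagonal $(I_m \otimes A)$, and stride-permuting the output.

```python
def kron(A, B):
    return [[A[i][j] * B[s][t]
             for j in range(len(A[0])) for t in range(len(B[0]))]
            for i in range(len(A)) for s in range(len(B))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[1, 2, 0], [0, 1, 1]]   # r x c with r = 2, c = 3
r, c, m = 2, 3, 4
x = list(range(1, c * m + 1))

y_direct = matvec(kron(A, identity(m)), x)   # (A kron I(m)) x, the vector form

x_perm = [0] * (c * m)                       # input stride permutation
for j in range(c):
    for t in range(m):
        x_perm[t * c + j] = x[j * m + t]
w = matvec(kron(identity(m), A), x_perm)     # block-diagonal (parallel) apply
y = [0] * (r * m)                            # output stride permutation
for i in range(r):
    for s in range(m):
        y[i * m + s] = w[s * r + i]
print(y == y_direct)   # True
```

Only the gather/scatter index maps change between the two forms; the arithmetic performed by the copies of $A$ is identical.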

In the programs we have written in conjunction with this paper we implement the commands $y = (I \otimes A \otimes I) x$ with loops in a set of subroutines. The circular convolution and prime length FFT programs we present, however, explicitly use the form $I \otimes A \otimes I$ to make clear the structure of the algorithm, to make them more modular and simpler, and to make them amenable to implementation on special architectures. In fact, in [2] it is suggested that it might be practical to develop tensor product compilers. The FFT programs we have generated will be well suited for such compilers.

## References

1. Agarwal, R. C. and Cooley, J. W. (1977, October). New Algorithms for Digital Convolution. IEEE Trans. Acoust., Speech, Signal Proc., 25(5), 392-410.
2. Granata, J. and Conner, M. and Tolimieri, R. (1992, January). The Tensor Product: A Mathematical Programming Language for FFTs and other Fast DSP Operations. IEEE Signal Processing Magazine, 9(1), 40-48.
3. Tolimieri, R. and An, M. and Lu, C. (1989). Algorithms for Discrete Fourier Transform and Convolution. Springer-Verlag.
