In the algorithm described above
we encountered expressions
of the form

This property can be used to factor

But we can also write

Note that in factorization Equation 2, copies of

Lets compare the computational complexity of
factorizations
Equation 2 and Equation 3.
Notice that Equation 2 consists of *different*
computational costs
when

or equivalently, when

Consequently,
in the more efficient factorization,
the operation

We now consider the Kronecker product of more than two matrices.
For the Kronecker product

Each factorization of

In general

Therefore, the most efficient factorization of

It turns out that for the Kronecker product of more than two
matrices, the ordering of operations that describes
the most efficient factorization of

for

where

Because only two terms in Equation 8 are different, we have from Equation 10

which, after canceling common terms from each side, gives

Since

which is equivalent to Equation 9.
Therefore, to find the best factorization of

As above, if *after* all matrices
with fewer rows than columns and *before*
all matrices with more rows than columns.

Once the permutation

where

and where

**Some Matlab Code**

A Matlab program that computes the permutation
that describes the computationally most efficient
factorization of `cgc()`

.
It also gives the resulting computational cost.
It requires the computational cost of each of the
matrices

```
function [g,C] = cgc(Q,r,c,n)
% [g,C] = cgc(Q,r,c,n);
% Compute g and C
% g : permutation that minimizes C
% C : computational cost of Kronecker product of A(1),...,A(n)
% Q : computation cost of A(i)
% r : rows of A(i)
% c : columns of A(i)
% n : number of terms
f = find(Q==0);
Q(f) = eps * ones(size(Q(f)));
Q = Q(:);
r = r(:);
c = c(:);
[s,g] = sort((r-c)./Q);
C = 0;
for i = 1:n
C = C + prod(r(g(1:i-1)))*Q(g(i))*prod(c(g(i+1:n)));
end
C = round(C);
```

The Matlab program `kpi()`

implements the Kronecker
product

```
function y = kpi(d,g,r,c,n,x)
% y = kpi(d,g,r,c,n,x);
% Kronecker Product : A(d(1)) kron ... kron A(d(n))
% g : permutation of 1,...,n
% r : [r(1),...,r(n)]
% c : [c(1),..,c(n)]
% r(i) : rows of A(d(i))
% c(i) : columns of A(d(i))
% n : number of terms
for i = 1:n
a = 1;
for k = 1:(g(i)-1)
if i > find(g==k)
a = a * r(k);
else
a = a * c(k);
end
end
b = 1;
for k = (g(i)+1):n
if i > find(g==k)
b = b * r(k);
else
b = b * c(k);
end
end
% y = (I(a) kron A(d(g(i))) kron I(b)) * x;
y = IAI(d(g(i)),a,b,x);
end
```

where the last line of code
calls a function that implements `IAI(i,a,b,x)`

implements

The Matlab program `IAI`

implements

```
function y = IAI(A,r,c,m,n,x)
% y = (I(m) kron A kron I(n))x
% r : number of rows of A
% c : number of columns of A
v = 0:n:n*(r-1);
u = 0:n:n*(c-1);
for i = 0:m-1
for j = 0:n-1
y(v+i*r*n+j+1) = A * x(u+i*c*n+j+1);
end
end
```

It simply uses two loops to implement the

**Vector/Parallel Interpretation**

The command

The expression

Each block along the diagonal acts on non-overlapping
sections of the data vector - so that each section
can be performed in parallel. Since each section
represents exactly the same operation, this form is
amenable to implementation on a computer with a
parallel architectural configuration.
The expression

It should also be noted that by employing `stride' permutations,
the command

In the programs we have written in conjunction with this paper
we implement the commands