One of the more powerful tools in statistical communication
theory is the abstract concept of a linear vector
space. The key result that concerns us is the
representation theorem: a deterministic time
function can be uniquely represented by a sequence of numbers.
The stochastic version of this theorem states that a process can
be represented by a sequence of uncorrelated random variables.
These results will allow us to exploit the theory of hypothesis
testing to derive the optimum detection
strategy.
A linear vector space $S$ is a collection of elements called vectors having the
following properties:
- The vector-addition operation can be defined so that if $x, y, z \in S$:
  - $x + y \in S$ (the space is closed under addition)
  - $x + y = y + x$ (commutativity)
  - $(x + y) + z = x + (y + z)$ (associativity)
  - The zero vector exists and is always an element of $S$. The zero vector is defined by $x + 0 = x$.
  - For each $x \in S$, a unique vector $-x$ is also an element of $S$ so that $x + (-x) = 0$, the zero vector.
- Associated with the set of vectors is a set of scalars which constitute an algebraic field. A field is a set of elements which obey the well-known laws of associativity and commutativity for both addition and multiplication. If $a$, $b$ are scalars, the elements $x$, $y$ of a linear vector space have the properties that:
  - $ax$ (multiplication by scalar $a$) is defined and $ax \in S$.
  - $a(bx) = (ab)x$.
  - If "1" and "0" denote the multiplicative and additive identity elements, respectively, of the field of scalars, then $1x = x$ and $0x = 0$ (the zero vector).
  - $a(x + y) = ax + ay$ and $(a + b)x = ax + bx$.
There are many examples of linear vector spaces. A familiar
example is the set of column vectors of length
$N$.
In this case, we define the sum of two vectors to be:
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_N + y_N \end{bmatrix} \tag{1}$$
and scalar multiplication to be
$$a \begin{bmatrix} x_1 & x_2 & \dots & x_N \end{bmatrix}^T = \begin{bmatrix} a x_1 & a x_2 & \dots & a x_N \end{bmatrix}^T.$$
All of the properties listed above are satisfied.
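As a rough numerical illustration, the sketch below spot-checks several of the listed properties for column vectors of length 3. The helper names `add` and `scale` are hypothetical, vectors are stored as plain Python lists, and integer entries are assumed so that the comparisons are exact. A handful of examples is a sanity check, not a proof.

```python
# Hypothetical helpers: column vectors of length N stored as Python lists.
def add(x, y):
    """Component-wise sum of two equal-length vectors (Equation 1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [xi + yi for xi, yi in zip(x, y)]

def scale(a, x):
    """Multiply every component of x by the scalar a."""
    return [a * xi for xi in x]

x, y, z = [1, 2, 3], [4, 5, 6], [7, 8, 9]
zero = [0, 0, 0]
a, b = 2, -3

# Spot-check the vector-addition properties.
commutative = add(x, y) == add(y, x)
associative = add(add(x, y), z) == add(x, add(y, z))
has_zero = add(x, zero) == x
has_negative = add(x, scale(-1, x)) == zero

# Spot-check the two distributive laws for scalar multiplication.
distributive = (scale(a, add(x, y)) == add(scale(a, x), scale(a, y))
                and scale(a + b, x) == add(scale(a, x), scale(b, x)))

print(commutative, associative, has_zero, has_negative, distributive)
```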
A more interesting (and useful) example is the collection of
square-integrable functions. A square-integrable
function $x(t)$ satisfies:
$$\int_{T_i}^{T_f} |x(t)|^2 \, dt < \infty \tag{2}$$
One can verify that this collection constitutes a linear
vector space. In fact, this space is so important that it has
a special name, $L_2(T_i, T_f)$
(read this as
el-two); the arguments
denote the range of integration.
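The square-integrability condition can be explored numerically. The sketch below approximates the integral in Equation 2 with a midpoint rule; the function name `energy`, the choice of test signals, and the interval $[0, 5]$ are all assumptions made for illustration. A finite numerical value is only suggestive, since a quadrature rule cannot prove that an integral converges.

```python
import math

def energy(x, t_i, t_f, n=10_000):
    """Midpoint-rule approximation of the integral of |x(t)|^2 over [t_i, t_f]."""
    h = (t_f - t_i) / n
    return h * sum(abs(x(t_i + (k + 0.5) * h)) ** 2 for k in range(n))

def x(t):
    return math.exp(-t)

def y(t):
    return math.sin(t)

def combo(t):
    # A linear combination of two square-integrable signals.
    return 2.0 * x(t) - 3.0 * y(t)

e_x = energy(x, 0.0, 5.0)          # exact value is (1 - exp(-10)) / 2
e_combo = energy(combo, 0.0, 5.0)  # remains finite, as closure requires
print(e_x, e_combo)
```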
Let $S$ be a linear vector
space. A subspace $\mathcal{T}$ of
$S$ is a subset of
$S$ which is closed. In other
words, if $x, y \in \mathcal{T}$, then $x, y \in S$
and all elements of $\mathcal{T}$
are elements of $S$, but some
elements of $S$ are not
elements of $\mathcal{T}$.
Furthermore, the linear combination $ax + by \in \mathcal{T}$
for all scalars $a$, $b$. A subspace is sometimes
referred to as a closed linear manifold.
A structure needs to be defined for linear vector spaces so
that definitions for the length of a vector and for the
distance between any two vectors can be obtained. The notions
of length and distance are closely related to the concept of
an inner product.
An inner product of two real vectors $x, y \in S$ is denoted by
$\langle x, y \rangle$ and is a scalar assigned to the
vectors $x$ and $y$ which satisfies the following properties:
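- $\langle x, y \rangle = \langle y, x \rangle$
- $\langle ax, y \rangle = a \langle x, y \rangle$, $a$ is a scalar
- $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$, $z$ a vector.
- $\langle x, x \rangle > 0$ unless $x = 0$. In this case, $\langle x, x \rangle = 0$.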
As an example, an inner product for the space consisting
of column matrices can be defined as
$$\langle x, y \rangle = x^T y = \sum_{i=1}^{N} x_i y_i$$
The reader should verify that this is indeed a valid inner
product (i.e., it satisfies all of the properties given
above). It should be noted that this definition of an inner
product is not unique: there are other
inner product definitions which also satisfy all of these
properties. For example, another valid inner product is
$\langle x, y \rangle = x^T K y$, where $K$ is an $N \times N$
positive-definite matrix. Choices of the matrix $K$ which are not positive
definite do not yield valid inner products (property 4 is not satisfied). The
matrix $K$ is termed
the kernel of the inner product. When this
matrix is something other than an identity matrix, the inner
product is sometimes written as $\langle x, y \rangle_K$
to denote explicitly the presence of the kernel in the
inner product.
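The two inner products above can be compared on a small example. In the sketch below, `inner` and `inner_k` are hypothetical helper names, and the matrix `K` is an assumed example: it is symmetric with eigenvalues 1 and 3, hence positive definite. The diagonal matrix `K_bad` has a negative eigenvalue and is included to show property 4 failing.

```python
def inner(x, y):
    """Standard inner product x^T y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def inner_k(x, K, y):
    """Kernel inner product x^T K y, with K given as a list of rows."""
    Ky = [inner(row, y) for row in K]
    return inner(x, Ky)

K = [[2, 1], [1, 2]]        # symmetric, eigenvalues 1 and 3
K_bad = [[1, 0], [0, -1]]   # not positive definite

x, y, z = [1, -2], [3, 4], [-1, 5]
a = 3

symmetric = inner_k(x, K, y) == inner_k(y, K, x)
homogeneous = inner_k([a * xi for xi in x], K, y) == a * inner_k(x, K, y)
additive = (inner_k([xi + yi for xi, yi in zip(x, y)], K, z)
            == inner_k(x, K, z) + inner_k(y, K, z))
positive = inner_k(x, K, x) > 0

# With K_bad, a nonzero vector yields a negative value for <v, v>.
v = [0, 1]
violates_property_4 = inner_k(v, K_bad, v) < 0

print(symmetric, homogeneous, additive, positive, violates_property_4)
```

Symmetry of `K` is what makes property 1 hold here; a matrix that is not symmetric would generally break it even when $x^T K x > 0$ for all nonzero $x$.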
The norm of a vector
$x \in S$
is denoted by $\|x\|$
and is defined by:
$$\|x\| = \langle x, x \rangle^{1/2} \tag{3}$$
Because of the properties of an inner product, the norm of a
vector is always greater than zero unless the vector is
identically zero. The norm of a vector is related to the
notion of the length of a vector. For
example, if the vector $x$
is multiplied by the constant scalar $a$,
the norm of the vector is multiplied by $|a|$:
$$\|ax\| = \langle ax, ax \rangle^{1/2} = |a| \, \|x\|$$
In other words, "longer" vectors ($|a| > 1$) have larger norms. A norm can also be defined when
the inner product contains a kernel. In this case, the norm
is written $\|x\|_K$ for clarity.
An inner product space is a linear vector
space in which an inner product can be defined for all
elements of the space and a norm is given by Equation 3. Note in particular
that every element of an inner product space must satisfy
the axioms of a valid inner product.
For the space
$S$
consisting of column matrices, the norm of a vector is given
by (consistent with the first choice of an inner product)
$$\|x\| = \left( \sum_{i=1}^{N} x_i^2 \right)^{1/2}$$
This choice of a norm corresponds to the Cartesian definition
of the length of a vector.
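A minimal sketch of this norm, with a check of how it behaves under scaling, is given below. The function name `norm` and the example values are assumptions chosen so that the arithmetic is exact in floating point. Note that scaling by a negative constant multiplies the norm by the magnitude of that constant.

```python
import math

def norm(x):
    """Cartesian length: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

x = [3.0, 4.0]
a = -2.0
scaled = [a * xi for xi in x]

print(norm(x))                            # length of x
print(norm(scaled), abs(a) * norm(x))     # these two values agree
print(norm([0.0, 0.0]))                   # only the zero vector has zero norm
```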
One of the fundamental properties of inner product spaces is the
Schwarz inequality
$$|\langle x, y \rangle| \le \|x\| \, \|y\| \tag{4}$$
This is one of the most important inequalities we shall
encounter. To demonstrate this inequality, consider the norm
squared of
$x + ay$:
$$\|x + ay\|^2 = \langle x + ay, x + ay \rangle = \|x\|^2 + 2a \langle x, y \rangle + a^2 \|y\|^2$$
Let $a = -\langle x, y \rangle / \|y\|^2$. In this case:
$$\|x + ay\|^2 = \|x\|^2 - 2 \frac{|\langle x, y \rangle|^2}{\|y\|^2} + \frac{|\langle x, y \rangle|^2}{\|y\|^4} \|y\|^2 = \|x\|^2 - \frac{|\langle x, y \rangle|^2}{\|y\|^2}$$
As the left-hand side of this result is non-negative, the
right-hand side is lower-bounded by zero:
$\|x\|^2 - |\langle x, y \rangle|^2 / \|y\|^2 \ge 0$. Rearranging and taking
square roots, the Schwarz inequality is thus obtained.
Note that the equality occurs
only when $x = -ay$, or equivalently when $x = cy$, where $c$
is any constant.
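The steps of this argument can be traced on a concrete pair of vectors. The sketch below assumes the standard inner product on column vectors and uses exact rational arithmetic (`fractions.Fraction`) so that the identity $\|x + ay\|^2 = \|x\|^2 - |\langle x, y \rangle|^2 / \|y\|^2$ can be compared without rounding. The vectors and the constant `c` are arbitrary illustrative choices.

```python
from fractions import Fraction

def inner(x, y):
    """Standard inner product x^T y."""
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, 3]
y = [4, -5, 6]

# The choice of a used in the derivation: a = -<x, y> / ||y||^2.
a = Fraction(-inner(x, y), inner(y, y))
v = [xi + a * yi for xi, yi in zip(x, y)]

residual = inner(v, v)  # ||x + a y||^2, non-negative by property 4
closed_form = inner(x, x) - Fraction(inner(x, y) ** 2, inner(y, y))

# Squared form of the Schwarz inequality, exact in integer arithmetic.
schwarz_holds = inner(x, y) ** 2 <= inner(x, x) * inner(y, y)

# Equality case: x proportional to y.
c = -2
xc = [c * yi for yi in y]
equality = inner(xc, y) ** 2 == inner(xc, xc) * inner(y, y)

print(residual, closed_form, schwarz_holds, equality)
```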
Two vectors are said to be orthogonal if the
inner product of the vectors is zero:
$\langle x, y \rangle = 0$.
Consistent with these results is the concept of the
"angle" between two vectors. The cosine
of this angle is defined by:
$$\cos(x, y) = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$$
Because of the Schwarz inequality,
$|\cos(x, y)| \le 1$.
The angle between orthogonal vectors is $\pm\pi/2$,
and the angle between vectors satisfying the Schwarz inequality with equality
($x \propto y$)
is zero when the constant of proportionality is positive and $\pi$ when it is negative (the vectors are parallel to each other).
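A short sketch of the angle computation follows, assuming the standard inner product on column vectors. The function name `angle` is hypothetical. The cosine is clamped to $[-1, 1]$ before calling `math.acos` as a guard against rounding error, and `math.acos` returns values in $[0, \pi]$, so the sign of the angle is not recovered.

```python
import math

def inner(x, y):
    """Standard inner product x^T y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def angle(x, y):
    """Angle in radians from cos(x, y) = <x, y> / (||x|| ||y||)."""
    denom = math.sqrt(inner(x, x) * inner(y, y))
    if denom == 0.0:
        raise ValueError("angle is undefined for the zero vector")
    c = inner(x, y) / denom
    return math.acos(max(-1.0, min(1.0, c)))

orthogonal = angle([1.0, 0.0], [0.0, 2.0])      # inner product is zero
parallel = angle([1.0, 2.0], [2.0, 4.0])        # y = 2x
antiparallel = angle([1.0, 2.0], [-2.0, -4.0])  # y = -2x

print(orthogonal, parallel, antiparallel)
```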
The distance between two vectors is taken to
be the norm of the difference of the vectors.
$$d(x, y) = \|x - y\|$$
In our example of the normed space of column matrices, the
distance between $x$
and $y$ would be
$$\|x - y\| = \left( \sum_{i=1}^{N} (x_i - y_i)^2 \right)^{1/2}$$
which agrees with the Cartesian notion of
distance. Because of the properties of the inner product, this
distance measure (or metric) has the following
properties:
- $d(x, y) = d(y, x)$ (Distance does not depend on how it is measured.)
- $d(x, y) = 0 \Rightarrow x = y$ (Zero distance means equality)
- $d(x, z) \le d(x, y) + d(y, z)$ (Triangle inequality)
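These three properties can be spot-checked for the Cartesian distance. In the sketch below the function name `distance` and the three points are assumptions, chosen to form a 3-4-5 right triangle so that every distance is exact in floating point.

```python
import math

def distance(x, y):
    """Norm of the difference: d(x, y) = ||x - y||."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x, y, z = [0.0, 0.0], [3.0, 4.0], [3.0, 0.0]

symmetric = distance(x, y) == distance(y, x)
zero_for_equal = distance(x, x) == 0.0       # identical vectors: zero distance
positive_for_distinct = distance(x, y) > 0.0  # distinct vectors: nonzero distance
triangle = distance(x, z) <= distance(x, y) + distance(y, z)

print(symmetric, zero_for_equal, positive_for_distinct, triangle)
```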
We use this distance measure to define what we mean by
convergence. When we say the sequence of vectors
$\{x_n\}$
converges to $x$ ($x_n \to x$), we mean
$$\lim_{n \to \infty} \|x_n - x\| = 0$$
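Convergence in norm can be illustrated with a simple sequence. The sketch below assumes a hypothetical sequence $x_n$ formed by adding $1/n$ to every component of a fixed vector $x$ of length 4, so that $\|x_n - x\| = \sqrt{4}/n = 2/n$, which shrinks toward zero as $n$ grows. Printing a few terms only suggests the limit; it does not establish it.

```python
import math

def norm(x):
    """Cartesian norm of a column vector stored as a list."""
    return math.sqrt(sum(xi * xi for xi in x))

x = [1.0, -2.0, 0.5, 3.0]

def x_n(n):
    """Hypothetical sequence: x with 1/n added to every component."""
    return [xi + 1.0 / n for xi in x]

# Distance from the n-th term to the limit, for growing n.
errors = [norm([u - w for u, w in zip(x_n(n), x)]) for n in (1, 10, 100, 1000)]

for n, e in zip((1, 10, 100, 1000), errors):
    print(n, e)
```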