Suppose that you are given a data set
$D = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ in $\mathbb{R}^2$
and that you must fit a smooth
curve to these points. There are many solutions to such a
problem. Most simply, you could "eyeball it" and draw in a
curve that looks good; however, this
solution is subjective and not very precise. You could just
connect the dots, but this will probably not give a smooth curve.
You could fit a polynomial, but the fitted polynomial may
wiggle too much between the points. Or you could use splines.
Splines are piecewise polynomials with pieces
smoothly connected together. The joining points of the
polynomial pieces are called knots. Knots do not
have to be evenly spaced. When each segment of a spline is a
polynomial of degree $n$, we say
that the spline is a spline of degree $n$.
We need to add some constraints to ensure smoothness. For a
spline of degree $n$, we require
that the spline has continuous derivatives up to order
$n-1$ at each of the knots, i.e. a spline of degree
$n$ is in $C^{n-1}$.
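The smoothness condition can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`poly_eval`, `poly_derivative`, `continuity_order`) and the two cubic pieces are illustrative assumptions. It counts how many derivatives of two polynomial pieces agree at a shared knot; for a spline of degree 3 the answer should be 2.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given coefficients in increasing-degree order."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs):
    """Return the coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def continuity_order(left, right, knot, tol=1e-12):
    """Largest r such that derivatives 0..r of both pieces agree at the knot.

    Returns -1 if the pieces do not even meet at the knot. Derivatives are
    only compared up to the degree of the pieces.
    """
    order = -1
    while left or right:
        if abs(poly_eval(left, knot) - poly_eval(right, knot)) > tol:
            break
        order += 1
        left, right = poly_derivative(left), poly_derivative(right)
    return order


# Two cubic pieces joined at the knot x = 1 (illustrative numbers):
#   left(x)  = x^3
#   right(x) = x^3 + 2 (x - 1)^3 = -2 + 6x - 6x^2 + 3x^3
left_piece = [0.0, 0.0, 0.0, 1.0]
right_piece = [-2.0, 6.0, -6.0, 3.0]

print(continuity_order(left_piece, right_piece, knot=1.0))  # prints 2
```

The value, first derivative, and second derivative match at the knot, while the third derivative jumps from 6 to 18, so this piecewise cubic is in $C^2$ but not $C^3$.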
Remember our problem of fitting a curve to the data? There are
generally two ways of approaching this problem: interpolation
and smoothing.
We can fit a spline to interpolate the data; i.e.
$\forall i \in \{1, \ldots, m\}: f(x_i) = y_i$.
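As one possible illustration of interpolation, here is a sketch of a natural cubic spline in plain Python. The text does not say which interpolating spline to use, so the choice of degree 3 with zero second derivative at the two end knots is an assumption, and the function name `natural_cubic_spline` is hypothetical. The second derivatives at the interior knots come from a tridiagonal linear system.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function f with f(xs[i]) == ys[i], built from cubic pieces.

    xs must be strictly increasing and contain at least two points.
    """
    m = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(m - 1)]

    # Second derivatives at the knots; the end values stay 0 ("natural").
    second = [0.0] * m
    n = m - 2  # number of interior knots
    if n > 0:
        lower = [h[i] for i in range(n)]
        diag = [2.0 * (h[i] + h[i + 1]) for i in range(n)]
        upper = [h[i + 1] for i in range(n)]
        rhs = [
            6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
            for i in range(n)
        ]
        # Thomas algorithm: forward elimination, then back substitution.
        for i in range(1, n):
            w = lower[i] / diag[i - 1]
            diag[i] -= w * upper[i - 1]
            rhs[i] -= w * rhs[i - 1]
        sol = [0.0] * n
        sol[-1] = rhs[-1] / diag[-1]
        for i in range(n - 2, -1, -1):
            sol[i] = (rhs[i] - upper[i] * sol[i + 1]) / diag[i]
        second[1:-1] = sol

    def f(x):
        # Pick the segment [xs[i], xs[i+1]] containing x; points outside
        # the data range are evaluated with the nearest end segment.
        i = min(max(bisect_right(xs, x) - 1, 0), m - 2)
        hi = h[i]
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            second[i] * a ** 3 / (6.0 * hi)
            + second[i + 1] * b ** 3 / (6.0 * hi)
            + (ys[i] / hi - second[i] * hi / 6.0) * a
            + (ys[i + 1] / hi - second[i + 1] * hi / 6.0) * b
        )

    return f


f = natural_cubic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(f(1.0))  # passes through the data point (1, 1)
print(f(0.5))  # value of the curve between the first two knots
```

The returned function reproduces every data value at its knot, which is exactly the interpolation condition above; between knots it is a cubic chosen so that the first and second derivatives are continuous.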
We can fit a smoothing spline: i.e. find $f$ to minimize
$$\frac{1}{m}\sum_{i=1}^{m}\left(y_i - f(x_i)\right)^2 + \lambda\int_{x_1}^{x_m}\left(f^{(k)}(u)\right)^2\,du,$$
where $f^{(k)}$ is the $k$-th derivative of $f$ and $\lambda \ge 0$.
The first term measures the closeness of the fitted function to
the data, while the second penalizes roughness in the function
(curvature, when $k = 2$). $\lambda$ establishes
the trade-off between the two. For $0 < \lambda < \infty$, this
criterion is minimized by a natural spline of degree $2k-1$
(Schoenberg); for $k = 2$ this is a natural cubic spline. If
$\lambda = 0$, then $f$ can be any
function which interpolates the data. If $\lambda = \infty$ and
$k = 2$, then this is the simple least squares
line fit, since no second derivative can be tolerated.
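The two limiting cases can be seen numerically. The sketch below is an illustration under stated assumptions, not the source's own method: it takes the penalty to be on the second derivative, uses one standard matrix formulation of the cubic smoothing spline (a band matrix `Q` of second differences and a matrix `R` of knot spacings), and returns only the fitted values at the data points rather than the whole curve. All function names and the sample data are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def smoothing_spline_values(xs, ys, lam):
    """Fitted values at the knots of a cubic smoothing spline.

    Minimizes (1/m) * sum (y_i - f(x_i))^2 + lam * integral of f''(u)^2,
    with xs strictly increasing and len(xs) >= 3.
    """
    m = len(xs)
    alpha = m * lam  # the 1/m factor rescales the penalty weight
    h = [xs[i + 1] - xs[i] for i in range(m - 1)]
    n_int = m - 2  # number of interior knots
    Q = [[0.0] * n_int for _ in range(m)]
    R = [[0.0] * n_int for _ in range(n_int)]
    for j in range(n_int):
        Q[j][j] = 1.0 / h[j]
        Q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2][j] = 1.0 / h[j + 1]
        R[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n_int:
            R[j][j + 1] = R[j + 1][j] = h[j + 1] / 6.0
    # Solve (R + alpha Q^T Q) gamma = Q^T y, then f = y - alpha Q gamma.
    A = [[R[a][b] + alpha * sum(Q[i][a] * Q[i][b] for i in range(m))
          for b in range(n_int)] for a in range(n_int)]
    rhs = [sum(Q[i][a] * ys[i] for i in range(m)) for a in range(n_int)]
    gamma = solve_linear(A, rhs)
    return [ys[i] - alpha * sum(Q[i][j] * gamma[j] for j in range(n_int))
            for i in range(m)]


def least_squares_line(xs, ys):
    """Fitted values of the ordinary least squares line, for comparison."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return [ybar + slope * (x - xbar) for x in xs]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.2, 1.9, 3.2, 3.8]
print(smoothing_spline_values(xs, ys, 0.0))  # lam = 0: reproduces ys
print(smoothing_spline_values(xs, ys, 1e8))  # very large lam: close to the line
print(least_squares_line(xs, ys))
```

With `lam = 0.0` the fitted values equal the data, and with a very large `lam` they approach the least squares line; intermediate values trade one off against the other. The dense solver keeps the sketch dependency-free; a banded solver would be the usual choice for large data sets.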