Several techniques have been proposed for solving a problem known as manifold learning in which certain properties of a manifold are inferred from a discrete collection of points sampled from that manifold. A typical manifold learning setup is as follows: an algorithm is presented with a set of PP points sampled from a KK-dimensional
submanifold of RNRN. The goal of the algorithm is to produce an mapping of these PP points into some lower dimension
RMRM (ideally, M=KM=K) while preserving some
characteristic property of the manifold.
Example algorithms include
ISOMAP [15], Hessian Eigenmaps
(HLLE) [9], and Maximum Variance Unfolding
(MVU) [16], which attempt to learn isometric embeddings of
the manifold (thus preserving pairwise geodesic distances in RMRM); Locally
Linear Embedding (LLE) [14], which attempts to preserve
local linear neighborhood structures among the embedded points;
Local Tangent Space Alignment (LTSA) [17], which attempts
to preserve local coordinates in each tangent space; and a method
for charting a manifold [5] that attempts to preserve
local neighborhood structures.
The internal mechanics of these algorithms differs depending on the objective criterion to be preserved, but as an example, the ISOMAP algorithm operates by first estimating the geodesic distance between each pair of points on the manifold (by approximating geodesic distance as the sum of Euclidean distances between pairs of the available sample points). After the P×PP×P matrix of pairwise geodesic distances is constructed, a technique known as multidimensional scaling uses an eigendecomposition of the distance matrix to determine the proper MM-dimensional embedding space.
An example of using ISOMAP to learn a 22-dimensional manifold is shown in Figure 1.
These algorithms can be useful for
learning the dimension and parametrizations of manifolds, for
sorting data, for visualization and navigation through the data,
and as preprocessing to make further analysis more tractable;
common demonstrations include analysis of face images and
classification of and handwritten digits. A related technique, the
Whitney Reduction Network [2], [3],
seeks a linear mapping to RMRM that preserves ambient
pairwise distances on the manifold and is particularly useful for
processing the output of dynamical systems having low-dimensional
attractors.
Other algorithms have been proposed for characterizing manifolds
from sampled data without constructing an explicit embedding in
RMRM. The Geodesic Minimal Spanning Tree
(GMST) [6] models the data as random samples from the
manifold and estimates the corresponding entropy and
dimensionality. Another technique [13] has been proposed
for using random samples of a manifold to estimate its homology
(via the Betti numbers, which essentially characterize its
dimension, number of connected components, etc.). Persistence
Barcodes [8] are a related technique that
involves constructing a type of signature for a manifold (or
simply a shape) that uses tangent complexes to detect and
characterize local edges and corners.
Additional algorithms have been proposed for constructing
meaningful functions on the point samples in RNRN. To
solve a semi-supervised learning problem, a method called
Laplacian Eigenmaps [4] has been proposed that involves
forming an adjacency graph for the data in RNRN, computing
eigenfunctions of the Laplacian operator on the graph (which form
a basis for L2L2 on the graph), and using these functions to
train a classifier on the data. The resulting classifiers have
been used for handwritten digit recognition, document
classification, and phoneme classification. (The MM smoothest
eigenfunctions can also be used to embed the manifold in MM,
similar to the approaches described above.) A related method
called Diffusion Wavelets [7] uses powers of the
diffusion operator to model scale on the manifold, then constructs
wavelets to capture local behavior at each scale. The result is a
wavelet transform adapted not to geodesic distance but to
diffusion distance, which measures (roughly) the number of paths
connecting two points.
"A resource on sparse, compressible, and manifold signal models for signals processing and compressed sensing."