This section presents a method for computing the optimal rotation
between 2 datasets as an orthonormal rotation matrix. As stated earlier,
this approach is slightly more numerically unstable (since guaranteeing
the orthonormality of a matrix is harder than the unit length of a
quaternion) and requires taking care of the special case when the resulting
matrix may not be a proper rotation, as discussed below.
As stated earlier, the optimal alignment requires both a translation and a rotation. The translational part of the alignment is easy to calculate. It can be proven that the optimal alignment is obtained by translating one set so that its centroid coincides with the other set's centroid (see section 2-C of [3] [link] for proof). The centroid of a point set a is simply the average position of all its points:
We can then redefine each point in two sets A and B as a deviation from the centroid:
Given this notation relative to the centroid, we can explicitly set the centroids to be equal and proceed with the rotational part of the alignment.
One of the first references to the solution of this problem in matrix form is from Kabsch [1][2]. The Kabsch method uses Lagrange multipliers to solve a minimization problem to find the optimal rotation. Here, we present a slightly more intuitive method based on matrix algebra and properties, that achieves the same result. This formulation can be found in [4] and [5]. Imagine we wish to align two conformations composed of N atoms each, whose Cartesian coordinates are given by the vectors xx and yy. The main idea behind this approach is to find a 3x3 orthonormal matrix UU such that the application of UU to the atom positions of one of the data vectors, xx, aligns it as best as possible with the other data vector, yy, in the sense that the quantity to minimize is the distance d(Ux,y)d(Ux,y), where xx and yy are assumed to be centered, that is, both their centroids coincide at the origin (centering both conformations is the first step). Mathematically, this problem can be stated as the minimization of the following quantity:

When E is a minimum, the square root of its value becomes the least RMSD (lRMSD) between xx and yy. Being an orthonormal rotation matrix, UU needs to satisfy the orthonormality property
U
U
T
=I
U
U
T
I
, where II is the identity matrix. The orthonormality contraint ensures that the rows and columns are mutually orthogonal, and that their length (as vectors) is one. Any orthonormal matrix represents a rigid orientation (transformation) in space. The only problem with this approach as is, is that all orthonormal matrices encode a rigid transformation, but if the rows/columns of the matrix do not constitute a right handed system, then the rotation is said to be improper. In an improper rotation, one of the three directions may be "mirrored". Fortunately, this case can be detected easily by computing the determinant of the matrix UU, and if it is negative, correcting the matrix. Denoting UxUx as x'x', and moving the constant factor N to the left, the formula for the error becomes:

An alternative way to represent the two point sets, rather than a one-dimensional vector or as separate atom coordinates, is using two 3xN matrices (N atoms, 3 coordinates for each). Using this scheme, xx is represented by the matrix XX and yy is represented by the matrix YY. Note that column 1≤i≤N1iN in these matrices stands for point (atom) xixi and yiyi, respectively. Using this new representation, we can write:

where X'=UXX'UX and Tr(A)Tr(A) stands for the trace of matrix A, the sum of its diagonal elements. It is easy to see that that the trace of the matrix to the right amounts precisely to the sum on the left (simply carrying out the multiplication of the first row/column should convince the reader). The right-hand side of the equation can be expanded into:

Which follows from the properties of the trace operator, namely: Tr(A+B)=Tr(A)+Tr(B), Tr(AB)=Tr(BA)Tr(A+B)=Tr(A)+Tr(B), Tr(AB)=Tr(BA), Tr(Tr(ATAT)=Tr(A))=Tr(A), and Tr(kA)=kTr(A)Tr(kA)=kTr(A). Furthermore, the first two terms in the expansion above represent the sum of the squares of the components xixi and yiyi, so it can be rewritten as:

Note that the xx components do not need to be primed (i.e., x'x') since the rotation UU around the origin does not change the length of xixi. Note that the summation above does not depend on UU, so minimizing E is equivalent to maximizing Tr(Tr(YTX'YTX')). For this reason, the rest of the discussion focuses on finding a proper rotation matrix UU that maximizes Tr(Tr(YTX'YTX')). Remembering that X'=UXX'UX, the quantity to maximize is then Tr(Tr((YTU)XYTUX)). From the property of the trace operator, this is equivalent to Tr(Tr((XYT)UXYTU)). Since XYTXYT is a square 3x3 matrix, it can be decomposed through the Singular Value Decomposition technique (SVD) into XYT=VSWTXYTVSWT, where VV and WTWT are the matrices of left and right eigenvectors (which are orthonormal matrices), respectively, and SS is a diagonal 3x3 matrix containing the eigenvalues s1s1, s2s2, s3s3 in decreasing order. Again from the properties of the trace operator, we obtain that:

If we introduce the 3x3 matrix TT as the product
T=WTUV
T
WT
UV
, we can rewrite the above expression as:

Since TT is the product of orthonormal matrices, it is itself an orthonormal matrix and det(T)=+/-1det(T)=+/-1. This means that the absolute value of each element of this matrix is no more than one, from where the last equality follows. It is obvious that the maximum value of the left hand side of the equation is reached when the diagonal elements of TT are equal to 1, and since it is an orthonormal matrix, all other elements must be zero. This results in T=ITI. Moreover, since
T=WTUV
T
WT
UV
, we can write that
WTUV=I
WT
UV
I
, and because WW and VV are orthonormal,
WWT=I
W
WT
I
and
VVT=I
V
VT
I
. Multiplying
WTUV
WT
UV
by WW to the left and VTVT to the right yields a solution for UU:

Where VV and WTWT are the matrices of left and right eigenvectors, respectively, of the covariance matrix
C=XYT
C
X
YT
. This formula ensures that UU is orthonormal (the reader should carry out the high-level matrix multiplication and verify this fact).
The only remaining detail to take care of is to make sure that UU is a proper rotation, as discussed before. It could indeed happen that det(U)=-1det(U)=-1 if its rows/columns do not make up a right-handed system. When this happens, we need to compromise between two goals: maximizing Tr(Tr(YTX'YTX')) and respecting the constraint that det(U)=+1det(U)=+1. Therefore, we need to settle for the second largest value of Tr(Tr(YTX'YTX')). It is easy to see what the second largest value is; since:

then the second largest value occurs when T11=T22=+1T11T22+1 and T33=-1T33-1. Now, we have that TT cannot be the identity matrix as before, but instead it has the lower-right corner set to -1. Now we finally have a unified way to represent the solution. If det(C)>0det(C)>0, TT is the identity; otherwise, it has a -1 as its last element. Finally, these facts can be expressed in a single formula for the optimal rotation UU by stating:

where d=sign(det(C))dsign(det(C)). In the light of the preceding derivation, all the facts that have been presented as a proof can be succinctly put as an algorithm for computing the optimal rotation to align two data sets xx and yy:
- Build the 3xN matrices XX and YY containing, for the sets xx and yy respectively, the coordinates for each of the N atoms after centering the atoms by subtracting the centroids.
- Compute the covariance matrix C=XYTCXYT
- Compute the SVD (Singular Value Decomposition) of C=VSWTCVSWT
- Compute d=sign(det(C))dsign(det(C))
- Compute the optimal rotation UU as
