Protein Inverse Kinematics and the Loop Closure Problem1.122003/09/23 19:49:35 GMT-52007/06/11 03:14:50.838 GMT-5LydiaE.Kavrakikavraki@rice.eduLydiaE.Kavrakikavraki@rice.eduAmardaShehushehua@cs.rice.eduDavidSchwarzdschwarz@rice.eduHernanFStamatihstamati@cs.rice.eduCCDcyclic coordinate descentinverse kinematicsloop closuremanipulatorproteinThis module introduces students to inverse kinematics, which is the problem of finding values of the degrees of freedom of a manipulator chain so that the chain satisfies given spatial constraints. An application of inverse kinematics to solve the loop closure problem in structural biology is also presented. Background Material Inverse Kinematics and its Relevance to Proteins Solving Inverse Kinematics Inverse Kinematics Methods Classical Methods Optimization-Based Methods Cyclic Coordinate Descent and Its Application to Proteins Background Material
The math involved in solving the
Inverse Kinematics problem requires some background in linear
algebra, specifically in the anatomy and application of
transformation matrices. Please refer to Forward Kinematics for an
introduction to transformation matrices. It is very important that
you understand how to apply transformations for the Forward
Kinematics of a chain. Inverse Kinematics and its Relevance to ProteinsInverse kinematics (IK) is the
problem of finding the right values for the underlying degrees of
freedom of a chain, in the case of a protein polypeptide chain, of
the dihedral angles, so that the chain satisfies certain spatial
constraints. For example, in some applications, it is necessary to
find rotations that can steer certain atoms to desired locations in
space. To achieve a particular function, protein regions sometimes
have to undergo concerted motion where atoms move together in order
to locate themselves near another protein or molecule. The motion of
atoms is spatially constrained because they have to assume specific
target locations in space. However, since atoms must move together
in order not to break bonds by their motion, it is easier to model
their motion in dihedral angle space, where bond
lengths and bond angles are fixed. This parameterization of protein
motion, called the idealized or rigid geometry
model, is discussed in Representing
Proteins in silico: Data Structures and
Kinematics.Solving the Inverse Kinematics problem in the context of
proteins, i.e., finding what values of the dihedral angles of a
protein polypeptide chain yield configurations of the chain where
the endpoints satisfy spatial constraints, is a very important
problem in structural biology. The relevance of Inverse Kinematics
for proteins can be seen in three main applications:
Finding a missing loop (Loop Closure Problem) Characterizing the Flexibility of a fragment of the protein
polypeptide chain Generating ensembles of protein structures
It is worth noting that many globular proteins have a relatively
stable, inflexible core region consisting of tightly arranged
secondary structure elements. However, proteins are less compact and
more flexible at the surface, where unstructured fragments of the
protein polypeptide chain, mobile loops, may swing freely. One
consequence of loop mobility is that experimental structure
determination methods may have difficulty resolving the atomic
positions of surface loops. The positions of the atoms in mobile
loops may be so inconsistent that no single position relative to the
core dominates. In such cases, experimental structure determination
methods cannot determined the positions of the atoms of a mobile
loop. When this happens, the result is a partially
resolved protein structure, with fragments of the protein chain,
such as mobile loops, missing. The only information available for
the missing fragment is its amino acid sequence and where its two
endpoints need to be spatially located in order to connect with the
known, resolved, part of the protein structure. Given the spatial
constraints on the endpoints of the missing fragment, one needs to
find values to the dihedral angles of the fragment in order to
obtain configurations of the fragment consistent with the
constraints. This problem is known as the Loop Closure problem in
the structural biology community. It is easy to note that even
though this problem is cast in the context of finding atomic
positions of a missing fragment such as a mobile loop, it is nothing
new but a statement of the Inverse Kinematics problem for proteins.
Solving the Inverse Kinematics problem in
the context of a missing fragment in proteins is not limited to
finding mobile loops. More generally, through the Inverse Kinematics
problem, one can search for alternative configurations of any
fragment of a protein polypeptide chain (even fragments containing
secondary structural elements) that satisfy the spatial constraints
on their endpoints. Very recently, a third application has emerged,
where alternative configurations of consecutive fragments that cover
a polypeptide chain are generated to obtain an ensemble of
alternative protein structures. Solving Inverse Kinematics
In applying inverse kinematics algorithms to proteins, we are taking
advantage of a striking similarity between organic molecules and
robotic manipulators (robot arms) in terms of how they move. As
robot manipulators have joints, proteins have atoms. As robot
manipulators have links that connect their joints, proteins have
bonds that connect their atoms. The similarity between proteins and
robots makes it possible for us to apply to proteins a large
existing literature of solutions to the Inverse Kinematics problem,
developed in the context of robot manipulators (robotic
arms).Before we proceed with some
simple inverse kinematics examples, note that inverse kinematics is
the inverse of the forward kinematics problem. Therefore, an
immediate attempt to solve the inverse kinematics problem would be
by inverting forward kinematics equations.
Let's illustrate how to solve the inverse
kinematics problem for robot manipulators on a simple example. The
figure below shows a simple planar robot with two arms. The
underlying degrees of freedom of this robot are the two angles
dictating the rotation of the arms. These are labeled in the
figure below as θ1 and θ2. The inverse kinematics question in this case would
be:
What are the values for the degrees of freedom so that the end
effector of this robot (the tip of the last arm) lies at position
(x,y) in the two-dimensional Cartesian space? One straightforward approach to solving the problem is to try to write down the
forward kinematics equations that relate (x,y) to the two rotational
degrees of freedom (see Forward Kinematics for details on how to do so), then try to solve these equations. The solutions
will give you an answer to the inverse kinematics problem for this
robot.
Given an (x, y) target position for the end-effector of a robot with only two degrees of freedom θ1 and θ2, what are the solutions for θ1 and θ2?
You can compare your answer with the derivation steps below.
You can see that there can be 0, 1, or 2 solutions for this example.
Where does the non-uniqueness of the solutions lie in the answers we
derive?
As it can be seen in the example above, the solutions to an inverse kinematics problem are not necessarily unique.
In fact, as the number of degrees of freedom increases, so does the maximum number of
solutions, as depicted in the figure.
It is also possible for a problem to have no solution if the point on the robot cannot be brought to the target point in space at all.While the above example offers equations that are easy to solve, general inverse kinematics problems
require solving systems of nonlinear equations for which there are no general
algorithms. Some inverse kinematics problems cannot be solved analytically. In robotics, it is sometimes possible to design systems to have solvable inverse kinematics, but in the general case, we must rely on approximation methods in order to keep the problem tractable, or, in some cases, even solvable. For examples on how to address inverse kinematics in particular robotic systems, please read chapter 4 of . An illustration of the solutions of the inverse kinematics problem for a robot which is widely used in industry is shown below.
Inverse Kinematics Methods Inverse kinematics methods are categorized into two main groups:
exact,
classic, or algebraic methods heuristic, or optimization methods
While
exact methods are complete, i.e. they report all
solutions, they can only find solutions for chains with up to nine degrees of
freedom. Hierarchical approaches break long chains into smaller ones for
which exact methods provide answers. More powerful methods, referred to as
optimization or heuristic methods, though not complete, are unrestricted in the number of degrees of freedom in the systems about which they reason.
Classic Inverse Kinematics MethodsIt is known that for manipulators with no more than six degrees of freedom, there is
a finite number of solutions to the inverse kinematics problem . There
is, however, no analytical method that can find these solutions for
all types of manipulators. For manipulators with only revolute joints,
which is the case for biomolecules with idealized geometry, the
number of unique solutions is at most 16, when the number of degrees of freedom
does not exceed six .
An efficient solution was proposed in
and later applied to the conformational analysis of small molecular
chains , .
Methods based on curve approximation were proposed
in for the inverse kinematics of hyper-redundant robots,
where the number of regularly distributed joints is very large.
Specialized solutions to inverse kinematics in biochemistry appeared as
early as 1970 , where fragments of up to 6
degrees of freedom were predicted by solving a set of polynomial
equations representing geometric transformations. These equations were
applied to building tripeptide loops
under the ideal geometry assumption. Later work ,
, ,
offered efficient analytical solutions for three
consecutive residues through spherical geometry and polynomial equations. Bounding
inverse kinematic solutions for chains with no more than six degrees of freedom
within small intervals has also been shown relevant in the context of drug
design . A new formulation that extends the
domain of solutions to any three residues, not necessarily consecutive,
and with arbitrary geometry, was recently proposed .
Current work that pushes the dimensionality limit from six to nine degrees of freedom makes use of an efficient
subdivision of the solution space .
Inverse Kinematics Methods with OptimizationCurrently, optimization-based solutions are considered most appropriate
for accommodating chains with arbitrary numbers of degrees of freedom. Two
well-known optimization-based inverse kinematics solutions that
iteratively solve a system of equations until loops are closed are
Random Tweak ,
and Cyclic Coordinate Descent (CCD)
, ,
.
Both methods are based on iteratively setting the dihedral degrees of freedom of a fragment or kinematic chain
until the end effector (atom for a protein) reaches a target position.
Random tweak relies on the computation of the Jacobian (a high-dimensional analog of the derivative of a function on real numbers), a process that is
computationally expensive and numerically
unstable. In addition to not being free
from mathematical singularities, random tweak does not allow additional constraints
on individual residues because modifications to dihedral angles are
introduced all at once, with a strong dependence of each dihedral
proposed change on all the others. Additional constraints on the
dihedrals may result in the unpredictable motion of a feature atom
away from rather than toward its target position.
Avoiding the use of the Jacobian, CCD is computationally inexpensive,
numerically stable, and free from singularities. CCD
avoids the interdependence of dihedral angles by adjusting only a single degree of freedom
at a time. This allows for additional constraints on dihedral angles with
a predictable motion of the end effector towards the target
position. First introduced in the context of non-linear
programming , CCD was found applications in
the robotics community, and later in the structural biology community in the context of the loop closure
problem for proteins.
Cyclic Coordinate Descent and Its Application to Proteins
CCD tries to find an optimal angle by which to
rotate a single bond so as to steer a desired atom towards its
target position. When applying CCD to find dihedral angles of a
fragment of the polypeptide chain so that the ends of the fragment
connect properly with the rest of the chain, it is important to
steer not just one atom of the end of the fragment, but the three
backbone atoms of the end simultaneously. Finding values to the
dihedral angles that steer the three backbone atoms of the end of
the fragment simultaneously to their target positions guarantees
that the end of the fragment will assume both its target position
and orientation in space. We will explain how to find optimal values
to the dihedral angles of a fragment by which to simultaneously
steer the three backbone atoms of the end of the fragment to their
target positions. We first define their current positions M and
their target positions F, as shown in the figure below. The goal is
to minimize the Euclidean distance between the current and the
target positions for all three atoms simultaneously. In order to find the optimal angle by which to rotate a particular bond, we can define an objective function S that we wish to minimize. We propose a value for S that sums the square of the deviations between the final position of the atoms after the rotation (M) and the desired positions (F). Using this nomenclature, the squared norm of the vector M-F (denoted FM) has precisely this value for each of the three atoms, so we can sum the three contributions to S. The FM vectors can be defined relative to an origin O located along the axis of rotation, which will simplify the math, since the rotation is two-dimensional when working on the plane perpendicular to the axis of rotation. O can be computed by projecting the current position of the atom, M, onto the rotation axis. It is convenient to decompose OM for each feature atom into two components (along the r and s local axes), in order to allow its expression in terms of the angle being rotated (using cosine and sine). This way, the distance between the atoms and their target positions will be only a function of the fixed (rotatable bond) atoms and the angle to rotate, which remains the only variable and the problem can be solved.
The question then becomes that of finding a rotation along the
rotation axis O, shown in the figure, that minimizes S. First, we need
to define S as a function of the angle we are trying to find. Doing so
is not hard, since rotation by this angle is a two-dimensional
rotation. In the figure above we have shown how the position of an
atom can be updated through a two-dimensional rotation by the angle
around the rotation axis. In this way we obtain an expression that
relates S to the angle we want to find. Since this angle has to
minimize S, it has to provide a solution to the first order derivative
of S set to 0. This is shown below in Figure 5. Simplifying the
expression for the first order derivative of S set to 0 gives us a
formula for tan(α)
. CCD is a very efficient method due to the fact that it
obtains the value for α analytically. However, an expression for the
tangent does not provide a unique value for the angle. This is a
consequence of the fact that the derivative of S set to 0 corresponds
not only to minima, but also to local maxima. In order to find the
angle that indeed minimizes S, one would have to make sure that the
second order derivative of S is greater than 0. This is more
cumbersome. There is a way to avoid doing such calculations by
realizing that the formula we received for S in terms of the angle we
want to solve for, can be rewritten as shown in Figure 5. In this way,
one can obtain a value for both the cosine and the sine of the angle,
which now uniquely determines the optimal angle.
Unlike classic inverse-kinematics solutions
that use Jacobian matrices , , or general numerical approaches, CCD is free of
singularities and does not depend on initial guesses for
solutions. Compared to inverse kinematics techniques with
optimization that suffer from high computational times, CCD is
computationally fast. Unlike other methods such as random tweak, CCD
gives predictable behavior and suffers from no anomalies when
additional constraints are added to the dihedrals (e.g. constraints
imposed by the physical-chemical forces on the protein). Such
properties make CCD very appealing.
Because CCD solves for the degrees of freedom
of a chain one at a time, the method finds the optimal values for all
the degrees of freedom of the chain iteratively, according to some
order. CCD iterates over the degrees of freedom according to a
predetermined order (e.g. a straightforward implementation of the
method may use the identity order, where degrees of freedom are
numbered consecutively in increasing order from the base to the end
effector of the chain), solving for each one of them at a time. This
process of iterating over all the degrees of freedom can be repeated
a maximum number of times or until the end effector lies within an
epsilon distance of its position and orientation in space.
While not able to enumerate all the
solutions to the degrees of freedom, CCD guarantees it will find a
solution if one exists. Given a configuration of the chain and a
target position and orientation for the chain's end effector, CCD
iteratively modifies the degrees of freedom of the chain until either
it runs out of iterations or it manages to satisfy the spatial
constraint on the end effector. Due to its computational efficiency
(linear time complexity on the number of degrees of freedom of the
chain), CCD has been applied to determine atomic positions of missing
mobile loops of arbitrary length. A
similar application complete missing loops in partially resolved
crystallographic structures can be found in , , . A recent application of CCD to not just
loops but any fragment of a protein polypeptide chain can be found
in. The work in applies CCD to configurations of a
fragment that are sampled uniformly at random to obtain an ensemble
of fragment configurations that connects with the rest of the protein
polypeptide chain. Careful attention is paid to the energetic
refinement of the obtained fragment configurations in order to ensure
the biological relevance of the configurations at room
temperature. The usage of CCD in to obtain an ensemble of biologically
meaningful configurations of a fragment of the polypeptide chain is
an interesting application to capture the flexibility of a fragment
in the context of a given protein structure. By generating ensembles
of biologically relevant configurations for fragments that are
defined consecutively and with significant overlap over the protein
polypeptide chain, the work in
offers a novel approach to capture the flexibility of the entire
polypeptide chain.
Recommended Reading Canutescu, A.A. and R.L. Dunbrack. [PDF] . Cyclic Coordinate Descent: A Robotics Algorithm for Protein Loop Closure. Protein Science, 12:963-72, 2003. Craig, J.J. Introduction to Robotics, chapter 4. Reading, MA: Addison-Wesley, 1989.
van den Bedem, H. and Lotan, I. and Deacon, A. M. and Latombe,
J.-C.. [PDF]
. Computing protein structures from electron density maps: the
missing loop problem. Algorithmic Foundations of Robotics VI, 345-360,
2005. Shehu, A. and Clementi, C. and Kavraki, L.E. [PDF] . Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations. Proteins: Structure, Function, Bioinformatics, 65(1):164-179, 2006.Coutsias, E. A. and Seok, C. and Wester, M. J. and Dill, K. A. [LINK]. Resultants and loop closure. International Journal of Quantum Chemistry, 106(1):176-189,2005. Craig, J. J. Introduction to Robotics Addison-Wesley1989Reading, MA M. Zhang, and L. E. Kavraki A New Method for Fast and Accurate Derivation of Molecular Conformations
Journal of Chemical Information and Computer Sciences 2002 42 64-70 http://pubs.acs.org/cgi-bin/article.cgi/jcisd8/2002/42/i01/pdf/ci010327z.pdf
Luenberger, D. G. Linear and Nonlinear Programming Addison-Wesley1984 L. T. Wang, and C. C. Chen A combined optimization method for solving the inverse kinematics problem of
mechanical manipulators
IEEE Transactions on Robotics and Automation 1991 7 489-499 http://ieeexplore.ieee.org/search/advsearch.jsp
A. A. Canutescu, and R. L. Dunbrack Cyclic Coordinate Descent: A Robotics Algorithm for Protein Loop Closure
Protein Science 2003 12 963-972 http://www.proteinscience.org/cgi/reprint/12/5/963
M. Raghavan, and B. Roth Kinematic analysis of the 6R manipulator of general geometry
International Symposium on Robotics Research 1989 http://ieeexplore.ieee.org/search/advsearch.jsp
G. S. Chirikjian General methods for computing hyper-redundant manipulator inverse
kinematics
IEEE/RSJ International Conference on Intelligent Robots and Systems
1993 http://caesar.me.jhu.edu/publication/hyperR_manipulators.html
D. Manocha, and J. Canny Efficient inverse kinematics for general 6R manipulator
IEEE Transactions on Robotics and Automation 1994 10 648-657 ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/report.pdf
D. Manocha, and Y. Zhu Kinematic manipulation of molecular chains subject to rigid constraints
International Conference on Intelligent Systems for Molecular Biology
1994 2 285-293 ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/ismb.pdf
D. Manocha, and Y. Zhu, and W. Wright Conformational analysis of molecular chains using nano-kinematics
Computer Applications in Biosciences 1995 11 71-86 ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/main.pdf
M. Zhang, and L. E. Kavraki Solving Molecular Inverse Kinematics Problems for Protein Folding and Drug
Design
Currents in Computational Molecular Biology 2002 214–215 http://www.cs.rice.edu/CS/Robotics/papers/zhang2002/solve-molec-inv-kinemat.pdf
N. Go, and H. J. Scheraga Ring closure and local conformational deformations of chain molecules
Macromolecules 1970 3 178-187 http://pubs.acs.org/cgi-bin/sample.cgi/mamobx/1970/3/i02/pdf/ma60014a012.pdf
K. A. Palmer, and H. J. Scheraga Standard-geometry chains fitted to x-ray derived structures : Validation of the
rigid-geometry approximation. i. chain closure through a limited search of loop
conformations
Journal Computational Chemistry 1991 12 505-526 http://doi.wiley.com/10.1002/jcc.540130307
W. J. Wedemeyer, and H. J. Scheraga Exact analytical loop closure in proteins using polynomial equations
Journal Computational Chemistry 1999 20 819-844 http://www3.interscience.wiley.com/cgi-bin/fulltext/61004552/PDFSTART
R. E. Bruccoleri, and M. Karplus Chain closure with bond angle variations
Macromolecules 1985 18 2676-2773 http://pubs.acs.org/cgi-bin/sample.cgi/mamobx/1985/18/i12/pdf/ma00154a069.pdf.pdf
E. Coutsias, and C. Seok, and M. Jacobson, and K. Dill A Kinematic View of Loop Closure
Journal of Computational Chemistry 2004 25 510-528
http://www3.interscience.wiley.com/cgi-bin/fulltext/107061300/PDFSTART
M. Zhang, and R. A. White, and L. Wang, and R. Goldman, and L. E. Kavraki,
and B. Hasset Improving Conformational Searches by Geometric Screening
Journal of Bioinformatics 2004 7 624-630 http://bioinformatics.oupjournals.org/cgi/reprint/21/5/624
R. M. Fine, and H. J. Wang, and P. S. Shenkin, and D. L.
Yarmush, and C. Levinthal Predicting antibody hypervariable loop conformations. II: Minimization and
molecular dynamics studies of MCPC603 from many randomly generated loop
conformations
Proteins 1986 1 342-362 http://www3.interscience.wiley.com/cgi-bin/fulltext/107611345/PDFSTART
P. S. Shenkin, and D. L. Yarmush, and R. M. Fine, and H. J. Wang,
and C. Levinthal Predicting antibody hypervariable loop conformations. I: Ensembles of random
conformations for ring-like structures
Biopolymers 1987 26 2053-2085 http://www3.interscience.wiley.com/cgi-bin/fulltext/107588501/PDFSTART
H. van den Bedem, and I. Lotan, and J.-C. Latombe, and A. Deacon Real-Space Protein-Model Completion: an Inverse-Kinematics Approach
Acta Crystallographica 2005 D61 2-13 http://www.blackwell-synergy.com/doi/full/10.1107/S0907444904025697
I. Lotan Algorithms exploiting the chain structure of proteins Ph.D. thesis, Stanford University 2004 http://www-cs-students.stanford.edu/~itayl/mythesis.pdf H. van den Bedem, and I. Lotan, and A. Deacon, J.-C. Latombe Computing protein structures from electron density maps: the missing loop problem
Algorithmic Foundations of Robotics VI 2005 345-360 http://www-cs-students.stanford.edu/~itayl/wafr.pdf
A. Shehu, and C. Clementi, and L. E. Kavraki Modeling Protein Conformational Ensembles:
From Missing Loops to Equilibrium Fluctuations
Proteins: Structure, Function, and Bioinformatics 2006 65(1) 164-179 http://www3.interscience.wiley.com/cgi-bin/fulltext/112752527/PDFSTART