# Connexions

You are here: Home » Content » Protein Inverse Kinematics and the Loop Closure Problem

### Recently Viewed

This feature requires Javascript to be enabled.

# Protein Inverse Kinematics and the Loop Closure Problem

Module by: Lydia E. Kavraki. E-mail the author

Summary: This module introduces students to the inverse kinematics of manipulators and its relevance to problems in computational structural biology.

Note: You are viewing an old version of this document. The latest version is available here.

## Background Material

The math involved in solving inverse kinematics requires some background in linear algebra, specifically in the anatomy and application of transformation matrices. The links provided in this section should provide enough mathematical background to understand the rest of this module and eventually write a simple protein manipulation program.

### Background on Transformations

• Transformation Matrices: The main transformations you will apply to polypeptide chains will be a combination of translations and rotations. Please see introduction to translations and introduction to two- and three-dimensional rotations. One special rotation matrix is the Euler matrix . Please pay particular attention to the different conventions used for defining the Euler matrix. The one we explain more in detail in this module is the X-Y-Z convention. Now that you know what an Euler matrix looks like, you need to get familiar with rotations about an arbitrary vector or line. Please read more on rotations around an arbitrary vector .
• Homogeneous Transformations: The use of homogenous coordinates and transformations can simplify some of the calculations involved in using three-dimensional transformations. See homogenous transformations .
• Quaternions While not strictly necessary, quaternions are an efficient, robust method of representing three-dimensional rotations. In particular, they are not subject to the undesirable singularities and numerical instability of rotations represented by Euler angles. Please visit this introduction to quaternions to see how they relate to homogenous transformations.
• Applying Transformations Transformations allow us to manipulate long chains such as polypeptides. Chaining transformation matrices allows the transformation of an entire chain. Please read the following on Forward Kinematics .
A more detailed discussion of spatial descriptions and transformations can be found in chapter 2 of [1].

## The Loop Closure Problem

Many globular proteins have a relatively stable, inflexible core region, consisting of tightly arranged secondary structure elements, and are less compact and more flexible at the surface, where, among other effects, solvent molecules may cause loose chains and loops to swing freely. One consequence of this flexibility is that experimental structure determination methods may have difficulty resolving the positions of surface loops of the protein chain. The positions of the atoms in those loops may be so incosistent that no single position relative to the core dominates, and so their positions cannot be determined by experiment.

When this happens, we are left with partial protein structures, with segments of the protein chain missing. We know the amino acid sequence of the missing sequence, and where its endpoints join the known part of the structure. Given this information, we would like to be able to reconstruct feasible structures for the missing segment such that the endpoints join to the known structure at the proper points, and such that all constraints of bond lengths and angles are satisfied. This is an instance of the Loop Closure Problem, and it is intimately related to the inverse kinematics problem in robotics.

## Inverse Kinematics

Inverse kinematics is the problem of finding the right values for the underlying degrees of freedom, in the case of proteins, of the dihedral angles, so that the manipulated chain satisfies certain spatial constraints. For example, in some applications, it is necessary to find rotations that can steer certain atoms to desired locations in space. To achieve a particular function, protein regions sometimes have to undergo concerted motion where atoms move together in order to locate themselves near another protein or molecule. The motion of atoms is spatially constrained because they have to assume specific target locations in space. However, since atoms must move together in order not to break bonds by their motion, it is easier to model their motion in dihedral angle space, where bond lengths and bond angles are fixed. This parameterization of protein motion, called the idealized or rigid geometry model, is discussed in Representing Proteins in silico: Data Structures and Kinematics.

In applying inverse kinematics algorithms to proteins, we are taking advantage of a striking similarity between organic molecules and robotic manipulators (robot arms) in terms of how they move. As robot manipulators have joints, proteins have atoms. As robot manipulators have links that connect their joints, proteins have bonds that connect their atoms. The similarity between proteins and robots makes it possible for us to apply to proteins a large existing literature of solutions to the Inverse Kinematics problem, developed in the context of robot manipulators (robotic arms).

Before we proceed with some simple inverse kinematics examples, note that inverse kinematics is the inverse of the forward kinematics problem. Therefore, an immediate attempt to solve the inverse kinematics problem would be by inverting forward kinematics equations.

Let's illustrate how to solve the inverse kinematics problem for robot manipulators on a simple example. The figure below shows a simple planar robot with two arms. The underlying degrees of freedom of this robot are the two angles dictating the rotation of the arms. These are illustrated in the figure below. The inverse kinematics question in this case would be: What are the values for the degrees of freedom so that the end effector of this robot (the tip of the last arm) lies at position (x,y) in the two-dimensional Cartesian space? Try to write down the forward kinematics equations that relate (x,y) to the two rotational degrees of freedom, then try to solve these equations. The solutions will give you an answer to the inverse kinematics problem for this robot.

### Exercise 1

Given (x, y) target position for end-effector of the robot, what are the solutions for the two rotational (θ) degrees of freedom?

#### Solution

You can see that there can be 0, 1, or 2 solutions for this example. Where does the non-uniqueness of the solutions lie in the answers we derive?

As you saw from this example, the solutions to an inverse kinematics problem are not necessarily unique. As the number of degrees of freedom increases, so does the maximum number of solutions, as depicted in the figure. It is also possible for a problem to have no solution if the point on the robot cannot be brought to the target point in space at all.

While the above example offers equations that are easy to solve, general inverse kinematics problems require solving systems of nonlinear equations for which there are no general algorithms. Some inverse kinematics problems cannot be solved analytically because mathematical singularities (such as undefined sines and cosines, or division by 0) arise during the computation. In robotics, it is sometimes possible to design systems to have solvable inverse kinematics, but in the general case, we must rely on approximation methods in order to keep the problem tractable, or, in some cases, even solvable. For examples on how to address inverse kinematics in particular systems, please read chapter 4 of [1].

### Inverse Kinematics Methods

Inverse kinematics methods are categorized into two main groups:

• exact, classic, or algebraic methods
• heuristic, or optimized methods
While exact methods are complete, i.e. they report all solutions, they can only find solutions for chains with up to nine degrees of freedom. Hierarchical approaches break long chains into smaller ones for which exact methods provide answers. More powerful methods, referred to as optimized or heuristic methods, though not complete, are unrestricted in the number of degrees of freedom in the systems about which they reason. When combined with conformational search algorithms these methods can be quite powerful for modeling proteins.

#### Classic Inverse Kinematics Methods

It is known that for manipulators with no more than six degrees of freedom, there is a finite number of solutions to the inverse kinematics problem [1]. There is, however, no analytical method that can find these solutions for all types of manipulators. For manipulators with only revolute joints, which is the case for biomolecules with idealized geometry, the number of unique solutions is at most 16, when the number of degrees of freedom does not exceed six [6]. An efficient solution was proposed in [8] and later applied to the conformational analysis of small molecular chains [9], [10]. Methods based on curve approximation were proposed in [7] for the inverse kinematics of hyper-redundant robots, where the number of regularly distributed joints is very large.

Specialized solutions to inverse kinematics in biochemistry appeared as early as 1970 [12], where fragments of up to 6 degrees of freedom were predicted by solving a set of polynomial equations representing geometric transformations. These equations were applied to building tripeptide loops [12] under the ideal geometry assumption. Later work [15], [13], [10], [14] offered efficient analytical solutions for three consecutive residues through spherical geometry and polynomial equations. Bounding inverse kinematic solutions for chains with no more than six degrees of freedom within small intervals was applied in the context of drug design [11]. A new formulation that extends the domain of solutions to any three residues, not necessarily consecutive, and with arbitrary geometry, was recently proposed [16]. Current work that pushes the dimensionality limit from six to nine degrees of freedom makes use of an efficient subdivision of the solution space [17].

#### Inverse Kinematics Methods with Optimization

Currently only optimization-based solutions are considered appropriate for accommodating chains with arbitrary numbers of degrees of freedom. Two well-known optimization-based inverse kinematics solutions that iteratively solve a system of equations until loops are closed are Random Tweak [18], [19] and Cyclic Coordinate Descent (CCD) [3], [4], [5]. Both methods are based on iteratively setting the dihedral degrees of freedom of a fragment or kinematic chain until the end effector (atom for a protein) reaches a target position.

Random tweak relies on the computation of the Jacobian (a high-dimensional analog of the derivative of a function on real numbers), a process that is computationally expensive and numerically unstable. In addition to not being free from mathematical singularities, random tweak does not allow additional constraints on individual residues because modifications to dihedral angles are introduced all at once, with a strong dependence of each dihedral proposed change on all the others. Additional constraints on the dihedrals may result in the unpredictable motion of a feature atom away from rather than toward its target position.

Avoiding the use of the Jacobian, CCD is computationally inexpensive, numerically stable, and free from singularities. This method avoids the interdependence of dihedral angles by adjusting only a single degree of freedom at a time, which allows for additional constraints on dihedral angles with a predictable motion of feature atoms towards their target positions. First introduced in the context of non-linear programming [3] , this method was applied to robotics [4], and later was applied to the loop closure problem for proteins[5].

### Cyclic Coordinate Descent and Its Application to Proteins

CCD tries to find an optimal angle by which to rotate a single bond so as to steer a desired atom towards its target position. We will explain a solution where we will try to find an optimal angle by which to steer three atoms simultaneously. We first define their current positions M and their target positions F, as shown in the figure below. The goal is to minimize the Euclidean distance between the current and the target positions for all three atoms simultaneously. That is why we define an objective function S that sums the squared distances per atom.

The question then becomes that of finding a rotation along the rotation axis O, shown in the figure, that minimizes S. First, we need to define S as a function of the angle we are trying to find. Doing so is not hard, since rotation by this angle is a two-dimensional rotation. In the figure above we have shown how the position of an atom can be updated through a two-dimensional rotation by the angle around the rotation axis. In this way we obtain an expression that relates S to the angle we want to find. Since this angle has to minimize S, it has to provide a solution to the first order derivative of S set to 0. This is shown below. Simplifying the expression for the first order derivative of S set to 0 gives us a formula for  tan(αα) . However, an expression for the tangent does not provide a unique value for the angle. This is a consequence of the fact that the derivative of S set to 0 corresponds not only to minima, but also to local maxima. In order to find the angle that indeed minimizes S, one would have to make sure that the second order derivative of S is greater than 0. This is more cumbersome. There is a way to avoid doing such calculations by realizing that the formula we received for S in terms of the angle we want to solve for, can be rewritten as shown below. In this way, one can obtain a value for both the cosine and the sine of the angle, which now uniquely determines the optimal angle. While not able to enumerate all solutions in dihedral space, CCD guarantees it will find some solution if one exists. Most recently CCD has been applied to long protein chains to determine the structure of missing fragments[5]. Given a dihedral bond on the kinematic chain from a given anchor atom to the designated feature atom, CCD finds a value by which to rotate the given bond so that the feature atom's updated position gets closer to its target position.

Unlike classic inverse-kinematics solutions that use Jacobian matrices [18], [19], or general numerical approaches, CCD is free of singularities and does not depend on initial guesses for solutions. Compared to inverse kinematics techniques with optimization that suffer from high computational times, CCD is computationally fast. Unlike other methods such as random tweak, CCD gives predictable behavior and suffers from no anomalies when additional constraints are added to the dihedrals (e.g. constraints imposed by the physico-chemical forces on the protein). Such properties make CCD very appealing.

When solving for long polypeptide chains, one needs to define the set of rotatable dihedrals that can be solved for through CCD. These dihedrals can then be used to rotate and therefore steer the desired atom towards its target position. A straighforward implementation proceeds down the rotatable dihedral angles according to a predetermined order finding maximum angle values by which to rotate the current dihedral so that the desired atom gets as close as possible to its target position. Once such a value is computed, one can update the conformation of the polypeptide chain through forward kinematics. To learn more about the application of CCD to resolving partial protein structures, such as loop closure, please refer to [5]. Another interesting application of CCD to complete partially resolved crystallograhic data is Itay Lotan's thesis .

## References

2. M. Zhang, and L. E. Kavraki. (2002). A New Method for Fast and Accurate Derivation of Molecular Conformations. [http://pubs.acs.org/cgi-bin/article.cgi/jcisd8/2002/42/i01/pdf/ci010327z.pdf]. Journal of Chemical Information and Computer Sciences, 42, 64-70.
3. Luenberger, D. G. (1984). Linear and Nonlinear Programming. Addison-Wesley.
4. L. T. Wang, and C. C. Chen. (1991). A combined optimization method for solving the inverse kinematics problem of mechanical manipulators. [http://ieeexplore.ieee.org/search/advsearch.jsp]. IEEE Transactions on Robotics and Automation, 7, 489-499.
5. A. A. Canutescu, and R. L. Dunbrack. (2003). Cyclic Coordinate Descent: A Robotics Algorithm for Protein Loop Closure. [http://www.proteinscience.org/cgi/reprint/12/5/963]. Protein Science, 12, 963-972.
6. M. Raghavan, and B. Roth. (1989). Kinematic analysis of the 6R manipulator of general geometry. [http://ieeexplore.ieee.org/search/advsearch.jsp]. International Symposium on Robotics Research.
7. G. S. Chirikjian. (1993). General methods for computing hyper-redundant manipulator inverse kinematics. [http://caesar.me.jhu.edu/publication/hyperR_manipulators.html]. IEEE/RSJ International Conference on Intelligent Robots and Systems.
8. D. Manocha, and J. Canny. (1994). Efficient inverse kinematics for general 6R manipulator. [ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/report.pdf]. IEEE Transactions on Robotics and Automation, 10, 648-657.
9. D. Manocha, and Y. Zhu. (1994). Kinematic manipulation of molecular chains subject to rigid constraints. [ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/ismb.pdf]. International Conference on Intelligent Systems for Molecular Biology, 2, 285-293.
10. D. Manocha, and Y. Zhu, and W. Wright. (1995). Conformational analysis of molecular chains using nano-kinematics. [ftp://ftp.cs.unc.edu/pub/users/manocha/PAPERS/MOLECULE/main.pdf]. Computer Applications in Biosciences, 11, 71-86.
11. M. Zhang, and L. E. Kavraki. (2002). Solving Molecular Inverse Kinematics Problems for Protein Folding and Drug Design. [http://www.cs.rice.edu/CS/Robotics/papers/zhang2002/solve-molec-inv-kinemat.pdf]. Currents in Computational Molecular Biology, 214–215.
12. N. Go, and H. J. Scheraga. (1970). Ring closure and local conformational deformations of chain molecules. [http://pubs.acs.org/cgi-bin/sample.cgi/mamobx/1970/3/i02/pdf/ma60014a012.pdf]. Macromolecules, 3, 178-187.
13. K. A. Palmer, and H. J. Scheraga. (1991). Standard-geometry chains fitted to x-ray derived structures : Validation of the rigid-geometry approximation. i. chain closure through a limited search of loop conformations. [http://doi.wiley.com/10.1002/jcc.540130307]. Journal Computational Chemistry, 12, 505-526.
14. W. J. Wedemeyer, and H. J. Scheraga. (1999). Exact analytical loop closure in proteins using polynomial equations. [http://www3.interscience.wiley.com/cgi-bin/fulltext/61004552/PDFSTART]. Journal Computational Chemistry, 20, 819-844.
15. R. E. Bruccoleri, and M. Karplus. (1985). Chain closure with bond angle variations. [http://pubs.acs.org/cgi-bin/sample.cgi/mamobx/1985/18/i12/pdf/ma00154a069.pdf.pdf]. Macromolecules, 18, 2676-2773.
16. E. Coutsias, and C. Seok, and M. Jacobson, and K. Dill. (2004). A Kinematic View of Loop Closure. [http://www3.interscience.wiley.com/cgi-bin/fulltext/107061300/PDFSTART]. Journal of Computational Chemistry, 25, 510-528.
17. M. Zhang, and R. A. White, and L. Wang, and R. Goldman, and L. E. Kavraki, and B. Hasset. (2004). Improving Conformational Searches by Geometric Screening. [http://bioinformatics.oupjournals.org/cgi/reprint/21/5/624]. Journal of Bioinformatics, 7, 624-630.
18. R. M. Fine, and H. J. Wang, and P. S. Shenkin, and D. L. Yarmush, and C. Levinthal. (1986). Predicting antibody hypervariable loop conformations. II: Minimization and molecular dynamics studies of MCPC603 from many randomly generated loop conformations. [http://www3.interscience.wiley.com/cgi-bin/fulltext/107611345/PDFSTART]. Proteins, 1, 342-362.
19. P. S. Shenkin, and D. L. Yarmush, and R. M. Fine, and H. J. Wang, and C. Levinthal. (1987). Predicting antibody hypervariable loop conformations. I: Ensembles of random conformations for ring-like structures. [http://www3.interscience.wiley.com/cgi-bin/fulltext/107588501/PDFSTART]. Biopolymers, 26, 2053-2085.

## Content actions

### Give feedback:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks