As in the case of rotation matrices, the translational part of the alignment consists of making the centroids of the two data sets coincide. To find the optimal rotation using quaternions, recall that the dot product of two vectors is maximized when the vectors point in the same direction. The same is true when the vectors are represented as quaternions. Using this property, with $a_i$ and $b_i$ denoting the centered coordinates of the i-th matched atom in the two data sets, written as quaternions, and $q$ the unit quaternion of the rotation, we can define a quantity that we want to maximize (proof here): $\sum_{i} (q a_i q^*) \cdot b_i$.
Equivalently, using the last property from the section "Introduction to quaternions", we get: $\sum_{i} (q a_i) \cdot (b_i q)$.
Now, recall that quaternion multiplication can be represented by matrices, and that the quaternions $a_i$ and $b_i$ have a 0 real component. Writing $q a_i = \bar{A}_i q$ and $b_i q = B_i q$, where $\bar{A}_i$ and $B_i$ are the 4x4 matrices that represent multiplication by $a_i$ on the right and by $b_i$ on the left, we can derive a new form for the objective function: $\sum_{i} (\bar{A}_i q) \cdot (B_i q) = q^T N q$, where $N = \sum_{i} \bar{A}_i^T B_i$.
The quaternion that maximizes this product is the eigenvector of N that corresponds to its most positive eigenvalue (proof here). The eigenvalues can be found by solving the following equation, which is quartic in $\lambda$: $\det(N - \lambda I) = 0$. This quartic equation can be solved by a number of standard approaches. Finally, given the maximum eigenvalue $\lambda_{max}$, the quaternion corresponding to the optimal rotation is the eigenvector v that satisfies $(N - \lambda_{max} I) v = 0$.
A closed-form solution of this equation for v can be found by applying techniques from linear algebra. One possible algorithm, based on constructing a matrix of cofactors, is presented in appendix A5 of the source paper [3].
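For reference, the matrices used in this derivation can be written out explicitly. The following is a reconstruction that follows the standard unit-quaternion formulation of the problem, not a copy of the original figures. The symbol names ($a_i$, $b_i$, $\bar{A}_i$, $B_i$, $S$) are assumptions chosen for illustration, with $a_i = (0, a_x, a_y, a_z)$ and $b_i = (0, b_x, b_y, b_z)$ the centered positions of the i-th matched atom written as quaternions, real part first.

```latex
% Multiplication by a_i on the right:  q a_i = \bar{A}_i q
\bar{A}_i =
\begin{pmatrix}
 0   & -a_x & -a_y & -a_z \\
 a_x &  0   &  a_z & -a_y \\
 a_y & -a_z &  0   &  a_x \\
 a_z &  a_y & -a_x &  0
\end{pmatrix}
\qquad
% Multiplication by b_i on the left:  b_i q = B_i q
B_i =
\begin{pmatrix}
 0   & -b_x & -b_y & -b_z \\
 b_x &  0   & -b_z &  b_y \\
 b_y &  b_z &  0   & -b_x \\
 b_z & -b_y &  b_x &  0
\end{pmatrix}

% Objective function as a quadratic form in q
\sum_i (q a_i q^*) \cdot b_i
  = \sum_i (\bar{A}_i q) \cdot (B_i q)
  = q^T \Big( \sum_i \bar{A}_i^T B_i \Big) q
  = q^T N q

% N written with the sums S_{xy} = \sum_i a_x b_y (components of a_i and b_i), and so on
N =
\begin{pmatrix}
 S_{xx}+S_{yy}+S_{zz} & S_{yz}-S_{zy}        & S_{zx}-S_{xz}         & S_{xy}-S_{yx} \\
 S_{yz}-S_{zy}        & S_{xx}-S_{yy}-S_{zz} & S_{xy}+S_{yx}         & S_{zx}+S_{xz} \\
 S_{zx}-S_{xz}        & S_{xy}+S_{yx}        & -S_{xx}+S_{yy}-S_{zz} & S_{yz}+S_{zy} \\
 S_{xy}-S_{yx}        & S_{zx}+S_{xz}        & S_{yz}+S_{zy}         & -S_{xx}-S_{yy}+S_{zz}
\end{pmatrix}
```

Because $a_i$ and $b_i$ have a zero real component, both $\bar{A}_i$ and $B_i$ are skew-symmetric, and N is a symmetric matrix, so its eigenvalues are real.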
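In summary, the alignment algorithm works as follows:
1. Recompute the atom coordinates of each data set relative to its centroid.
2. Construct the matrix N from the matched atom pairs.
3. Find the maximum eigenvalue of N by solving the quartic equation.
4. Find the eigenvector corresponding to this eigenvalue. This eigenvector is the quaternion that represents the optimal rotation.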
This method appears computationally intensive, but has the major advantage over other approaches of being a closed-form, unique solution.
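The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It assumes NumPy is available and that the two conformations are given as N x 3 arrays whose rows are matched atom by atom. The function names (`optimal_rotation_quaternion`, `quaternion_to_matrix`, `lrmsd`) are hypothetical. Instead of solving the quartic in closed form, the sketch swaps in NumPy's symmetric eigensolver `numpy.linalg.eigh` to obtain the eigenvector with the most positive eigenvalue.

```python
import numpy as np


def optimal_rotation_quaternion(a, b):
    """Unit quaternion (w, x, y, z) that best rotates centered points a onto b."""
    # S[j, k] = sum over atoms of a[:, j] * b[:, k]
    S = a.T @ b
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    # Symmetric 4x4 matrix N of the quadratic form q^T N q.
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ])
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the most positive eigenvalue.
    _, eigenvectors = np.linalg.eigh(N)
    return eigenvectors[:, -1]


def quaternion_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
        [2*(x*y + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
        [2*(x*z - w*y), 2*(y*z + w*x), w*w - x*x - y*y + z*z],
    ])


def lrmsd(P, Q):
    """RMSD between P and Q after optimal translation and rotation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Translation: move both centroids to the origin.
    a = P - P.mean(axis=0)
    b = Q - Q.mean(axis=0)
    # Rotation: quaternion from the eigenvector, then rotate a onto b.
    R = quaternion_to_matrix(optimal_rotation_quaternion(a, b))
    diff = a @ R.T - b
    return np.sqrt((diff ** 2).sum() / len(P))
```

The eigenvector is determined only up to sign, which is harmless here because a quaternion and its negative represent the same rotation.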
RMSD and lRMSD are not ideally suited for all applications. For example, consider the case of a given conformation A, and a set S of other conformations generated by some means. The goal is to estimate which conformations in S are closest in potential energy to A, under the assumption that they will be the conformations most structurally similar to A. The lRMSD measure will find the conformations in which the overall average atomic displacement is least. The problem is that if the quantity of interest is the potential energy of conformations, not all atoms can be treated equally. Those on the outside of the protein can often move a fair amount without dramatically affecting the energy. In contrast, the core of the molecule tends to be more compact, so a slight change in the relative positions of a pair of atoms can make the atoms overlap, producing a completely infeasible structure with high potential energy. A class of distance measures and pseudo-measures based on intramolecular distances has been developed to address this shortcoming of RMSD-based measures.
Assume we wish to compare two conformations P and Q of a molecule with N atoms. Let $d^P_{ij}$ be the distance between atom i and atom j in conformation P, and let $d^Q_{ij}$ be the same distance for conformation Q. Then the intramolecular distance is defined as the root mean square difference between corresponding interatomic distances over all atom pairs, for example $d(P, Q) = \sqrt{\frac{2}{N(N-1)} \sum_{i<j} (d^P_{ij} - d^Q_{ij})^2}$ when each unordered pair is counted once. One of the main computational advantages of this class of approaches is that we do not have to compute the alignment between P and Q. On the other hand, for this metric we need to sum over a quadratic number of terms, whereas for RMSD the number of terms is linear in the number of atoms. Approximations can be made to speed up this computation, as shown in [7]. Also, the intramolecular distance measure given above, which is sometimes referred to as the dRMSD, has the drawback that the pairs of atoms most distant from each other are the ones that contribute the most to the measured difference.
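The intramolecular distance described above can be sketched as follows. This is a minimal illustration, assuming NumPy and conformations given as N x 3 coordinate arrays with matched rows. The function names `pairwise_distances` and `drmsd` are hypothetical, and the normalization (a mean over the N(N-1)/2 unordered atom pairs) is an assumption, since other conventions differ by a constant factor.

```python
import numpy as np


def pairwise_distances(X):
    """N x N matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def drmsd(P, Q):
    """Root mean square difference of intramolecular distances of P and Q."""
    dP = pairwise_distances(P)
    dQ = pairwise_distances(Q)
    # Count each unordered atom pair (i < j) exactly once.
    i, j = np.triu_indices(len(dP), k=1)
    return np.sqrt(np.mean((dP[i, j] - dQ[i, j]) ** 2))
```

No alignment step appears anywhere in the computation: the result depends only on distances within each conformation, so it is unchanged when either conformation is rotated or translated. The cost is the quadratic number of pairs held in the two distance matrices.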
An interesting open problem is to come up with a physically meaningful molecular distance metric that allows for fast nearest-neighbor computations. Such a metric would be useful, for example, for clustering conformations. One proposed method is the contact distance. Contact distance requires constructing a contact map matrix for each conformation, indicating which pairs of atoms are separated by less than some threshold distance. The distance measure is then a measure of the difference between the contact maps. Other distance measures attempt to weight each pair in the dRMSD based on how close the atoms are, with closer pairs given more weight, in keeping with the intuition that small changes in the relative positions of nearby atoms are more likely to result in collisions. One such measure is the normalized Holm and Sander score. This score is technically a pseudo-measure rather than a measure because it does not necessarily obey the triangle inequality.
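The contact distance idea can be illustrated with a short sketch. The text above does not fix how the difference between two contact maps is measured, so this example assumes one simple choice: the number of atom pairs that are in contact in one conformation but not in the other (a Hamming distance). The function names and the default threshold of 8.0 distance units are hypothetical, and NumPy is assumed.

```python
import numpy as np


def contact_map(X, threshold):
    """Boolean N x N matrix: True where two atoms are closer than threshold."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < threshold


def contact_distance(P, Q, threshold=8.0):
    """Number of atom pairs whose contact status differs between P and Q."""
    cP = contact_map(P, threshold)
    cQ = contact_map(Q, threshold)
    # Compare each unordered pair (i < j) once and skip the diagonal.
    i, j = np.triu_indices(len(cP), k=1)
    return int(np.count_nonzero(cP[i, j] != cQ[i, j]))
```

For three collinear atoms at x = 0, 1, 2 compared with atoms at x = 0, 1, 5 and a threshold of 1.5, only the pair formed by the second and third atoms changes contact status, so the distance is 1.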
The definition of distance measures remains an open problem. For references to ongoing work, see articles that compare several methods, such as [5].