The Isomap method can be put in algorithmic form in the following way:
The Isomap algorithm captures the non-linearity of the data set automatically, from the data itself. It returns a low-dimensional projection for each point; these projections can be used to understand the underlying data distribution better. However, unlike PCA, Isomap does not return "modes of motion" along which other points can be interpolated. Isomap is also much more expensive than PCA, since building a neighborhood graph and computing all-pairs shortest paths scale at least quadratically with the number of input points. In addition, there is the hidden cost of the distance measure itself, which can also be quite expensive. Despite its computational cost, Isomap has proven extremely useful in describing non-linear processes with few parameters. In particular, Das et al. applied Isomap successfully to the study of a protein folding reaction.
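As an illustration of where the cost comes from, the following is a minimal sketch of the usual three stages of Isomap (neighborhood graph, all-pairs shortest paths, classical multidimensional scaling), written with NumPy only. The function name `isomap` and its parameters `k` and `n_components` are assumptions made for this example, as is the use of a k-nearest-neighbor rule and of the Floyd-Warshall algorithm, which was chosen for brevity rather than speed.

```python
import numpy as np

def isomap(dist, k, n_components=2):
    """Embed n points given an (n, n) symmetric matrix of pairwise distances.

    Illustrative sketch only: `k` is the number of nearest neighbors used to
    build the graph, `n_components` the dimension of the returned embedding.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # Stage 1: k-nearest-neighbor graph; missing edges get infinite length.
    graph = np.full((n, n), np.inf)
    # Column 0 of the sorted indices is the point itself (assuming no duplicates).
    nearest = np.argsort(dist, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    graph[rows, nearest.ravel()] = dist[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    np.fill_diagonal(graph, 0.0)

    # Stage 2: all-pairs shortest paths (Floyd-Warshall, cubic in n).
    geo = graph.copy()
    for m in range(n):
        geo = np.minimum(geo, geo[:, m:m + 1] + geo[m:m + 1, :])
    if np.isinf(geo).any():
        raise ValueError("neighborhood graph is disconnected; increase k")

    # Stage 3: classical MDS on the graph (geodesic) distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (geo ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

Note that the function only ever sees the distance matrix, so the number of calls to the underlying distance measure (one per pair of points) is part of the total cost mentioned above.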
To apply Isomap to a set of molecular conformations (gathered, for example, through molecular dynamics simulations), all that is needed is a distance measure between two conformations. The most popular distance measure for conformations is lRMSD, discussed in the module Molecular Distance Measures. There is no need to pre-align all the conformations in this case, since the lRMSD distance already includes pairwise alignment. Thus, the Isomap algorithm as described above can be applied directly to molecular conformations. Choosing an appropriate value for a neighborhood parameter (such as ) may require a bit of experience, though, and it may depend on the amount and variability of the data.

Note that the shape of the low-dimensional surface on which we hypothesize the points lie is now unknown to us. In the Swiss roll example above, it was obvious that the data was sampled from a two-dimensional surface. For molecular data, however, we do not know a priori what the surface looks like; we only expect the process to be low-dimensional and highly non-linear in nature. The distance measure used as the underlying operation also has an impact on the final coordinates. The Isomap method using the lRMSD distance was validated on protein folding data in Das et al., where a statistical analysis method was used to demonstrate its usefulness.
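To make the role of the distance measure concrete, here is a hedged sketch of how a pairwise lRMSD matrix might be computed for a list of conformations. It assumes each conformation is an `(n_atoms, 3)` NumPy array with atoms in the same order, and it uses the Kabsch (SVD-based) superposition as one standard way to obtain the optimal alignment. The function names `lrmsd` and `pairwise_lrmsd` are hypothetical and not taken from the module.

```python
import numpy as np

def lrmsd(a, b):
    """Least RMSD between two (n_atoms, 3) arrays after optimal superposition."""
    # Remove translation by centering both conformations at the origin.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Kabsch: singular values of the covariance matrix give the best overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal map is a reflection, flip the smallest singular
    # value so that only proper rotations are allowed.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * s.sum()) / len(a)
    return np.sqrt(max(msd, 0.0))  # guard against tiny negative round-off

def pairwise_lrmsd(conformations):
    """Symmetric matrix of lRMSD values; each pair is aligned independently."""
    n = len(conformations)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = lrmsd(conformations[i], conformations[j])
    return dist
```

Because each pair is superimposed inside `lrmsd`, no global pre-alignment of the conformations is required. The resulting matrix is all that a distance-based Isomap implementation needs as input, and computing it takes one alignment per pair of conformations.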