<< Chapter < Page | Chapter >> Page > |
Figure 12 shows the results of applying Isomap to the same molecular trajectory used for the CV-N example in the PCA section. CV-N is a protein that is known to have an "intermediate" state in the folding mechanism, and it is also known to fold through a very ordered process following a well-defined "folding route".
The free-energy plot in figure 12 (right) clearly shows the superior quality of the Isomap coordinates when compared with PCA. Isomap clearly identifies the CV-N intermediate (seen as a free-energy minimum in blue) in between the unfolded and folded states. Also, the highest probability route connecting the intermediate to the folded state is clearly seen in this rendering. The PCA coordinates did not identify an itermediate or a folding route, since the PCA coordinates are simply a linear projection of the original coordinates. Isomap, on the contrary, finds the underlying connectivity of the data and always returns coordinates that clearly show the progression of the reaction along its different stages.
The Isomap procedure can be applied to different molecular models. As another, smaller example, consider the Alanine Dipeptide molecule shown in figure 13 (left). This is an all-atom molecule that has been studied thoroughly and is known to have two predominant shapes: an "extended" shape (also called C5 and a variant called PII) and a "helical" shape (also called alpha, with three variants: right-handed, P-like, and left-handed). The application of the Isomap algorithm to this molecule, without any bias by a priori known information, yields the low-dimensional coordinates on the left of figure 13 (shown again as a free-energy plot). The first two coordinates (top right) are already enough to identify all five major states of the peptide, and the possible transitions between them. Applying PCA to such a simple molecule yields comparable results (not shown), but PCA cannot differentiate the left-handed helix shape from the right-handed one. Only the geodesic formulation of Isomap can.
Although the Isomap coordinates describe non-linear molecular processes extremely well, their computation is very expensive, as mentioned earlier. The most expensive step is the computation of the neighborhood graph. Some approximations are possible to speed up its computation, at the expense of some accuracy in the identification of the "true" nearest neighbors for each point. The work in Plaku et al. proposed the use of an approximation technique that allows the computation of nearest-neighbors for high-dimensional data sets orders of magnitude faster, by quickly reducing the dimensionality of all points, and then querying for neighbors. The tradeoff between precision and speed is controllable through several algorithm parameters.
Notification Switch
Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?