
As an example of using PCA to produce low-dimensional points from high-dimensional input, consider the Cyanovirin-N (CV-N) protein depicted in figure 4 a), corresponding to PDB code 2EZM. This protein acts as a virucide against many viruses, such as HIV. Conformations from protein folding simulations of CV-N can be used as input for PCA. A model of this protein that considers only the alpha-carbon atoms has 101 such atoms, for a total of 303 degrees of freedom (three Cartesian coordinates per atom). Folding/unfolding simulations starting from the native PDB structure produce abundant conformation samples of CV-N along the folding reaction.
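The setup described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure used to produce figure 4: the function name `pca_conformations` is hypothetical, the input is random synthetic data rather than CV-N simulation output, and the conformations are assumed to be already superimposed on a common reference so that rigid-body motion does not dominate the variance.

```python
import numpy as np

def pca_conformations(coords, n_components=2):
    """PCA of conformations given as an array of shape (n_conf, n_atoms, 3).

    Assumes the conformations are already aligned to a common reference.
    """
    n_conf = coords.shape[0]
    # Flatten each conformation into one vector of 3 * n_atoms coordinates.
    X = coords.reshape(n_conf, -1)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal components,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S**2 / (n_conf - 1)
    components = Vt[:n_components]
    # Low-dimensional coordinates of every conformation.
    projections = Xc @ components.T
    return mean, components, variances, projections

# Synthetic stand-in for simulation output: 500 conformations of 101 atoms.
rng = np.random.default_rng(0)
coords = rng.normal(size=(500, 101, 3))
mean, pcs, variances, proj = pca_conformations(coords)
print(pcs.shape)   # (2, 303): two PCs, each with 303 entries
print(proj.shape)  # (500, 2): one 2-D point per conformation
```

Each row of `proj` is the two-dimensional point assigned to one input conformation, which is the kind of data plotted in a projection such as figure 4 b).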

Figure 4: CV-N protein and PCA projections. a) A cartoon rendering of the CV-N protein. b) The projection of each simulated conformation onto the first two PCs. The concentrated cluster on the left corresponds to the "folded" protein, and the more diffuse cluster on the right corresponds to the "unfolded" protein. c) The PCA coordinates used as a basis for a free-energy (probability) plot.

Figure 4 b) shows the projection of each simulated conformation onto the first two principal components. Each point in the plot corresponds to exactly one of the input conformations, placed according to its coordinates along the first two PCs. Note that even the first PC alone can be used to approximately classify a conformation as "folded" or "unfolded", and the second PC explains more of the data variability by assigning a wider spread to the unfolded region (for which the coordinate variability is much larger). The low-dimensional projections can be used as a basis to compute probability and free-energy plots, for example as in Das et al. Figure 4 c) shows such a plot for CV-N, which makes the identification of clusters easier. Using PCA to interpolate conformations along the PCs is not a good idea for protein folding data, since the entire protein undergoes large, non-linear motions of its atoms. A reconstruction that uses only a few PCs would quickly destroy the protein topology when large-scale motions occur. In some cases, however, when only parts of a protein deviate slightly from the equilibrium shape, the main modes of motion can be exploited effectively, as the next example shows.
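One common way to turn such projections into a free-energy plot is to histogram the points and apply the Boltzmann relation $F = -kT \ln P$ to the bin probabilities. The sketch below assumes this convention, which is not necessarily the exact procedure behind figure 4 c). The function name `free_energy_surface`, the bin count, the unit choice `kT = 1`, and the two synthetic clusters (a tight "folded" one and a diffuse "unfolded" one) are all illustrative assumptions.

```python
import numpy as np

def free_energy_surface(proj, bins=30, kT=1.0):
    """Estimate F = -kT ln P on a 2-D grid from projected points (n, 2)."""
    hist, xedges, yedges = np.histogram2d(proj[:, 0], proj[:, 1], bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        F = -kT * np.log(prob)      # empty bins get F = +inf
    F -= F[np.isfinite(F)].min()    # shift so the deepest basin is at zero
    return F, xedges, yedges

# Synthetic stand-in for PCA projections: a tight cluster on the left
# and a diffuse cluster on the right (not real CV-N data).
rng = np.random.default_rng(0)
folded = rng.normal(loc=(-3.0, 0.0), scale=0.3, size=(1000, 2))
unfolded = rng.normal(loc=(3.0, 0.0), scale=1.5, size=(1000, 2))
proj = np.vstack([folded, unfolded])

F, xedges, yedges = free_energy_surface(proj)
# Locate the deepest basin; the first axis of F runs along PC 1.
i, j = np.unravel_index(np.argmin(F), F.shape)
basin_x = 0.5 * (xedges[i] + xedges[i + 1])
print(F.shape, basin_x)
```

Because the tight cluster packs the same number of points into far fewer bins, it appears as the deeper free-energy basin, which is why such plots make clusters easier to identify than a raw scatter plot.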

As an example of using PCA to analyze the main modes of motion, consider the work of Teodoro et al., who studied the HIV-1 protease, depicted in figure 5. The HIV-1 protease plays a vital role in the maturation of the HIV-1 virus by targeting amino acid sequences in the gag and gag-pol polyproteins. Cleavage of these polyproteins produces proteins that contribute to the structure of the virion, RNA packaging, and condensation of the nucleoprotein core. The active site of HIV-1 protease is formed by the homodimer interface and is capped by two identical beta-hairpin loops, one from each monomer, which are usually referred to as "flaps". The structure of the HIV-1 protease complexed with an inhibitor is shown in figure 5. The active site structure of the bound form differs significantly from that of the unbound conformation. In the bound state the flaps adopt a closed conformation, acting as clamps on the bound inhibitors or substrates, whereas in the unbound conformation the flaps are more open. Molecular dynamics simulations can be run to sample the flexibility of the flaps and produce an input data set of HIV-1 protease conformations to which PCA can be applied. A backbone-only representation of the HIV-1 protease has 594 atoms, which amounts to 1,782 degrees of freedom for each conformation.
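To make the idea of a "main mode of motion" concrete, the following sketch builds a synthetic data set with the same dimensionality (594 atoms, 1,782 coordinates) in which a single collective displacement dominates, recovers that displacement as the first PC, and generates structures displaced along it. Everything here is an assumption for illustration: the data are random numbers, not molecular dynamics output, and the helper `displace` is hypothetical.

```python
import numpy as np

n_atoms = 594
n_dof = 3 * n_atoms   # 1,782 Cartesian coordinates per conformation

# Synthetic ensemble: one hidden collective mode plus small isotropic noise.
rng = np.random.default_rng(1)
n_conf = 200
mode = rng.normal(size=n_dof)
mode /= np.linalg.norm(mode)
amplitudes = rng.normal(scale=5.0, size=n_conf)
X = np.outer(amplitudes, mode) + rng.normal(scale=0.01, size=(n_conf, n_dof))

# PCA via SVD of the centered data.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per PC
pc1 = Vt[0]

def displace(mean, pc, amplitude):
    """Return the mean structure shifted along one PC, shaped (n_atoms, 3)."""
    return (mean + amplitude * pc).reshape(-1, 3)

# A short series of structures sweeping along the first mode.
frames = [displace(mean, pc1, a) for a in np.linspace(-10.0, 10.0, 5)]
print(n_dof, frames[0].shape)
```

In this toy setting the first PC matches the hidden mode up to sign and carries almost all of the variance. Moving the mean structure along a single PC is a linear displacement, so it is only a reasonable approximation when the deviations from the equilibrium shape are small.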





Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6