<< Chapter < Page | Chapter >> Page > |
The computations required to simulate protein motion in silico are very expensive and involve non-trivial force field evaluations, as explained above. These simulations provide us with the positions of all atoms in the molecule; for a molecule with N atoms, this amounts to 3N numbers per conformation. For interestingly sized molecules (such as proteins) the number of atoms is large and thus the dimensionality of the obtained data is extremely high. That is, a conformation sample for a protein with N atoms can be thought of as a 3N-dimensional point.
However, many biological processes are known to be very structured at the molecular level, since the constituent atoms self-organize to achieve their bio-chemical goal. An example of such a process is protein folding , the process by which a protein achieves its thermodynamically-stable three-dimensional shape to perform its biological function (a depiction of a protein folding process is shown in figure 1). To study such processes based on data gathered through simulations, there is a need to "summarize" the high-dimensional conformational data. Simply visualizing the time-series of a moving protein as produced by simulation packages does not provide a lot of insight into the process itself. One way of summarizing these motions is to turn conformations into a low-dimensional representation, such as a vector with very few components, that somehow give the "highlights" of the process. This data analysis process -turning high-dimensional data into low-dimensional data for betterinterpretation- is called dimensionality reduction and is discussed next.
When molecular shapes are sampled throughout some physical-chemical process that involves the motion of the molecule, there is a need to simplify the high-dimensional (albeit redundant) representation of a molecule given as a 3N-dimensional point, since it is believed that the actual degrees of freedom (DoFs) of the process are much less, as explained before. The resulting, simplified representation has to be useful to classify the different conformations along one or more "directions" or "axes" that provide enough discrimination between them.
Dimensionality Reduction techniques aim at analyzing a set of points, given as input, and producing the corresponding low-dimensional representation for each. The goal is to discover the true dimensionality of a data set that is only apparently high-dimensional . There exist mathematical tools to perform automatic dimensionality reduction, based on arbitrary input data in the form of high-dimensional points (not just molecules). Although different techniques achieve their goals in different ways, and have both advantages and disadvantages, the most general definition for dimensionality reduction could be stated as:
Notification Switch
Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?