<< Chapter < Page Chapter >> Page >

As a very simple and abstract example of when non-linear dimensionality reduction would be preferred over a linear method such as PCA, consider the data set depicted in figure 8 a). The data is apparently two-dimensional, and naively considering the data variance in this way leads to two "important" principal components, as figure 8 b) shows. However, the data has been sampled from a one-dimensional, but non-linear, process. Figure 8 c) shows the correct interpretation of this data set as non-linear but one-dimensional.

Effects of dimensionality reduction on an inherently non-linear data set. a) The original data given as a two-dimensional set. b) PCA identifies two PCs as contributing significantly to explain the data variance. c) However, the inherent topology (connectivity) of the data helps identify the set as being one-dimensional, but non-linear.

Most interesting molecular processes share this characteristic of being low-dimensional but highly non-linear in nature. For example, in the protein folding example depicted in figure 1, the atom positions follow very complicated, curved paths to achieve the folded shape. However, the process can still be thought of as mainly one-dimensional. Linear methods such as PCA would fail to correctly identify collective modes of motion that do not destroy the protein when followed, much like in the simple example of figure 8.

Several non-linear dimensionality reduction techniques exist, that can be classified as either parametric or non-parametric.

  • Parametric methods need to be given a model to try to fit the data, in the form of a mathematical function called a kernel . Kernel PCA is a variant of PCA that projects the points onto a mathematical hypersurface provided as input together with the points. When using parametric methods, the data is forced to lie on the supplied surface, so in general this family of methods do not work well with molecular motion data, for which the non-linearity is unknown.
  • Non-parametric methods use the data itself in an attempt to infer the non-linearity from it, much like deducing the inherent connectivity in figure 8 c). The most popular methods are Isometric Feature Mapping (Isomap) from Tenenbaum et al. and Locally Linear Embedding (LLE) from Roweis et al. .
The Isomap algorithm is more intuitive and numerically stable than LLE and will be discussed next.

Isometric feature mapping (isomap)

Isomap was introduced by Tenenbaum et al. in 2000. It is based on an improvement over a previous technique known as MultiDimensional Scaling (MDS). Isomap aims at capturing the non-linearity of a data set from the set itself, by computing relationships between neighboring points. Both MDS and the non-linear improvement will be presented first, and then the Isomap algorithm will be detailed.

MDS. Multidimensional Scaling (see Cox et al. ) is a technique that produces a low-dimenisional representation for an input set of n points, where the dimensions are ordered by variance, so it is similar to PCA in this respect. However, MDS does not require coordinates as input, but rather, distances between points. More precisely, MDS requires as input a data set of points, and a distance measure d(i,j) between any pair of points x i and x j . MDS works best when the distance measure is a metric . MDS starts by computing all possible pairwise distances between the input points, and placing them in a matrix D so that D ij d(i,j) . The goal of MDS is to produce, for every point, a set of n Euclidean coordinates such that the Euclidean distance between all pairs match as close as possible the original pairwise distances D ij , as depicted in figure 9.

Questions & Answers

A golfer on a fairway is 70 m away from the green, which sits below the level of the fairway by 20 m. If the golfer hits the ball at an angle of 40° with an initial speed of 20 m/s, how close to the green does she come?
Aislinn Reply
cm
tijani
what is titration
John Reply
what is physics
Siyaka Reply
A mouse of mass 200 g falls 100 m down a vertical mine shaft and lands at the bottom with a speed of 8.0 m/s. During its fall, how much work is done on the mouse by air resistance
Jude Reply
Can you compute that for me. Ty
Jude
what is the dimension formula of energy?
David Reply
what is viscosity?
David
what is inorganic
emma Reply
what is chemistry
Youesf Reply
what is inorganic
emma
Chemistry is a branch of science that deals with the study of matter,it composition,it structure and the changes it undergoes
Adjei
please, I'm a physics student and I need help in physics
Adjanou
chemistry could also be understood like the sexual attraction/repulsion of the male and female elements. the reaction varies depending on the energy differences of each given gender. + masculine -female.
Pedro
A ball is thrown straight up.it passes a 2.0m high window 7.50 m off the ground on it path up and takes 1.30 s to go past the window.what was the ball initial velocity
Krampah Reply
2. A sled plus passenger with total mass 50 kg is pulled 20 m across the snow (0.20) at constant velocity by a force directed 25° above the horizontal. Calculate (a) the work of the applied force, (b) the work of friction, and (c) the total work.
Sahid Reply
you have been hired as an espert witness in a court case involving an automobile accident. the accident involved car A of mass 1500kg which crashed into stationary car B of mass 1100kg. the driver of car A applied his brakes 15 m before he skidded and crashed into car B. after the collision, car A s
Samuel Reply
can someone explain to me, an ignorant high school student, why the trend of the graph doesn't follow the fact that the higher frequency a sound wave is, the more power it is, hence, making me think the phons output would follow this general trend?
Joseph Reply
Nevermind i just realied that the graph is the phons output for a person with normal hearing and not just the phons output of the sound waves power, I should read the entire thing next time
Joseph
Follow up question, does anyone know where I can find a graph that accuretly depicts the actual relative "power" output of sound over its frequency instead of just humans hearing
Joseph
"Generation of electrical energy from sound energy | IEEE Conference Publication | IEEE Xplore" ***ieeexplore.ieee.org/document/7150687?reload=true
Ryan
what's motion
Maurice Reply
what are the types of wave
Maurice
answer
Magreth
progressive wave
Magreth
hello friend how are you
Muhammad Reply
fine, how about you?
Mohammed
hi
Mujahid
A string is 3.00 m long with a mass of 5.00 g. The string is held taut with a tension of 500.00 N applied to the string. A pulse is sent down the string. How long does it take the pulse to travel the 3.00 m of the string?
yasuo Reply
Who can show me the full solution in this problem?
Reofrir Reply
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?

Ask