This module explains the general characteristics of the scoring matrices used to score sequence alignments. Guidelines are given to help the student choose a scoring scheme for a given alignment, according to the characteristics of the sequences to be aligned.

In bioinformatics, scoring matrices for computing alignment scores are often based on observed substitution rates, derived from the substitution frequencies seen in multiple alignments of sequences. Every possible identity and substitution is assigned a score based on the observed frequencies of such occurrences in alignments of related proteins. The score is calculated from the frequency with which the two amino acids are aligned in evolutionarily related sequences, compared with the frequency expected if they were aligned by chance. The score therefore also reflects how often each amino acid occurs in nature, as some amino acids are more abundant than others. Higher scores indicate that the probability that the two amino acids aligned by chance is very small; lower scores indicate a high probability that the two amino acids aligned by chance and are evolutionarily unrelated. Thus, identities are assigned the most positive scores, frequently observed substitutions also receive positive scores, and matches that are unlikely to have resulted from evolution, and so more likely indicate unrelatedness at that position, are given negative scores. Matrices with scoring schemes based on observed substitution rates are superior to simple identity scores or to scores based solely on side-chain similarity. The two most commonly used types of scoring matrices are the PAM matrices and the BLOSUM matrices.
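The relationship between observed and chance frequencies can be sketched as a log-odds calculation. The example below is a minimal illustration, not the procedure used to build any published matrix: the function name `log_odds_score`, the `scale` parameter (2 gives half-bit units), and all frequencies are assumptions chosen for illustration, and the expected frequency is simplified to the product of the two background frequencies.

```python
import math

def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Return a rounded log-odds score in units of 1/scale bits.

    pair_freq: frequency of the aligned pair observed in related sequences
    freq_a, freq_b: background frequencies of the two residues
    """
    # Simplifying assumption: chance frequency is the product of the backgrounds.
    expected = freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))

# Hypothetical numbers, for illustration only.
# Pair seen 8 times more often than chance predicts: positive score.
common = log_odds_score(pair_freq=0.02, freq_a=0.05, freq_b=0.05)
# Pair seen 4 times less often than chance predicts: negative score.
rare = log_odds_score(pair_freq=0.000625, freq_a=0.05, freq_b=0.05)
```

A pair observed more often than chance predicts gets a positive score, and a pair observed less often gets a negative one, which matches the sign convention described above.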

PAM (Percentage of Acceptable point Mutations per 10^8 years, more commonly expanded as Point Accepted Mutation) matrices are based on global alignments of closely related proteins. PAM 1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Scores are derived from a mutation probability matrix in which each element gives the probability of the amino acid in column X mutating to the amino acid in row Y after a particular evolutionary time, for example after 1 PAM, or 1% divergence. A PAM matrix is specific to a particular evolutionary distance, but the underlying mutation probability matrix may be used to generate matrices for greater evolutionary distances by multiplying it repeatedly by itself. However, at large evolutionary distances the information in the matrix is largely degraded. A PAM matrix would rarely be used for an evolutionary distance greater than 256 PAMs.
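Extrapolation by repeated multiplication can be sketched on a toy example. The code below assumes a hypothetical three-letter alphabet and invented probabilities (`pam1_toy`); a real mutation probability matrix is 20 by 20 and estimated from data. The helper names `mat_mul` and `mat_power` are likewise illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(m, n):
    """Multiply matrix m by itself n times, starting from the identity."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Toy matrix: entry [row][col] is the probability that the residue in the
# column is replaced by the residue in the row after one step.
# Each column sums to 1, and each residue is unchanged with probability 0.99.
pam1_toy = [
    [0.990, 0.004, 0.006],
    [0.006, 0.990, 0.004],
    [0.004, 0.006, 0.990],
]

pam250_toy = mat_power(pam1_toy, 250)
```

In this toy case the entries of `pam250_toy` all drift toward the same value, so the matrix distinguishes less and less between likely and unlikely substitutions. That is one way to picture the loss of information at large evolutionary distances.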

Whereas the PAM matrices were developed from global alignments, the BLOSUM (BLOcks SUbstitution Matrix) matrices are based on local multiple alignments of more distantly related sequences. For instance, BLOSUM 62, the default matrix in BLAST, is calculated from alignments in which sequences sharing 62% identity or more are clustered and counted as one, so it reflects substitutions between sequences that are less than 62% identical. Unlike PAM matrices, new BLOSUM matrices are never extrapolated from existing BLOSUM matrices, but are always based on local multiple alignments. So, the BLOSUM 80 matrix would be derived from the same kind of alignments clustered at 80% sequence identity.
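Because BLOSUM matrices are built directly from local multiple alignments, the raw material is a tally of which residues share a column. The sketch below counts unordered residue pairs in a hypothetical three-sequence block (`toy_block`); the function name `count_pairs` is an assumption, and the sketch leaves out the identity-based grouping and weighting of sequences that a real BLOSUM calculation applies before counting.

```python
from collections import Counter
from itertools import combinations

def count_pairs(block):
    """Count unordered residue pairs in each column of an ungapped block."""
    counts = Counter()
    for column in zip(*block):  # walk the alignment column by column
        for a, b in combinations(column, 2):  # every pair of sequences
            counts[tuple(sorted((a, b)))] += 1
    return counts

# Hypothetical block of three aligned sequences, for illustration only.
toy_block = ["ACD", "ACE", "GCD"]
pair_counts = count_pairs(toy_block)
```

Counts such as these would then be turned into observed pair frequencies and compared with chance expectations to give the scores.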





Source:  OpenStax, Bios 533 bioinformatics. OpenStax CNX. Sep 24, 2008 Download for free at http://cnx.org/content/col10152/1.16