The level of relatedness of a set of sequences therefore directly affects which scoring matrix is most appropriate for aligning the set, whether it is a PAM or a BLOSUM matrix. Comparisons of closely related sequences should use BLOSUM matrices with higher numbers and PAM matrices with lower numbers. Conversely, BLOSUM matrices with low numbers and PAM matrices with high numbers are preferable for comparisons of distantly related proteins. Nevertheless, a single matrix may be reasonably efficient over a relatively broad range of evolutionary change. The BLOSUM 62 matrix was chosen as the default for BLAST as a result of an analysis by Henikoff and Henikoff in which BLOSUM 62 detected more distant relationships in a BLAST search, and produced alignments of diverged proteins in better agreement with three-dimensional structures, than did the corresponding PAM 60 matrix. The BLOSUM series does not include any matrices suitable for very short query sequences, so in these cases the PAM matrices may be used instead. Berkeley has a Matrix Information website with a provisional table of recommended substitution matrices and gap costs for shorter sequences.
Now, take a look at some scoring matrices. A PAM Matrix website sponsored by Wageningen University, in the Netherlands, allows online computation of PAM matrices. The default value is a PAM 250 matrix; calculate this matrix and look at the results. This PAM 250 matrix has a built-in gap penalty of -8, as seen in the * column. There are 24 rows and 24 columns. The first 20 are the amino acids, represented by their one-letter codes. B represents the case where there is ambiguity between aspartate and asparagine, and Z the case where there is ambiguity between glutamate and glutamine. X represents an unknown or nonstandard amino acid.
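The row and column layout described above can be read into a lookup table with a few lines of Python. This is only a minimal sketch: the function name `parse_matrix` and the six-symbol excerpt are hypothetical, the layout (a header row of column symbols, then one row per symbol) is assumed to match what the website prints, and the numbers are illustrative placeholders that should not be treated as the real PAM 250 values.

```python
# Toy excerpt in the row/column layout described in the text.
# The numbers are illustrative placeholders; do not treat them as real PAM 250 values.
TOY_MATRIX_TEXT = """
   A  D  N  B  X  *
A  2  0  0  0  0 -8
D  0  4  2  3 -1 -8
N  0  2  2  2  0 -8
B  0  3  2  3 -1 -8
X  0 -1  0 -1 -1 -8
* -8 -8 -8 -8 -8  1
"""

def parse_matrix(text):
    """Return {(row_symbol, column_symbol): score} from a whitespace-delimited table."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()          # header row: one symbol per column
    scores = {}
    for line in lines[1:]:
        fields = line.split()
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores

matrix = parse_matrix(TOY_MATRIX_TEXT)
symbols = sorted({row for row, _ in matrix})

print(len(symbols), "symbols:", " ".join(symbols))
print("D vs N:", matrix[("D", "N")])                # two specific amino acids
print("D vs B (Asp or Asn):", matrix[("D", "B")])   # ambiguity code
print("A vs *:", matrix[("A", "*")])                # the * column
```

Pasting the full 24 by 24 table from the website into `TOY_MATRIX_TEXT` should work the same way, provided the output really is whitespace-delimited as assumed here.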
In the PAM 250 matrix, where can the highest scores for each amino acid be found? Why?
Would this be true for any scoring matrix?
What row and column combination gives the highest score? (Specify the score value.)
What is the second highest score? (Specify the score value.)
Why are some scores for amino acid identities higher than others?
Use the back button on the browser, and calculate a PAM 100 matrix. Are the two highest scoring matches the same combination of row and column as in the PAM 250 matrix? (Discuss in a sentence or two.)
What is the gap penalty?
Explain any differences between the gap penalties of the PAM 250 matrix and the PAM 100 matrix.
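To see how substitution scores and the gap penalty combine, the score of a finished alignment can be sketched as a sum over its columns. Everything named here is hypothetical (`TOY_SCORES`, `GAP_PENALTY`, `alignment_score`). The residue scores are made-up placeholders, and the flat penalty of -8 per gapped column simply follows the value the text reads from the * column of the PAM 250 matrix. Alignment programs may use separate gap opening and extension costs instead.

```python
# Hypothetical toy scores for three residues; real work would load a full
# PAM or BLOSUM table instead of these placeholder numbers.
TOY_SCORES = {
    ("A", "A"): 3, ("A", "D"): -1, ("A", "W"): -4,
    ("D", "D"): 5, ("D", "W"): -5,
    ("W", "W"): 10,
}
GAP_PENALTY = -8  # assumption: flat per-column penalty, value taken from the text

def pair_score(a, b):
    """Look up a substitution score; the table is symmetric, so try both orders."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))

def alignment_score(seq1, seq2, gap="-"):
    """Sum the column scores of two aligned strings of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            total += GAP_PENALTY       # any column containing a gap
        else:
            total += pair_score(a, b)  # residue aligned with residue
    return total

print(alignment_score("AWDA", "AWDA"))  # identical sequences
print(alignment_score("AWDA", "A-DW"))  # one gap and one mismatch
```

Replacing `TOY_SCORES` with a different matrix changes the totals for the same pair of aligned strings, which is why the choice of matrix can change which alignment scores best.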
To get an idea of how the scoring matrix influences an alignment, perform the following exercise using the Biology Workbench. The Workbench requires a password (it's free), and it grants access immediately upon registration. Enter the site, and scroll down the page until the five menu buttons are visible. The "Session Tools" button allows the naming of a session, so that different jobs in progress can be saved under distinct sessions. Select "Session Tools", then select "Start New Session" and click on "Run" to change the name of "Default Session" to a new name. The session will remain after the Workbench has been exited. Subsequently, clicking on the dot to the left of the session name under the "Session Tools" menu, and then selecting "Resume Session", will recall the session. The Workbench policy at the time of this writing is that old jobs are deleted only when an account has not been accessed for 6 months.
This tutorial will use sequences of hemoglobins (Hbs) from different organisms to illustrate the properties of scoring matrices. Choose the "Protein Tools" menu button, then choose "Ndjinn Multiple Database Search" from the menu at the bottom of the page. Biology Workbench has a large number of databases to search. For this exercise, click in the box to the left of the database description to choose the "PDBFINDER" database. Search the PDBFINDER database by typing the PDB ID codes below into the search box at the top of the page. Import the sequences with the following PDB ID codes (use the OR operator between each PDB ID code to search for all of the records in the same search):