When the ClustalW results appear, first scroll down to the cladogram and observe which of these sequences are most closely related and which are more distant. Notice that there are three separate clusters of branches descending from the root. The two largest clusters are separated as a direct result of a structural characteristic of hemoglobins.
What does each of these two clusters represent? (If the answer is not immediately clear, read this description of Hemoglobin from the University of Brescia's on-line Biochemistry Course.)
According to this cladogram, what is the sequence that is most closely related to human hemoglobin, ID code 1QSI?
According to this cladogram, what is the sequence that is most closely related to the
According to this cladogram, what sequence is most closely related to the spot croaker Hb, ID code 1SPG?
It is not as clear from the cladogram which sequences are the most distantly related. However, scroll down past the cladogram to view the ClustalW pairwise alignment scores.
Which two sequences yield the lowest pairwise alignment score?
At the very bottom of the alignment page, select "Import Alignments" to save this information for later reference, should that be necessary. The imported alignments can only be viewed through the "Alignment Tools" menu.
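If the pairwise scores are copied out of the results page as plain text, the lowest-scoring pair can also be picked out with a short script. The following is a minimal Python sketch, not part of the original exercise. It assumes each score appears on a line of the form `Sequences (1:2) Aligned. Score: 84`; the function name `lowest_scoring_pair` is hypothetical, and the sample numbers are made-up placeholders, not results for the hemoglobin sequences used here.

```python
import re

# Assumed line format: "Sequences (i:j) Aligned. Score: N"
SCORE_LINE = re.compile(
    r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(-?\d+(?:\.\d+)?)"
)

def lowest_scoring_pair(text):
    """Return (i, j, score) for the lowest-scoring pair, or None if no scores are found."""
    pairs = [(int(i), int(j), float(s)) for i, j, s in SCORE_LINE.findall(text)]
    return min(pairs, key=lambda p: p[2]) if pairs else None

# Placeholder values for illustration only (not real results)
sample = """\
Sequences (1:2) Aligned. Score: 84
Sequences (1:3) Aligned. Score: 42
Sequences (2:3) Aligned. Score: 45
"""

print(lowest_scoring_pair(sample))
```

The indices returned are the sequence numbers as listed in the pasted text, so they still have to be matched back to the ID codes by hand.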
Apply the information elucidated by the multiple sequence alignment to test the impact of varying the scoring matrices in pairwise alignments. Return to "Protein Tools". Start with two sequences that are known to be closely related, the human Hb chain B, 1QSI_B, and the horse Hb chain B, 1IWH_B, by checking the box to the left of each of their codes. Choose "LALIGN" from the pull-down menu at the bottom of the page to compare the two protein sequences to each other. When the LALIGN page appears, choose PAM250 as the scoring matrix and run the alignment.
What are (a) the score of the alignment, (b) the length of the alignment, and (c) the percent identity?
Now, return to "Protein Tools" and run LALIGN again with the same two sequences, 1QSI_B and 1IWH_B, except choose the "PAM120" matrix this time.
What are (a) the score of the alignment, (b) the length of the alignment, and (c) the percent identity?
(a) Which scoring matrix yielded the highest score for the alignment, and why is this matrix the best choice for this alignment? (b) List any regions where the two alignments differ.
Return to "Protein Tools", this time selecting the 1HV4_A and the 1TIN_B sequences by checking the box next to their codes. Again, choose "LALIGN", and perform an alignment with the default PAM250 matrix.
What are (a) the score of the alignment, (b) the length of the alignment, and (c) the percent identity?
Run LALIGN again on the same two sequences, using the PAM120 matrix. What are (a) the score of the alignment, (b) the length of the alignment, and (c) the percent identity?
(a) Which scoring matrix yielded the highest score for the alignment, and why is this matrix the best choice for this alignment? (b) List any regions where the two alignments differ.
Do the two different matrices always calculate the same value for percent identity when the same two sequences are compared using each matrix? Why or why not?
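To make the terms in these questions concrete, the following minimal Python sketch computes alignment length and percent identity directly from a pair of aligned strings. It is an illustration under stated assumptions, not LALIGN's own code: the function name `alignment_stats` is hypothetical, percent identity is taken as identical columns divided by the total number of alignment columns (gaps included), and the toy sequences are invented for the example.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Return (alignment length, identical columns, percent identity).

    Assumption: percent identity = identical columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    length = len(aligned_a)
    # A column counts as identical only if both residues match and are not gaps
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    percent = 100.0 * identical / length if length else 0.0
    return length, identical, percent

# Two hypothetical alignments of the same pair of toy sequences:
# one spanning ten columns with a gap, one covering a shorter gap-free region.
print(alignment_stats("HEAGAWGHEE", "HEA-AWGHEE"))
print(alignment_stats("AWGHEE", "AWGHEE"))
```

Here the scoring matrix does not appear in the calculation at all; percent identity depends only on which columns end up in the reported alignment.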
Most bioinformatics tools available on the web have selected default scoring matrices that are based on a relatively exhaustive analysis of which scoring schemes work best over a wide range of query sequence characteristics. However, it is important not only to know which scoring matrix is used for a given alignment, but also to consider the appropriateness of the default matrix for a given query. It is a recurring theme of bioinformatics that these computational tools should not be treated as "black boxes" whose internal workings can be ignored; instead, they require thoughtful interaction on the part of the user.
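As a small look inside the box, the sketch below computes a local alignment score with a basic Smith-Waterman recurrence and a pluggable scoring function. It is a simplified illustration and not the algorithm as implemented in LALIGN: it uses a linear gap penalty, reports only the single best score, and the two scoring functions `strict` and `lenient` are hypothetical stand-ins with made-up values, not the PAM120 or PAM250 matrices. The toy sequences are likewise invented.

```python
def local_alignment_score(a, b, score, gap=-4):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(x, y),  # align residue x with residue y
                prev[j] + gap,              # x aligned to a gap
                curr[j - 1] + gap,          # y aligned to a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Hypothetical scoring schemes (NOT real PAM values)
def strict(x, y):
    return 5 if x == y else -4   # rewards identity, punishes mismatches heavily

def lenient(x, y):
    return 2 if x == y else -1   # tolerates mismatches more readily

seq1, seq2 = "HEAGAWGHEE", "PAWHEAE"  # toy sequences for illustration
for name, scheme in (("strict", strict), ("lenient", lenient)):
    print(name, local_alignment_score(seq1, seq2, scheme))
```

Because the two schemes use different scales, their raw scores are not directly comparable, which is one reason the matrix in use always needs to be reported alongside an alignment score.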