A motif S is a set of m points in three dimensions, whose coordinates are taken from backbone and side-chain atoms. Each motif point has an associated rank, a measure of its functional significance. Each point also has a set of alternate amino acid labels drawn from {GLY, ALA, ...}, which represent the residues this amino acid has mutated to during evolution. Labels allow a motif to represent many homologous active sites with slight mutations simultaneously, rather than a single active site. In this paper, we obtain labels and ranks using Evolutionary Trace (ET).
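As a minimal sketch of this definition, a motif point can be modeled as a small record holding coordinates, a rank, and a label set. The class name `MotifPoint`, the method `accepts`, and all numeric values below are hypothetical placeholders, not data from the paper. The sketch also assumes that a larger rank value means greater functional significance, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifPoint:
    """One motif point (hypothetical container, not the authors' data structure)."""
    xyz: tuple          # (x, y, z) of a backbone or side-chain atom
    rank: float         # functional significance; larger = more significant (assumption)
    labels: frozenset   # residue types this position is allowed to match

    def accepts(self, residue: str) -> bool:
        # A target residue is label-compatible if its type is in the label set.
        return residue in self.labels


# Placeholder motif with m = 3 points; coordinates and ranks are made up.
motif = [
    MotifPoint((0.0, 0.0, 0.0), 0.5, frozenset({"SER"})),
    MotifPoint((3.0, 0.0, 0.0), 0.9, frozenset({"HIS"})),
    MotifPoint((0.0, 4.0, 0.0), 0.7, frozenset({"ASP", "GLU"})),
]

# Order the points from most to least significant.
by_rank = sorted(motif, key=lambda p: p.rank, reverse=True)
```

Storing the labels as a set makes the compatibility test a single membership check, so one motif can stand in for several homologous active sites.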
Other motifs have been designed with other approaches. Motifs have been composed of points on the Connolly surface representing electrostatic potentials, of hinge-bending sets of points in space, of sets of "pseudo-centers" representing protein-ligand interactions, or of points taken from atom coordinates with evolutionary data, to name a few. Depending on how motif points are defined, they carry different labels, and these labels must be taken into account when comparing motifs.
Local structure comparison algorithms mainly target the biological problem of Protein Function Prediction. Ideally, a function prediction pipeline should address the following subproblems:
Suppose for a moment that we have hand-designed motifs. In the next section, we concentrate on an efficient algorithm for local structure comparison.
Many such algorithms exist, but they differ fundamentally in that each is optimized for comparing a different type of motif. There are algorithms for comparing graph-based motifs, algorithms for finding catalytic sites, and the seminal Geometric Hashing framework, which can search for many types of motifs, including motifs based on atom positions, points on Connolly face centers, catalytic triads, and flexible protein models. Match Augmentation (MA) is described below.
Rank information prioritizes motif data, and MA was designed in a prioritized fashion: correspondences with higher ranked points are identified first. MA is composed of two parts, Seed Matching and Augmentation. The purpose of Seed Matching is to identify matches for the seed S', the three highest ranked motif points. The k seed matches with the lowest lRMSD are passed to Augmentation to be iteratively expanded into matches for the remaining motif points, in descending rank order. Augmentation outputs the match with the smallest lRMSD.
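The prioritization described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the authors' implementation: the function names `split_motif` and `best_seed_matches` are hypothetical, the lRMSD values are assumed to have been computed elsewhere, and a larger rank value is assumed to mean a more significant point.

```python
import heapq


def split_motif(ranks, seed_size=3):
    """Return (seed, rest): motif point indices in descending rank order.

    The seed holds the `seed_size` highest ranked points; `rest` is the order
    in which Augmentation would visit the remaining points.
    """
    order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
    return order[:seed_size], order[seed_size:]


def best_seed_matches(scored_matches, k):
    """Keep the k seed matches with the lowest lRMSD.

    `scored_matches` is a list of (lrmsd, match) pairs; how the lRMSD is
    computed is outside this sketch.
    """
    return heapq.nsmallest(k, scored_matches, key=lambda pair: pair[0])


seed, rest = split_motif([0.2, 0.9, 0.5, 0.7, 0.1])
kept = best_seed_matches([(1.2, "a"), (0.4, "b"), (0.9, "c")], k=2)
```

Here `seed` holds the indices of the three highest ranked points and `rest` lists the others in the order they would be added, while `kept` retains only the two lowest-scoring seed matches.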
Seed Matching identifies all sets of three target points T' that fulfill our matching criteria with the three highest ranked motif points, S'. In this stage, we represent the target as a geometric graph with colored edges. There are exactly three unordered pairs of points in S', and we name them red, blue and green. If any pair of target points fulfills our first two criteria with the red, blue or green pair, we draw a corresponding red, blue or green edge between those two target points. Once we have processed all pairs of target points, we find all three-colored triangles in T. These are the Seed Matches, a set of three-point correlations to S' that we sort by lRMSD and pass to Augmentation.
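The colored-edge construction can be illustrated with a short sketch. Because the matching criteria are not spelled out in this section, the code assumes two stand-in criteria: label compatibility, and a target pair distance within a tolerance `eps` of the corresponding seed pair distance. The function name `seed_matches`, its parameters, and the default tolerance are hypothetical. Edges are stored with an orientation so that each triangle yields a consistent assignment of target points to seed points, and the final sort by lRMSD is omitted.

```python
from itertools import permutations
from math import dist


def seed_matches(seed_xyz, seed_labels, target_xyz, target_labels, eps=1.0):
    """Return index triples (a, b, c) of target points matched to seed points 0, 1, 2."""
    # The three unordered pairs of seed points, named by color.
    colors = {"red": (0, 1), "blue": (1, 2), "green": (0, 2)}
    edges = {name: set() for name in colors}

    # Color every ordered target pair (a, b) that is compatible with a seed
    # pair (i, j), with a standing in for i and b for j.
    for a, b in permutations(range(len(target_xyz)), 2):
        d_target = dist(target_xyz[a], target_xyz[b])
        for name, (i, j) in colors.items():
            labels_ok = (target_labels[a] in seed_labels[i]
                         and target_labels[b] in seed_labels[j])
            length_ok = abs(d_target - dist(seed_xyz[i], seed_xyz[j])) <= eps
            if labels_ok and length_ok:
                edges[name].add((a, b))

    # A three-colored triangle has one red, one blue and one green edge
    # that agree on which target point plays which seed point.
    triangles = []
    for a, b in sorted(edges["red"]):
        for c in range(len(target_xyz)):
            if (b, c) in edges["blue"] and (a, c) in edges["green"]:
                triangles.append((a, b, c))
    return triangles


# Placeholder data: the target contains a translated copy of the seed plus a decoy.
seed_xyz = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
seed_labels = [{"SER"}, {"HIS"}, {"ASP", "GLU"}]
target_xyz = [(10, 10, 10), (13, 10, 10), (10, 14, 10), (20, 20, 20)]
target_labels = ["SER", "HIS", "GLU", "ALA"]
matches = seed_matches(seed_xyz, seed_labels, target_xyz, target_labels, eps=0.5)
```

Checking pairs before triples means that most incompatible target points are discarded after a single distance and label test, and only edges that survive are combined into triangles.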