
As the power of computers has developed and matured, the manner in which we use them has correspondingly evolved. Initially, computers insinuated themselves into our lives because of their ability to quickly perform large numbers of simple calculations, and because they could be used to efficiently store large amounts of information. Used as such, they were essentially glorified calculator-filing-cabinets.

[Figure: Watson and Crick]

Today, however, we can go well beyond this simple understanding of our relationship with computers as experimental tools. This changing dynamic is especially evident in, and necessary to, the emergent field of bioinformatics, where meeting the field's challenges requires both the computer's ability to analyze large and complex data sets and the human ability to generate the data in the first place and to interpret the computer's analysis of it. Computers should be viewed as tools that extend our vision into the abstract realms of data analysis, and this improved sight should in turn improve our efficiency in the laboratory.

This type of symbiosis is commonplace today. An example scenario might be as follows: a researcher isolates a novel gene of interest and sends it off to be sequenced. When the sequence arrives a few days later, the researcher loads it into the BLAST search engine to look for known homologues. If a homologue exists, either in the same species or in another, related, species, this information can be used to predict the gene's possible functions. Alternatively, the researcher might want to determine where the gene resides in the genomic DNA. Before whole-genome sequences were available, this was a laborious and difficult process involving time-intensive restriction mapping techniques. Today the process has been greatly simplified: to find the gene's location in the genomic DNA, the researcher would almost certainly begin with a BLAST search of the organism's genome (if available, or a closely related organism's if not). The search would return a list of candidate sequences, and their locations in the genome, that could then be checked experimentally for identity with the gene of interest. Furthermore, a successful BLAST search might reveal not only the exact location of the gene of interest but also any closely related genes (the latter being a great advantage of genomic searching over earlier experimental gene isolation techniques).

When compared to prior techniques, a successful BLAST search is highly efficient and returns a far greater wealth of data. Unfortunately, the BLAST search is not the end of the process: its results should be viewed as candidates that must be experimentally verified in the lab before any final conclusions can be drawn about their true nature.
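The candidate-shortlisting step described above can be sketched in a few lines. This is a minimal illustration, not real BLAST output parsing: the `BlastHit` record, its field names, and the E-value and identity thresholds are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class BlastHit:
    # Hypothetical, simplified stand-in for one BLAST hit: the subject
    # sequence it matched, its genomic coordinates, the alignment's
    # E-value, and its percent identity.
    subject_id: str
    start: int
    end: int
    e_value: float
    pct_identity: float

def shortlist_candidates(hits, max_e=1e-10, min_identity=90.0):
    """Keep only hits strong enough to justify bench verification,
    ranked best (lowest E-value) first. Thresholds are illustrative."""
    strong = [h for h in hits
              if h.e_value <= max_e and h.pct_identity >= min_identity]
    return sorted(strong, key=lambda h: h.e_value)

# Invented example hits for a single query gene.
hits = [
    BlastHit("chr2", 1_200_000, 1_204_500, 3e-80, 99.1),  # likely the gene itself
    BlastHit("chr7",   880_000,   884_100, 2e-35, 93.4),  # possible close paralogue
    BlastHit("chr5",    15_000,    15_400, 0.12,  71.0),  # weak, spurious match
]

for hit in shortlist_candidates(hits):
    print(hit.subject_id, hit.e_value)
```

Note that even the surviving hits are still only candidates in the sense of the paragraph above: the filter reduces the experimental workload, it does not replace the experiment.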

[Figure: Maurice Wilkins]

Another specific example of this type of human/computer interface can be found in the analysis of the experimental finding that 3.3% of the human genome aligns to multiple regions of the mouse genome in whole-genome BLASTZ alignments (Birney et al. 2003). The implication is that outside, higher-order human knowledge must be brought to bear on the problem of identifying the most significant alignments when multiple alignments are found. Another example that demonstrates the necessity of meaningful interaction between computer analysis and human understanding is the observation, drawn from a comparative alignment of the complete human and mouse genomic sequences, that only one third of the genome under purifying selection actually codes for protein (Flicek et al. 2003). The most basic implication is that any attempt at gene prediction via whole-genome alignment will generate large numbers of false positives because of conserved non-coding and non-regulatory regions.
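Flagging the ambiguous cases in the first finding, human regions that align to more than one mouse locus, is itself a simple computation; it is choosing among the flagged alignments that requires human judgement. A minimal sketch, with invented region labels standing in for real alignment records:

```python
from collections import defaultdict

def multi_mapping_regions(alignments):
    """Group alignments by query (human) region and return the regions
    that hit more than one distinct target (mouse) locus -- the cases
    where higher-order knowledge must pick the significant alignment."""
    targets = defaultdict(set)
    for human_region, mouse_locus in alignments:
        targets[human_region].add(mouse_locus)
    return {region: loci for region, loci in targets.items() if len(loci) > 1}

# Hypothetical alignment records: (human region, mouse locus).
alignments = [
    ("hs_chr1:100-500", "mm_chr4:200-600"),
    ("hs_chr1:100-500", "mm_chr11:900-1300"),  # same human region, second mouse locus
    ("hs_chr2:700-900", "mm_chr6:50-250"),
]

ambiguous = multi_mapping_regions(alignments)
print(ambiguous)  # only the hs_chr1 region maps to multiple mouse loci
```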

In these examples we can see how experimental evidence leads to computer analysis, which is then used to direct subsequent experiments. The cyclical nature of our interaction with the two search spaces, the physical and the informational, is becoming increasingly apparent as the two disciplines mature. Human exploration of the wet and chaotic physical world should direct, and be directed by, the computer-facilitated human exploration of the ethereal information space, which was itself generated by prior experimental insight and abstract thought. In reality, both investigative systems are indirect means of increasing our understanding of the same physical phenomena, as validated by the reproducible utility of the gained information when applied to either or both systems.

The inventors of the transistor

Dr. John Bardeen, Dr. Walter Brattain, and Dr. William Shockley discovered the transistor effect and developed the first device in December 1947, while the three were members of the technical staff at Bell Laboratories in Murray Hill, NJ. They were awarded the Nobel Prize in Physics in 1956.

So what, then, are the goals of genefinding as a subset of bioinformatics? Simply put, the goal of genefinding is to locate protein-coding regions in unprocessed genomic DNA sequence data. In reality, however, pinpointing the mere location of a gene is part of a much larger challenge. The eukaryotic gene is a complicated and highly studied beast, composed of a variable multitude of small coding regions and regulatory elements hidden amidst tens of thousands of base pairs of intronic and non-signal DNA. In order to accurately predict gene locations we must first understand how the different functional components interact to create the dynamic and complex phenomenon we have come to understand as 'a gene'.

Thus genefinding is something of a misnomer: in order to find genes we must first understand the content and structure of the signal the genes present to the cell's genetic machinery, and in doing this we must answer much broader questions than the seemingly facile question, "Where are the genes?" The goal of genefinding, then, is not simple gene prediction but accurate modeling of the signal genes present to the cell. Furthermore, because such information does not exist in a vacuum separate from its interpretation, implicit in the ability to model the genetic signal is a furthering of our capacity to understand how that signal is deciphered, and with it the inner workings of the cell itself.
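The gap between "locate protein-coding regions" and genuine signal modeling is easy to see in a naive baseline. The sketch below scans one strand for open reading frames (an ATG start codon followed by an in-frame stop); this simplistic approach can work for compact prokaryote-like sequences, but it will miss exactly the intron-riddled eukaryotic genes described above. The function and its `min_codons` cutoff are illustrative inventions, not any standard tool's algorithm.

```python
# Codons that terminate translation (standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for open reading frames of at least
    `min_codons` codons (excluding the stop) in all three forward
    reading frames of one strand. `end` includes the stop codon."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after this ORF
            else:
                i += 3
    return orfs

print(find_orfs("ATGAAATTTTGA"))  # one short ORF spanning the whole toy sequence
```

In real eukaryotic DNA the coding codons of a single gene are scattered across exons, so no contiguous scan like this can recover them; that is precisely why the genefinding problem expands into modeling splice sites, regulatory elements, and the rest of the genetic signal.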

Source: OpenStax, Genefinding. OpenStax CNX. Jun 17, 2003. Download for free at http://cnx.org/content/col10205/1.1