The Drosophila (fruit fly) genome is 180 Mb in size and contains approximately 13,600 genes (the programs Genie and Genscan were used to predict genes). The somewhat smaller C. elegans genome actually contains more genes than the Drosophila genome, although the functional diversity of the two species appears to be very similar. The Drosophila genome was published in March 2000 (Science, 24 March 2000), a few years after the C. elegans genome was initially released. The genome consists of three autosomal chromosomes (numbered 2-4) and one X chromosome. Each Drosophila gene contains on average 4 exons of approximately 750 bp apiece. Intron size is highly variable and can range from 40 bp to more than 70 kb. Introns and exons are each predicted to occupy around 20 Mb of sequence.
Sequencing of the human genome was first formally proposed in 1985, but at the time the idea met with mixed reactions in the scientific community. Then, in 1990, the Human Genome Project (HGP), under the direction of the National Institutes of Health (NIH) and the Department of Energy, launched a 15-year, $3 billion plan for sequencing the complete human genome. Progress was slow, however, and the HGP did not appear to be on pace to finish by the projected date of 2005. Halfway through its planned time period, in early 1998, the HGP had sequenced less than 5% of the entire genome.
Then, in the same year that the HGP was reevaluating its progress, Celera, headed by Craig Venter, announced its intention to sequence the entire human genome over a three-year period. After cutting its teeth on the Drosophila genome (which was done in collaboration with Gerald Rubin and the Berkeley Drosophila Genome Project), Celera began whole-genome shotgun sequencing of the human genome on 8 September 1999. Less than a year later, on 17 June 2000, the first draft of the genome was completed. Today 99.9% of the human genome is considered 'finished', meaning that the sequence contains fewer than one error per 10,000 base pairs.
The method Celera used, termed shotgun sequencing, is conceptually straightforward but requires large amounts of computer processing power. The protocol (greatly oversimplified) is as follows: 1) cut the genomic DNA into small pieces of known and regular size, 2) clone the pieces of genomic DNA into plasmids for purification and amplification, 3) randomly sequence the DNA fragments from the plasmids while screening the results for contamination, and 4) load the whole collection of sequenced fragments into the computer and let the computer sort it all out. The computer essentially plays a giant matching game, merging fragments whose ends overlap into larger and larger sequences until the whole genome is laid out in its entirety. The process is, of course, not nearly this simple. One major complication worth mentioning is that the human genome is particularly replete with repeat sequences, which can easily create numerous misleading matches: two fragments may appear to overlap only because both end in copies of the same repeat, even though they come from different parts of the genome. Computing the set of all overlaps required approximately 10,000 CPU hours on a suite of four-processor Alpha SMPs with 4 gigabytes of RAM (4-5 days of elapsed time using 40 such machines).
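The "matching game" in step 4 can be illustrated with a toy greedy overlap merger. This is only a minimal sketch under simplifying assumptions (error-free reads, all from the same strand, no repeats); it is not Celera's assembler, and the function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example reads are all hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy illustration only: assumes error-free, same-strand reads and
    ignores the repeat problem described in the text.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        # Discard sequences wholly contained in another sequence.
        contigs = [r for r in contigs
                   if not any(r != o and r in o for o in contigs)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping fragments of a made-up 16-base "genome".
    reads = ["CCTTAGGA", "ATGGCGTA", "CGTACCTT"]
    print(greedy_assemble(reads))
```

The all-against-all comparison in the inner loops is the part that corresponds to "computing the set of all overlaps"; its cost grows with the square of the number of fragments, which gives a sense of why this step dominated the computing time. If two unrelated fragments ended in the same repeat, this sketch would happily merge them, which is exactly the kind of misleading match the text warns about.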