This module is an introduction to computational gene finding. It describes the biological problem of annotating genomes into functional components. It begins with a brief description of the human genome and the central dogma of modern molecular biology, and introduces the complexities of detecting genes in eukaryotes. Ab initio methods and comparative techniques based on Hidden Markov Models are presented. GENSCAN, a classical ab initio gene finder, is covered in detail. The challenges faced by comparative gene finders are described. The module ends with a discussion of how gene finders are evaluated and what the current state of the art is.

The human genome: chromosomes and genes

In 1953 Watson and Crick unlocked the structure of the DNA molecule and set into motion the modern study of genetics. This advance allowed our study of life to transcend the wet realm of proteins, cells, organelles, ions, and lipids, and move up into more abstract methods of analysis. By discovering the basic structure of DNA we had received our first glance into the information-based realm locked inside the genetic code.

The human genome contains 3 billion chemical nucleotide bases (A, C, T and G). About 30,000 genes are estimated to be in the human genome. The genome also has a physical three-dimensional structure: stretched out, the DNA in a single cell is about 6 feet (2 meters) long, yet it is packed into a cell nucleus only about 0.0004 inches across (far smaller than the head of a pin). The genome is divided among 24 distinct chromosomes (22 autosomes plus the two sex chromosomes, X and Y), and each gene lies on a specific chromosome. Human autosomes are numbered roughly in order of decreasing size, with Chromosome 1 being the largest; the Y chromosome is among the smallest. Matt Ridley's fascinating book Genome gives a great introduction to our chromosomes and the genes they contain. Chromosome 1 is believed to have 2968 genes, while the Y chromosome has 231 genes. To learn more about chromosomes, visit GeneMap99, a site maintained by the NCBI. Here is a diagrammatic representation of the 24 chromosomes.

Human chromosomes

Representation of the 24 chromosomes of the human male.
Here is what a chromosome looks like under an electron microscope.

Human chromosomes

Human Chromosome 12 under an electron microscope.

The average human gene contains about 3000 bases, but sizes of human genes vary greatly. The largest known human gene is dystrophin (a muscle protein) at 2.4 million bases. The smallest genes are a little over a hundred base pairs long. Less than 2% of the genome codes for proteins. Repeated sequences that are not involved in coding for proteins (sometimes called "junk DNA") make up at least 50% of the human genome. These repetitive sequences play an important role in chromosome structure and dynamics. Over time, these repeats are believed to reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling genes. Surprisingly, genes are not distributed uniformly through the human genome. Genes appear to be concentrated in sections of the genome with high GC content, with vast areas of non-coding DNA in between. There are long stretches of C and G repeats adjacent to gene-rich areas. These CpG islands are believed to regulate gene activity, and they serve as markers for gene-rich locations on the genome. We do not yet know the function of over 50% of the discovered genes. A great site to learn more about DNA is the DNAi site maintained by the HHMI.
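To make the idea of GC content and CpG islands concrete, here is a minimal Python sketch that scans a DNA string with a sliding window and flags windows that look CpG-rich. The function names, the window size of 200 bases, and the thresholds (GC fraction above 0.5 and an observed/expected CpG ratio above 0.6) are assumptions chosen for illustration; they follow a commonly used heuristic and are not taken from this module.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected ratio of the dinucleotide CG in seq.

    The expected count assumes C and G occur independently:
    expected = (#C * #G) / length.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def candidate_cpg_windows(seq, size=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that pass both thresholds.

    The default thresholds are illustrative assumptions, not values
    from the text.
    """
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        if gc_content(window) > min_gc and cpg_obs_exp(window) > min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch, a CG-rich stretch, another AT-rich stretch.
    toy = "AT" * 100 + "CG" * 100 + "AT" * 100
    print(candidate_cpg_windows(toy))
```

On a real genome, overlapping windows that pass would normally be merged into a single island, and the result would be only a rough marker for a gene-rich region rather than a gene prediction.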

Source:  OpenStax, Statistical machine learning for computational biology. OpenStax CNX. Oct 14, 2007 Download for free at http://cnx.org/content/col10455/1.2