
Computational genefinding

What is computational genefinding? Simply put, it is the development of computational procedures to locate protein-coding regions in unprocessed genomic DNA sequence data. In reality, however, pinpointing the mere location of a gene is part of a much larger challenge. The eukaryotic gene is a complicated and highly studied beast composed of a multitude of small coding regions and regulatory elements hidden amidst tens of thousands of base pairs of intronic and non-signal DNA. In order to accurately predict gene locations we must first understand how the different functional components interact to create the dynamic and complex phenomenon we have come to understand as 'a gene'.

Thus genefinding is a bit of a misnomer: in order to find genes we must first understand the content and structure of the signal that genes present to the cell's genetic machinery, and in doing this we must answer much broader questions than the seemingly facile question, "Where are the genes?" The goal of genefinding, then, is not simple gene prediction but accurate modeling of the signal genes present to the cell. Furthermore, because such information does not exist in a vacuum separate from its interpretation, the ability to model the genetic signal also furthers our understanding of how that signal is deciphered and of the inner workings of the cell itself.

Genefinding in prokaryotic genomes

There are two basic approaches to gene finding: ab-initio and comparative. Ab-initio methods use statistical properties of the given genome, while comparative methods use annotations from previously analyzed genomes as an additional input. We will begin our discussion of gene finding with ab-initio methods as applied to simpler prokaryotic genomes. Examples of such genomes include H. influenzae (the bacterium Haemophilus influenzae, not the influenza virus). Over 70% of the H. influenzae genome codes for proteins.

Genes in prokaryota are contiguous stretches of base pairs with no intronic breaks. Untranslated regions (UTRs) flank both ends of a gene: the 5' (5-prime) end and the 3' (3-prime) end. Genes are directional: they are read from the 5' end to the 3' end. There are genes on both strands of the DNA double helix. Each protein-coding sequence begins with the three-letter codon ATG, which specifies the amino acid methionine and is called the start codon. The end of a gene is signalled by one of three stop codons (TAA, TAG, TGA). The start codon signals the ribosomal machinery to start translating the bases, in groups of three, into amino acids until a stop codon is reached. Gene finding in prokaryota therefore reduces to the problem of finding stretches of the genome that begin with a start codon and end with a stop codon, with no intervening stop codons. Such a stretch is called an open reading frame or ORF.
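Because genes can lie on either strand and are read in groups of three bases, any stretch of DNA can be read in six ways: three offsets on the given strand and three on its reverse complement. The following is a minimal Python sketch of that idea. The function names (`reverse_complement`, `codons`, `reading_frames`) are illustrative choices, not part of any particular tool, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
# Illustrative sketch only: helper names are hypothetical, and the input
# is assumed to be an uppercase string over the alphabet {A, C, G, T}.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written in its own 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset=0):
    """Split seq into consecutive triplets starting at offset.

    A trailing fragment shorter than three bases is dropped.
    """
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def reading_frames(seq):
    """Map (strand, offset) to the list of codons read in that frame."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = codons(s, offset)
    return frames


if __name__ == "__main__":
    for frame, triplets in reading_frames("ATGAAATAA").items():
        print(frame, triplets)
```

Here the key `("+", 0)` is the given strand read from its first base, and `("-", 0)` is the reverse complement read from its first base.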

Given a sequence from {A,C,G,T}*, an open reading frame (ORF) is any subsequence that starts with the codon ATG and ends with a stop codon (TAA, TAG, TGA) with no stop codons in between. ORF finding algorithms are based on the following simple idea. Since coding regions are terminated by stop codons, one needs to look for long stretches of bases without a stop codon. Once a stop codon is found, we work backward to find the start codon corresponding to the gene. Why do we look for long stretches without stop codons? If nucleotide bases were drawn uniformly at random, 3 of the 64 possible codons would be stop codons, so a stop codon would be expected once every 64/3 (about 21) codons, or about 63 base pairs. By selecting an appropriate length threshold t (typically greater than 210 bases, or 70 amino acids), we reduce the likelihood of mistaking a stop-free stretch that arose by chance for an actual coding region.

Modifications to this basic algorithm have been developed to handle very short genes and overlapping genes. The most successful method for finding coding regions in prokaryotic genomes is based on interpolated Markov models, embodied in the program GLIMMER. The program is available online. A 2007 Bioinformatics paper details how to use this tool.
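The basic ORF scan described above (not GLIMMER's interpolated Markov model) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and the `min_len` parameter are hypothetical, only ATG is treated as a start codon, and instead of literally working backward from each stop codon the scan remembers the first ATG seen since the previous in-frame stop, which yields the same longest ORF for each stop. The helper `prob_no_stop` encodes the uniform-random model from the text, under which the chance that n consecutive codons contain no stop is (61/64)^n, about 3.5% for n = 70.

```python
# Simplified sketch of threshold-based ORF finding. All names here are
# hypothetical; this is not how GLIMMER works internally.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len=210):
    """Yield (start, end) half-open indices of ORFs in all three frames.

    An ORF runs from the first ATG after the previous in-frame stop codon
    through the next in-frame stop codon (inclusive), and is reported only
    if it spans at least min_len bases.
    """
    for offset in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS:
                if start is not None and i + 3 - start >= min_len:
                    yield (start, i + 3)
                start = None


def find_orfs(seq, min_len=210):
    """Return (strand, start, end, orf) tuples for both strands.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s, min_len):
            results.append((strand, start, end, s[start:end]))
    return results


def prob_no_stop(n_codons):
    """Chance that n random codons (uniform bases) contain no stop codon."""
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    # A tiny threshold is used here only so the toy sequence yields an ORF.
    print(find_orfs("CCATGAAATTTTAGCC", min_len=9))
    print(prob_no_stop(70))
```

In practice the threshold trades sensitivity for specificity: a larger `min_len` rejects more chance ORFs but also misses genuinely short genes, which is one reason statistical models such as those in GLIMMER are preferred.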

Source:  OpenStax, Statistical machine learning for computational biology. OpenStax CNX. Oct 14, 2007 Download for free at http://cnx.org/content/col10455/1.2