It is important to understand the processes behind the genetic variables of interest. The most important concept is known as the central dogma of molecular biology, which describes the process by which genetic information is interpreted and eventually used to create the proteins that are required for all life. We begin with double-stranded DNA. DNA is a long chain of base pairs built from four bases: adenine pairs with thymine, and guanine pairs with cytosine. Thus at any point on a DNA strand there is a choice of four different bases. Some sections of the DNA are genes. Each gene in the DNA is unzipped into single strands, one of which is transcribed into what is known as messenger RNA (mRNA). The mRNA can then be read and translated into protein. That is, we progress from DNA to mRNA to protein.
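The DNA to mRNA to protein flow can be illustrated with a minimal Python sketch. It assumes the input is the coding (sense) strand, so transcription reduces to replacing T with U, and it uses the standard genetic code. The function names `transcribe` and `translate` and the example sequence are hypothetical and chosen only for illustration; real transcription and translation involve far more machinery than this.

```python
# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (assumption: sense strand given)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA into one-letter amino acids, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: start codon, one alanine codon, stop codon.
dna = "ATGGCCTGA"
mrna = transcribe(dna)
protein = translate(mrna)
```

Here `mrna` holds the transcribed sequence and `protein` the resulting chain of amino acids, mirroring the two steps described above.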
Two genetic variables of interest that are involved in the central dogma are estimated copy number and gene expression. The estimated copy number of a gene is the estimated number of copies of that gene that exist within a sample's genome. Normally, we would expect to see a copy number of 2, but we often see copy numbers as low as 0 and as high as 5 or 6. Most of the time this does not harm the patient, and we expect to see a certain amount of copy number variation within any person, but it can sometimes be correlated with the incidence of cancer.
Gene expression is a variable that indicates the estimated amount of mRNA that one observes in a sample. It is often difficult to directly measure the amount of a given protein within a sample, but we often observe a correlation between the amount of mRNA in a sample and the amount of protein that is created.
Our data comprise 89 samples that have estimated copy number data; 41 of these samples also have gene expression data. The goal of this project is to examine the copy number and gene expression data of these samples and see whether certain sections of the genome are consistently over- or under-expressed and possibly have an abnormal copy number. The next step would be to examine the known functions of these genes to see whether they are involved in any cancer-related processes such as cell growth.
First, the raw copy number data was imported into an application called Illumina Genome Studio. Physically, the data is obtained by measuring how brightly a certain chemical fluoresces when mixed with the DNA. Genome Studio then estimates the copy numbers based on this data. Genome Studio was then used to generate a frequency plot, which displays how frequently each gene was amplified or deleted.
In addition, to confirm our findings, an R script was written that generates a similar plot.
These findings were also confirmed using GISTIC, a genetic analysis program that is part of the Broad Institute's Gene Pattern Server.
The next step in our analysis of the copy number data involved examining any genes that had extreme amplifications or extreme deletions, that is, any genes in which the estimated copy number was less than 0.5 or greater than 5. A script was written in the R statistical package to detect the frequency with which genes had such major amplifications or deletions.
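The original analysis used an R script that is not reproduced here. As a rough illustration of the same idea, the following Python sketch computes, for each gene, the fraction of samples whose estimated copy number falls below 0.5 or above 5. The function name `extreme_event_frequencies`, the dictionary layout, and the toy values are all assumptions made for this example, not the project's actual code or data.

```python
from typing import Dict, List

# Cutoffs taken from the text: below 0.5 counts as an extreme deletion,
# above 5 counts as an extreme amplification.
DELETION_CUTOFF = 0.5
AMPLIFICATION_CUTOFF = 5.0


def extreme_event_frequencies(
    copy_numbers: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Return, per gene, the fraction of samples with an extreme deletion or amplification."""
    frequencies = {}
    for gene, values in copy_numbers.items():
        n_samples = len(values)
        n_deleted = sum(1 for v in values if v < DELETION_CUTOFF)
        n_amplified = sum(1 for v in values if v > AMPLIFICATION_CUTOFF)
        frequencies[gene] = {
            "deletion": n_deleted / n_samples if n_samples else 0.0,
            "amplification": n_amplified / n_samples if n_samples else 0.0,
        }
    return frequencies


# Hypothetical toy data: estimated copy numbers for four samples per gene.
toy_copy_numbers = {
    "GENE_A": [2.0, 0.3, 5.6, 2.1],
    "GENE_B": [1.9, 2.2, 2.0, 6.1],
}
result = extreme_event_frequencies(toy_copy_numbers)
```

Genes with a high value in either field would be the candidates to examine further; keeping deletions and amplifications separate preserves the direction of the change.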
The results of this research will be published in a future publication. It would be necessary to get more samples, especially ones with both copy number and gene expression data, to create a full gene expression pattern.
Dr. Rudy Guerra, Matthew Burnstein, Dr. Chris Man, Dr. Ching Lau, Alexander Yu, Powell-Brown Lab, Rice University, Texas Children's Hospital
This Connexions module describes work conducted as part of Rice University's VIGRE program, supported by National Science Foundation grant DMS–0739420.