Use the browser's back arrow to return to the Entrez Database web page. A menu bar at the top of many NCBI web pages contains
links to the most commonly used tools and databases, such as PubMed, Entrez, and BLAST. Click on the "Entrez" link at the top of the page. The Entrez cross-database search page should now be visible in your browser. Here, one can enter a query and click "GO" to search against all databases, or click on a database link for the search page that is specific
to that database. Perform a search using the query string
How many PubMed literature citations and abstracts contain the
character string
How many nucleotide sequences are returned?
How many protein sequences are returned?
How many 3-D macromolecular structure entries are returned?
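The same four counts can also be requested without the browser, through NCBI's E-utilities ESearch service. The following is only a minimal sketch, not part of the original exercise: the helper names (`count_url`, `parse_count`, `fetch_count`) are made up for illustration, and the mapping from the four questions above to the E-utilities database names `pubmed`, `nuccore`, `protein`, and `structure` is an assumption. Use whatever query string you entered on the web page as `term`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed mapping from the four questions above to E-utilities database names.
DATABASES = {
    "PubMed citations": "pubmed",
    "Nucleotide sequences": "nuccore",
    "Protein sequences": "protein",
    "3-D structures": "structure",
}


def count_url(db, term):
    """Build an ESearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Read the integer inside the <Count> element of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch_count(db, term):
    """Send one ESearch request over the network and return the hit count."""
    with urllib.request.urlopen(count_url(db, term)) as reply:
        return parse_count(reply.read())
```

Calling `fetch_count(db, term)` once for each value in `DATABASES` should give numbers comparable to those on the cross-database page, although the counts change as the databases grow.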
Click on one or two of the databases that returned items in response to this query. Take a quick look at the information returned as a match. The query returns an overwhelming amount of information, and it is difficult to do anything with this much of it. For this reason, a good search strategy is required: limit the search as cleverly as possible to obtain mostly records of interest, with very little excess information, without restricting the search so much that it is likely to miss important records.
There are many different ways to limit a search query. To illustrate one approach
available in Entrez, from the cross-database search page, click on the Nucleotide Database link.
Notice the menu just under the query box, and click on the link entitled "Limits". Under "Limited to:", select "organism". On the pull-down menus, change the limits
from "molecule" to "Genomic DNA/RNA", change "segmented sequences" to "show only master of set", and change "only from" to "GenBank". This narrows the search from
records of any type of molecule, including protein, ESTs, etc., to only records of submitted genomic DNA or RNA sequences. It furthermore limits the sequences returned
to only the master sequence of any set, and it searches only the GenBank database for records. Using
Now, how many nucleotide sequences are returned?
How does this compare to the number of nucleotide sequences returned in the cross-database search?
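The "Limits" page works by adding restrictions to the query, and similar restrictions can be typed directly into the query box as bracketed field tags. The sketch below shows one way to assemble such a query string. It is an illustration under stated assumptions: the helper names are hypothetical, the search term and organism in the usage note are placeholders rather than the values used in this module, and the property value `biomol_genomic` is assumed to correspond to the "Genomic DNA/RNA" limit (check the Entrez help pages before relying on it).

```python
def tagged(term, field):
    """Attach an Entrez field tag to a term, quoting multi-word terms."""
    if " " in term:
        return f'"{term}"[{field}]'
    return f"{term}[{field}]"


def limit_query(base, organism=None, properties=()):
    """Combine a base query with an optional organism and property limits.

    Every restriction is joined with AND, so each one narrows the result set.
    """
    parts = [f"({base})"]
    if organism:
        parts.append(tagged(organism, "Organism"))
    # Property names such as "biomol_genomic" are assumptions, not verified here.
    parts.extend(tagged(prop, "PROP") for prop in properties)
    return " AND ".join(parts)
```

For example, `limit_query("kinase", organism="Homo sapiens", properties=["biomol_genomic"])` produces the string `(kinase) AND "Homo sapiens"[Organism] AND biomol_genomic[PROP]`, which can be pasted into the Nucleotide search box.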
This should illustrate that a general cross-database search is best used
when very little information related to the query is available, so
it is desirable to find every piece of related data. However, when a lot of data
related to the query is available, it is desirable to limit the items returned. Using the "Limits" function in Entrez is not always the best way to limit a query,
though. Perhaps the area of interest happens to be genes that help confer drug resistance to
How many items (sequence records) are returned?
Look at the list of results. The numbers at the head of each result are called access codes (NCBI refers to them as accession numbers). Click on the access code of one of these records. The left column of the record contains terms that are referred to as "identifiers". The identifiers in any database are defined terms that indicate the record section and the type of data included in that section.
Scroll down to the section entitled "Features". Two common identifiers found in this section are "gene" and "CDS" listings. The CDS tag identifies "coding DNA sequences", meaning these sequences have been determined (most often by bioinformatic rather than experimental methods) to encode proteins, and are thus distinguished from the noncoding regions that make up a substantial amount of the DNA in the human genome. A good primer on the basic characteristics of DNA, including the differences between coding and noncoding sequences, can be found on the Dolan DNA Learning Center web page (2).
Scroll through the results, and notice that there are links embedded in this record. These links connect this record to other databases, as illustrated in the connectivity diagram discussed earlier in this module. So, even though this search was performed over the nucleotide database, the result may contain a link that takes us to a record in the protein database. Find a record that contains a "gene" link in the Features section of the record, and click on this link. In the new record, there should be a sequence of capital letters at the bottom of the CDS section.
What does this sequence represent?
There is an additional sequence in lower case letters at the bottom of this record.
What type of sequence is represented by the lower case letters?
If these questions regarding sequences have been difficult to answer, please review the genetic code, as this is prerequisite information for this course.
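As a review aid, the standard genetic code can be written out as a lookup table from three-base codons to one-letter amino acid codes. The following is a minimal sketch, not a replacement for a proper sequence-analysis library: it assumes the standard code only, reads a single frame starting at the first base, and the function name `translate` is chosen here for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order for
# each codon position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    Any trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For instance, `translate("atggcctaa")` reads the codons atg, gcc, and taa and returns `"MA"`. A sequence containing characters other than A, C, G, and T (such as N) raises a `KeyError` in this sketch.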
Try your own search. Scroll back to the top of the web page and, this time, choose PubMed from the menu next to the Search command. Pick any life sciences topic that interests you for your query. Attempt a first query with a general topic, such as protein kinase or diabetes.
What type of results does PubMed return from a query?
Note how many items in total (not just on the first page) were returned. Now make your query related to your original choice, but more specific. For example, change 'protein kinase' to 'protein kinase C'.
How much did this reduce the number of items returned?
This module is intended as an introduction to performing searches of the NCBI databases using Entrez. If you are unfamiliar with Entrez, please feel free to return to this module as a resource for getting started on NCBI searches.