Use the browser's back arrow to return to the Entrez Database web page. Many NCBI web pages have a menu bar at the top that contains links to the most commonly used tools and databases, such as PubMed, Entrez, and BLAST. Click on the "Entrez" link at the top of the page. The Entrez cross-database search page should now be visible in your browser. Here, one can enter a query and click "GO" to search against all databases, or click on a database link to open the search page that is specific to that database. Perform a search using the query string Mycobacterium tuberculosis, and click "GO".

How many PubMed literature citations and abstracts contain the character string Mycobacterium tuberculosis?

How many nucleotide sequences are returned?

How many protein sequences are returned?

How many 3-D macromolecular structure entries are returned?

Click on one or two of the databases that returned items in response to this query, and take a quick look at the matching records. The query returns an overwhelming amount of information, and it is difficult to do anything with this much of it. For this reason, a good search strategy is required: limit the search as cleverly as possible so that it returns mostly records of interest, with very little excess information, without restricting it so much that it is likely to miss important records.
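The per-database counts asked about above can also be requested programmatically. The following is a minimal sketch, assuming NCBI's E-utilities ESearch service and its JSON output; the helper names (count_url, parse_count, fetch_count) are illustrative and not part of this module, and "nuccore" is assumed to be the E-utilities name for the Nucleotide database.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities ESearch service (assumed endpoint).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(db, term):
    """Build an ESearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_count(response_text):
    """Extract the hit count from an ESearch JSON response body."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def fetch_count(db, term):
    """Run the query over the network and return the hit count."""
    with urlopen(count_url(db, term)) as response:
        return parse_count(response.read().decode("utf-8"))


# Databases corresponding to the four questions above (names are assumptions).
DATABASES = ["pubmed", "nuccore", "protein", "structure"]

for db in DATABASES:
    # Only the URLs are printed here; call fetch_count(db, term) to query NCBI.
    print(db, count_url(db, "Mycobacterium tuberculosis"))
```

Calling fetch_count requires network access, and the counts it returns will differ from those seen when this module was written, because the databases grow continuously.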

There are many different ways to limit a search query. To illustrate one approach available in Entrez, click on the Nucleotide Database link on the cross-database search page. Notice the menu just under the query box, and click on the link entitled "Limits". Under "Limited to:", select "organism". On the pull-down menus, change the limits from "molecule" to "Genomic DNA/RNA", change "segmented sequences" to "show only master of set", and change "only from" to "GenBank". Rather than returning records for any type of molecule, including protein, ESTs, etc., the search now returns only records of submitted genomic DNA or RNA sequences. It furthermore limits the sequences returned to the master sequences of any sets, and it searches only the GenBank database for records. Using Mycobacterium tuberculosis as the query string again, perform the search with these limits.

Now, how many nucleotide sequences are returned?

How does this compare to the number of nucleotide sequences returned in the cross-database search?

Hopefully, this has illustrated that a general cross-database search is best used when very little information related to the query is available, so it is desirable to find all pieces of related data. However, when a lot of data related to the query is available, it is desirable to limit the items returned. Using the "Limits" function in Entrez is not always the best way to limit a query, though. Perhaps the area of interest happens to be genes that help confer drug resistance to Mycobacterium tuberculosis. Deselect the previously set limits by clicking on the check mark to the left so that it disappears. Now, search "nucleotide" using the query string "Mycobacterium tuberculosis drug resistance".

How many items (sequence records) are returned?
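A search can also be narrowed inside the query string itself by attaching a field tag to a term. The sketch below assumes the Entrez [Organism] field tag and the E-utilities ESearch service; the helper names (organism_term, search_url) are hypothetical, and a tagged term is not claimed to reproduce the exact settings of the "Limits" page.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service (assumed endpoint).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(organism, extra_words=""):
    """Restrict a query to one organism, optionally adding free-text words."""
    term = f'"{organism}"[Organism]'
    if extra_words:
        term += f" AND {extra_words}"
    return term


def search_url(db, term, retmax=20):
    """Build an ESearch URL returning up to retmax record identifiers as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


term = organism_term("Mycobacterium tuberculosis", "drug resistance")
print(term)
print(search_url("nuccore", term))
```

Pasting the printed term into the Nucleotide search box is one way to compare its result count with the plain-text query used above.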

Look at the list of results. The numbers at the head of each result are called accession numbers. Click on the accession number of one of these records. The left column of the record contains terms that are referred to as "identifiers". The identifiers in any database are defined terms that indicate the record section and the type of data included in that section. Scroll down to the section entitled "Features". Two common identifiers found in this section are "gene" and "CDS" listings. The CDS tag identifies "coding DNA sequences", meaning these sequences have been determined (most often by bioinformatics rather than experimental methods) to encode proteins, and are thus distinguished from the noncoding regions that make up a substantial amount of the DNA in the human genome. A good primer on the basic characteristics of DNA, including the differences between coding and noncoding sequences, can be found on the Dolan DNA Learning Center web page (2).

Scroll through the results, and notice that there are links embedded in this record. These links connect this record to other databases, as illustrated in the connectivity diagram discussed earlier in this module. So, even though this search was performed over the nucleotide database, the result may contain a link that leads to a record in the protein database. Find a record that contains a "gene" link in the Features section of the record, and click on this link. In the new record, there should be a sequence of capital letters at the bottom of the CDS section.

What does this sequence represent?

There is an additional sequence in lower case letters at the bottom of this record.

What type of sequence is represented by the lower case letters?

If these questions regarding sequences have been difficult to answer, please review the genetic code, as this is prerequisite information for this course.
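As a refresher on how the genetic code relates the two kinds of sequence, here is a minimal sketch that translates a coding DNA sequence into one-letter amino acid codes. It assumes the standard genetic code and ignores alternative start codons; the 12-base input is a made-up toy sequence, not taken from any database record, and the function name translate is illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# for the first, second, and third codon positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons containing anything other than A, C, G, T become "X".
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: atg gcc aaa tga -> Met, Ala, Lys, stop
print(translate("atggccaaatga"))  # prints MAK
```

Each group of three DNA letters (a codon) maps to one protein letter, which is why a translated sequence is about one third the length of the DNA that encodes it.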

Try your own search. Scroll back to the top of the web page and, this time, choose PubMed from the menu next to the Search command. Pick any life sciences topic that interests you for your query. Attempt a first query with a general topic, such as protein kinase or diabetes.

What type of results does PubMed return from a query?

Note how many items were returned in total (not just on the first page). Now make your query more specific, while keeping it related to your original choice. For example, change 'protein kinase' to 'protein kinase C'.

How much did this reduce the number of items returned?

This module is intended as an introduction to performing searches of the NCBI databases using Entrez. If you are unfamiliar with Entrez, please feel free to return to this module as a resource for getting started on NCBI searches.

Source:  OpenStax, Bios 533 bioinformatics. OpenStax CNX. Sep 24, 2008 Download for free at http://cnx.org/content/col10152/1.16