Entrez

Module by: Susan Cates

Summary: This module is an introduction to performing searches of the NCBI databases using Entrez, the NCBI web-based search and retrieval tool for integrated search results from multiple databases.

Entrez (1) is a search and retrieval tool developed by the National Center for Biotechnology Information (NCBI) that can search multiple NCBI databases with a single query. Its results can combine many types of data on the query, such as nucleotide sequences, protein sequences, macromolecular structures, and related articles in the literature. Before Entrez, an individual might have had to submit one query to a nucleotide database to find a nucleotide sequence, another to a structural database to find the published structure of the gene product, and a third to a literature database to find citations for journal articles on the topic. NCBI recognized the time and effort that could be saved by a tool that cross-links these databases and integrates all information related to a given query subject into one report. View the Entrez Database page.

This module contains a few problem questions for use in a computer lab setting. The lab instructor may require that you supply answers to these questions as an indication that you have completed the module.

The Entrez Nucleotides database includes sequences from GenBank, RefSeq, and PDB. GenBank is the National Institutes of Health (NIH) genetic sequence database. GenBank, the DNA DataBank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) comprise the International Nucleotide Sequence Database Collaboration. These three organizations exchange data on a daily basis. The number of bases in the Entrez Nucleotides database currently grows at an exponential rate. Click on the Nucleotide link listed under the heading "Nucleotide Databases".

Exercise 1

What is the number of bases stored in the Entrez nucleotide database, as of the last report?

Use the back arrow of the browser to return to the Entrez Database web page. Locate the MMDB (Molecular Modeling DataBase), one of NCBI's structure databases, and click on the link to read about it. MMDB is a subset of the three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models. While the protein databases contain protein sequences, the structural database contains coordinate files (PDB files) of biological molecules with solved (known) structures.

Click on the arrow next to the search box at the top of the web page and view the list of databases available for selection. The literature database is accessed through PubMed, which encompasses MEDLINE, the National Library of Medicine's journal citation database, and provides some additional online services. MEDLINE is a collection of medical and life science journal citations that includes articles dating back to the mid-1960s. Entrez also gives access to information such as nucleotide and protein sequences organized by species in the NCBI taxonomy database, which is also found on the selection list.

The connectivity of the databases on the selection list is indicated by the diagram on the Entrez Database web page. Click on the diagram to access a Flash model of Entrez database connectivity. If the browser has a Flash plug-in, placing the mouse over one of the nodes representing a database will highlight its connectivity. Try this on the node labeled "Protein". Clicking on a node forwards the user to that database's home page.
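
The database list shown in the selection menu can also be retrieved programmatically. The following is a minimal sketch in Python, assuming NCBI's E-utilities EInfo service, which this module does not otherwise cover. The sample XML is hand-written and abbreviated for illustration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# EInfo endpoint of the NCBI E-utilities (not part of the original lab).
EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def parse_db_names(einfo_xml):
    """Return the sorted database names found in an EInfo XML response."""
    root = ET.fromstring(einfo_xml)
    return sorted(node.text for node in root.iter("DbName") if node.text)


def fetch_db_names():
    """Download the current database list (needs network access)."""
    with urlopen(EINFO_URL) as response:
        return parse_db_names(response.read().decode("utf-8"))


# Abbreviated, hand-written sample in the shape of an EInfo response.
SAMPLE = """<eInfoResult><DbList>
<DbName>pubmed</DbName><DbName>protein</DbName>
<DbName>nuccore</DbName><DbName>structure</DbName>
<DbName>taxonomy</DbName>
</DbList></eInfoResult>"""

if __name__ == "__main__":
    print(parse_db_names(SAMPLE))
```

In E-utilities the Nucleotide database goes by the name "nuccore". Calling fetch_db_names() would return the full, current list rather than the five names in the sample.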

Use the back arrow of the browser to return to the Entrez Database web page. A menu bar at the top of many NCBI web pages contains links to the most commonly used tools and databases, such as PubMed, Entrez, and BLAST. Click on the "Entrez" link at the top of the page. The Entrez cross-database search page should now be visible in your browser. Here, one can enter a query and click "GO" to search against all databases, or click on a database link for the search page that is specific to that database. Perform a search using the query string Mycobacterium tuberculosis, and click "GO".
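
The same cross-database query can be approximated outside the browser. This sketch assumes the E-utilities ESearch service and asks each database only for its match count. The choice of databases, the function names, and the use of "nuccore" (the E-utilities name for the Nucleotide database) are assumptions made for illustration, not part of the original lab.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Databases picked to mirror the exercises that follow (an assumption).
DATABASES = ("pubmed", "nuccore", "protein", "structure")


def count_url(db, term):
    """Build an ESearch URL that asks only for the number of matches."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_count(esearch_xml):
    """Extract the <Count> value from an ESearch XML response."""
    count = ET.fromstring(esearch_xml).findtext("Count")
    if count is None:
        raise ValueError("response has no <Count> element")
    return int(count)


def count_all(term):
    """Query each database in turn (needs network access)."""
    results = {}
    for db in DATABASES:
        with urlopen(count_url(db, term)) as response:
            results[db] = parse_count(response.read().decode("utf-8"))
    return results


if __name__ == "__main__":
    # Print the request URLs only; no network traffic is generated here.
    for db in DATABASES:
        print(count_url(db, "Mycobacterium tuberculosis"))
```

Calling count_all("Mycobacterium tuberculosis") would return a dictionary of counts keyed by database name. The numbers change as the databases grow, so none are shown here.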

Exercise 2

How many PubMed literature citations and abstracts contain the character string Mycobacterium tuberculosis?

Exercise 3

How many nucleotide sequences are returned?

Exercise 4

How many protein sequences are returned?

Exercise 5

How many 3-D macromolecular structure entries are returned?

Click on one or two of the databases that returned items in response to this query, and take a quick look at the information returned as a match. The query returns an overwhelming amount of information, too much to work with directly. For this reason, a good search strategy limits the search enough to return mostly records of interest, with little excess information, without restricting it so much that important records are likely to be missed.

There are many different ways to limit a search query. To illustrate one approach available in Entrez, from the cross-database search page, click on the Nucleotide Database link. Notice the menu just under the query box, and click on the link entitled "Limits". Under "Limited to:", select "organism". On the pull-down menus, change the limits from "molecule" to "Genomic DNA/RNA", change "segmented sequences" to "show only master of set", and change "only from" to "GenBank". Instead of returning records for any type of molecule (including protein, ESTs, etc.), the search now returns only records of submitted genomic DNA or RNA sequences. It also returns only the master sequence of any set, and it searches only the GenBank database for records. Using Mycobacterium tuberculosis as the query string again, perform the search with these limits.
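
The effect of the "Limits" page can be approximated by writing the restrictions into the query itself. The sketch below assumes Entrez field tags in square brackets, such as [Organism], and the property filter biomol_genomic[PROP] for genomic DNA/RNA. The filter name should be checked against the current Entrez help, and the helper names are hypothetical. The "master of set" and "GenBank only" limits are left out because their query equivalents are not given in this module.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded(text, field):
    """Restrict a search phrase to one Entrez field, e.g. 'Organism'."""
    return f'"{text}"[{field}]'


def all_of(*clauses):
    """Join clauses with AND so that every one of them must match."""
    return " AND ".join(clauses)


def search_url(db, term):
    """Build an ESearch URL for the given database and query term."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


# Organism-restricted query plus an assumed filter for genomic DNA/RNA.
LIMITED_TERM = all_of(
    fielded("Mycobacterium tuberculosis", "Organism"),
    "biomol_genomic[PROP]",
)

if __name__ == "__main__":
    print(LIMITED_TERM)
    print(search_url("nuccore", LIMITED_TERM))
```

Keeping the restrictions in the query string makes a search easy to record and repeat, whereas settings chosen on the "Limits" page have to be re-entered by hand.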

Exercise 6

Now, how many nucleotide sequences are returned?

Exercise 7

How does this compare to the number of nucleotide sequences returned in the cross-database search?

Hopefully, this has illustrated that a general cross-database search is best used when very little information related to the query is available, so that finding every piece of related data is desirable. However, when a lot of data related to the query is available, it is desirable to limit the items returned. Using the "Limits" function in Entrez is not always the best way to limit a query, though. Perhaps the area of interest happens to be genes that help confer drug resistance to Mycobacterium tuberculosis. Deselect the previously set limits by clicking on the check mark to the left so that it disappears. Now, search "nucleotide" using the query string "Mycobacterium tuberculosis drug resistance".

Exercise 8

How many items (sequence records) are returned?

Look at the list of results. The numbers at the head of each result are called accession numbers. Click on the accession number of one of these records. The left column of the record contains terms that are referred to as "identifiers". The identifiers in any database are defined terms that indicate the record section and the type of data included in that section.

Scroll down to the section entitled "Features". Two common identifiers found in this section are "gene" and "CDS" listings. The CDS tag identifies "coding DNA sequences", meaning these sequences have been determined (most often by bioinformatics rather than experimental methods) to encode proteins, and are thus distinguished from the noncoding regions that make up a substantial amount of the DNA in the human genome. A good primer on the basic characteristics of DNA, including the differences between coding and noncoding sequences, can be found on the Dolan DNA Learning Center web page (2).

Scroll through the results, and notice that there are links embedded in this record. These links connect this record to other databases, as illustrated in the connectivity diagram discussed earlier in this module. So, even though this search was performed over the nucleotide database, the result may contain a link that leads to a record in the protein database. Find a record that contains a "gene" link in the Features section of the record, and click on this link. In the new record, there should be a sequence of capital letters at the bottom of the CDS section.
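
As a rough illustration of how the identifiers in the Features section are laid out, the sketch below pulls the feature keys and their locations from a flat-file record. The sample record is invented for illustration (the gene name and coordinates are not real), and the parsing assumes the usual layout in which feature keys are indented five spaces and qualifier lines begin with a slash.

```python
# Invented sample in the shape of a flat-file feature table.
SAMPLE_RECORD = """\
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Mycobacterium tuberculosis"
     gene            101..1000
                     /gene="exampleA"
     CDS             101..1000
                     /gene="exampleA"
                     /product="hypothetical example protein"
ORIGIN
"""


def feature_keys(flat_file_text):
    """Return (key, location) pairs from the FEATURES block of a record."""
    pairs = []
    in_features = False
    for line in flat_file_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line[:1] not in (" ", ""):
            break  # reached the next top-level section, such as ORIGIN
        elif in_features and line.startswith("     ") and line[5:6].strip():
            # A feature line: five spaces, the key, then the location.
            key, location = line.split(None, 1)
            pairs.append((key, location.strip()))
    return pairs


if __name__ == "__main__":
    for key, location in feature_keys(SAMPLE_RECORD):
        print(key, location)
```

Qualifier lines such as /gene and /product are skipped here because they are indented further than the feature keys. A fuller parser would attach them to the feature that precedes them.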

Exercise 9

What does this sequence represent?

There is an additional sequence in lower case letters at the bottom of this record.

Exercise 10

What type of sequence is represented by the lower case letters?

If these questions regarding sequences have been difficult to answer, please review the genetic code, as this is prerequisite information for this course.
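
As a reminder of how the genetic code maps codons to amino acids, here is a small sketch that translates a DNA coding sequence using the standard genetic code. It assumes the sequence starts in frame at its first base, ignores any incomplete trailing codon, and writes X for codons it does not recognize. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, reading codons from the first base."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    residues = []
    for start in range(0, usable, 3):
        residues.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(residues)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # ATG, GCC, TAA
```

In the output each letter is a one-letter amino-acid code and an asterisk marks a stop codon.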

Try your own search. Scroll back to the top of the web page and, this time, choose PubMed from the menu next to the Search command. Pick any life sciences topic that interests you for your query. Attempt a first query with a general topic, such as protein kinase or diabetes.

Exercise 11

What type of results does PubMed return from a query?

Note how many items in total (not just on the first page) were returned. Make your query topic related to your original choice, but more specific. For example, change 'protein kinase' to 'protein kinase C'.

Exercise 12

How much did this reduce the number of items returned?

This module is intended as an introduction to performing searches of the NCBI databases using Entrez. If you are unfamiliar with Entrez, please feel free to return to this module as a resource for getting started on NCBI searches.

References

  1. Benson D.A., Boguski M.S., Lipman D.J., Ostell J. (1994). GenBank. Nucleic Acids Res., 22:3441-3444.
  2. Dolan DNA Learning Center. [http://www.bioservers.org/].
