The National Center for Biotechnology Information (NCBI)
provides a comprehensive website for biologists that includes
biology-related databases and tools for viewing and analyzing
the data those databases contain. A division of the National
Library of Medicine at the National Institutes of Health, NCBI
is the agency responsible for creating automated systems for
storing and analyzing the rapidly growing profusion of genetic
and molecular data. One of the most difficult challenges faced
in the field of bioinformatics is how to store, in an easily
accessible manner, the overwhelming abundance of new
information, including the sequences of entire genomes, the
ongoing discoveries of new genes and gene products, and the
determinations of their functions and structures. NCBI was
established as the government's response to the need for more
and better information processing methods to deal with this
challenge.
View the NCBI home page. A useful overview of the tools and
databases that can be accessed through NCBI is provided in the
list along the left border of the home page. Clicking on the
link entitled "About NCBI" opens a second menu containing
the topics "A Science Primer" and "Databases and Tools",
among others. Selecting "A Science Primer" yields access to
general definitions and introductory information regarding the
branches of science included in bioinformatics. Many
bioinformatics terms are defined in this section in clear,
basic language, making this Primer an excellent
first resource. Selecting "Databases and Tools" from the
"About NCBI" webpage menu yields a complete and well-ordered
listing of accessible information. The web page containing
the "Databases and Tools" menu is a good one to bookmark for
later use.
The first item under the "Databases and Tools" menu is
"Literature Databases". PubMed is the most heavily used of the
literature databases and provides access to MEDLINE citations
from biological and medical scientific journals, dating back
to articles written in the mid-1960s. The second item under
the "Databases and Tools" menu is "Entrez Databases".
Entrez is a search and retrieval system
developed by NCBI that searches many of the NCBI databases with
a single query and returns integrated results, so you do not have
to repeat the same query in one database after another. The NCBI
databases that are included in the search when you launch an
Entrez query are shown when you click on this link. The
"Nucleotide Databases" link
under the "Databases and Tools" menu lists all the sequence
databases available through NCBI. These sequence databases
contain annotated collections of publicly available DNA, RNA and
protein sequences. The evolution of bioinformatics data mining
methods has been largely driven by the prodigious amount of
sequence information collected by scientists in recent years.
New sequences of unknown function can be compared with sequences
of well-characterized genes and proteins. Similarities
identified between the new, unknown sequences and the
well-characterized sequences can then be used to form hypotheses
about function or structure.
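
Entrez searches can also be issued from a program rather than through the web pages. The sketch below assumes NCBI's public E-utilities web service, which this module does not cover, and only builds the request URL for its ESearch endpoint. The helper name `build_esearch_url` and the search term are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch endpoint (an assumption of this
# sketch; the module itself only describes the web interface).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Return an ESearch URL that searches one Entrez database for `term`."""
    params = {
        "db": database,         # e.g. "pubmed", "nucleotide", "protein"
        "term": term,           # the text query
        "retmax": max_results,  # maximum number of record IDs to return
    }
    return ESEARCH_BASE + "?" + urlencode(params)


# Reuse one query string across several Entrez databases.
for db in ("pubmed", "nucleotide", "protein"):
    print(build_esearch_url(db, "hemoglobin"))
```

Each printed URL could be pasted into a browser to see the matching record identifiers for that database. The Entrez web interface performs the cross-database step for you.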
The NCBI "Databases and Tools" menu also includes
"Tools for Data Mining". Selecting the "Tools for
Data Mining" topic will show a list of data retrieval tools,
including Entrez, mentioned above, and BLAST, the
Basic Local Alignment Search Tool. BLAST
is the predominant sequence alignment tool for performing rapid
searches of nucleotide and protein sequence databases and
detecting regions of local similarity between
the query sequence and the database sequences.
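
To make the idea of a local alignment concrete, here is a minimal sketch of local alignment scoring. It is not BLAST itself: BLAST relies on a faster heuristic, whereas this sketch uses the exact Smith-Waterman dynamic programming method. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b."""
    # prev holds the previous row of the scoring matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1], as a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Flooring at zero lets an alignment start anywhere (local).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only the short region "ACGT".
print(local_alignment_score("TTTACGTTTT", "GGACGTGG"))
```

Because every cell is floored at zero, the score reflects only the best-matching region and ignores the dissimilar flanking sequence. This is what distinguishes a local alignment from a global one.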
This is a brief glimpse of some of the more widely used tools
and databases offered by NCBI, intended to give the novice a
feel for the number and types of bioinformatics tools that are
available on the Internet today.
Several of these tools are covered in more detail in subsequent
modules included in this bioinformatics course. Before
proceeding to the next module, take a moment to return to the
"About NCBI" webpage menu and glance through some of the
interesting webpages linked under the topics "A Science
Primer", "Outreach and Education", and "News".
References
Benson D.A., Boguski M.S., Lipman D.J., Ostell J. (1994). GenBank. Nucleic Acids Res., 22, 3441-3444.
Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. (1990). Basic local alignment search tool. J. Mol. Biol., 215, 403-410.