Summary: This is an introduction to the bioinformatics website provided by the National Center for Biotechnology Information (NCBI). It includes an overview of the basic mission of NCBI and an introduction to the most commonly used biological databases available on the website and the tools for viewing and analyzing the data.
The National Center for Biotechnology Information (NCBI) provides a comprehensive website for biologists that includes biology-related databases and tools for viewing and analyzing the data they contain. A division of the National Library of Medicine at the National Institutes of Health, NCBI is the agency responsible for creating automated systems for storing and analyzing the rapidly growing body of genetic and molecular data. One of the most difficult challenges in bioinformatics is how to store, in an easily accessible manner, the overwhelming abundance of new information, including the sequences of entire genomes, the ongoing discoveries of new genes and gene products, and the determination of their functions and structures. NCBI was established as the government's response to the need for more and better information processing methods to deal with this challenge.
View the NCBI home page. The list along the left border of the home page gives a useful overview of the tools and databases that can be accessed through NCBI. Clicking on the link entitled "About NCBI" produces a second menu containing the topics "A Science Primer" and "Databases and Tools", among others. Selecting "A Science Primer" gives access to general definitions and introductory information on the branches of science included in bioinformatics. Many bioinformatics terms are defined in this section in clear, basic language, making the Primer an excellent first resource. Selecting "Databases and Tools" from the "About NCBI" webpage menu yields a complete and well-ordered listing of the available resources. The web page containing the databases and tools menu is a good one to bookmark.
The first item under the "Databases and Tools" menu is "Literature Databases". PubMed is the most heavily used of the literature databases and can be used to access MEDLINE citations from biological and medical scientific journals, dating back to articles written in the mid-1960s. The second item under the "Databases and Tools" menu is "Entrez Databases". Entrez is a search and retrieval system developed by NCBI that can retrieve integrated information by searching many of the NCBI databases with a single query, instead of searching one database per query and then repeating the query to find information on the same topic in another NCBI database. Clicking on this link shows the NCBI databases that are included in the search when you launch an Entrez query. The "Nucleotide Databases" link under the "Databases and Tools" menu lists all the sequence databases available through NCBI. These sequence databases contain annotated collections of publicly available DNA, RNA and protein sequences. The evolution of bioinformatics data mining methods has been driven largely by the prodigious amount of sequence information collected by scientists in recent years. New sequences of unknown function can be compared with the sequences of well-characterized genes and proteins. Similarities identified between the new, unknown sequences and the well-characterized sequences can then be used to propose hypotheses about function or structure.
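The one-query-per-database repetition that Entrez is meant to spare the user can be made concrete with a short sketch. It assumes NCBI's programmatic E-utilities interface, which this module does not otherwise cover: the base address, the ESearch endpoint, and the `db`, `term`, and `retmax` parameters are taken from that interface, while the function names and the example search term are purely illustrative. The code only builds the search URLs and sends no request.

```python
from urllib.parse import urlencode

# Assumption: the public base address of NCBI's E-utilities,
# which is not mentioned in the text of this module.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_results=5):
    """Return an ESearch URL for one Entrez database (no request is sent)."""
    params = {"db": database, "term": term, "retmax": max_results}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_urls_for_databases(term, databases=("pubmed", "nucleotide", "protein")):
    """Build one ESearch URL per database for the same search term."""
    return {db: build_esearch_url(db, term) for db in databases}


if __name__ == "__main__":
    # Illustrative search term; any topic of interest could be used.
    for db, url in build_urls_for_databases("hemoglobin beta").items():
        print(db, url)
```

Each URL targets a single database, so the same term has to be submitted once per database; the Entrez web interface performs this fan-out on the user's behalf.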
Among the tools listed under the NCBI "Databases and Tools" menu are "Tools for Data Mining". Selecting the "Tools for Data Mining" topic shows a list of data retrieval tools, including Entrez, mentioned above, and BLAST, the Basic Local Alignment Search Tool. BLAST is the predominant sequence similarity search tool for performing rapid searches of nucleotide and protein sequence databases. As its name indicates, it detects local alignments, that is, regions of similarity between the query sequence and the database sequences, rather than end-to-end (global) alignments.
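To give a feel for what "local alignment" means, the sketch below scores the best-matching region shared by two short sequences using the classic Smith-Waterman dynamic programming method. This is not the algorithm BLAST runs (BLAST relies on a faster heuristic search), and the scoring values (match, mismatch, gap) and the example sequences are arbitrary choices made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the score
    matrix, since only the best score (not the alignment) is reported.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below zero: a local alignment may
            # start fresh at any position in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The two sequences share the region "ACGT" but differ at both ends.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Because the score resets to zero wherever the sequences diverge, the shared region is rewarded without any penalty for the unrelated flanking bases, which is the property that distinguishes a local alignment from a global one.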
This is a brief glimpse at some of the more widely used tools and databases offered by NCBI, intended to give the novice some feel for the number and types of bioinformatics tools that are available on the internet today. Several of these tools are covered in more detail in subsequent modules of this bioinformatics course. Before proceeding to the next module, take a moment to return to the "About NCBI" webpage menu and glance through some of the interesting webpages linked under the topics "A Science Primer", "Outreach and Education", and "News".