PDB - Protein Data Bank
Quick Look
Looking for a protein structure, or the sequence that goes with it? Then the RCSB
Protein Data Bank is the website to go to! Almost all proteins
whose 3-D structures have been determined experimentally are available at the PDB. As of 22 Feb 2000, 11753
structures have been deposited in the database.
It is the most renowned protein sequence-structure database available
on the net. The Protein Data Bank, located at the Research Collaboratory for
Structural Bioinformatics (RCSB) and formerly
located at Brookhaven National Laboratory, is "the single international
repository for the processing and distribution of 3-D macromolecular structure
data primarily determined experimentally by X-ray crystallography and NMR."
PDB-ID
Each structure file in the database is given a PDB-ID, also called a PDB code. This
is a four-character alphanumeric code used for accessing the files. The
code is composed of digits (0-9) and uppercase letters (A-Z). The PDB-IDs
are, however, not assigned in any particular order. Instead, the indexers
at the Data Bank devised mnemonics so that the codes would be easy to remember.
Here are some samples:
1MNP - Manganese Peroxidase
2CYP - Cytochrome C Peroxidase
3INS - Insulin
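The ID format described above is simple enough to check mechanically. Here is a minimal sketch, assuming only what is stated here: exactly four characters, each a digit or an uppercase letter. The function name `normalize_pdb_id` is made up for this example, and the check says nothing about whether an entry with that ID actually exists.

```python
import re

# Assumption: an ID is exactly four characters, each a digit (0-9) or an
# uppercase letter (A-Z). Lowercase input is converted before checking.
PDB_ID_PATTERN = re.compile(r"[0-9A-Z]{4}")


def normalize_pdb_id(text):
    """Return the uppercase PDB-ID, or None if text does not look like one."""
    candidate = text.strip().upper()
    if PDB_ID_PATTERN.fullmatch(candidate):
        return candidate
    return None


for raw in ["1MNP", "2cyp", " 3INS ", "INSULIN"]:
    print(repr(raw), "->", normalize_pdb_id(raw))
```

A check like this only catches typing mistakes; finding the right entry still needs the search engine.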
Obviously, looking for a particular protein structure amidst the thousands
stored in the database simply by the PDB-ID code is tedious and difficult!
However, the PDB does have a search engine and an organized record for each
file. The consistent format of the structure files also makes them easy to search.
How are the structure files encoded/stored?
A PDB Structure file is separated into different sections:
Title Section
- contains the following information:
-
HEADER - contains the idCode field which uniquely identifies the file from
the other PDB files. This part also gives the classification for the entry
and the date the file was deposited at the PDB.
-
TITLE - contains the title for the experiment/analysis represented in the
entry
-
CAVEAT - states any errors in the entry
-
COMPND - describes the macromolecular contents of an entry
-
SOURCE - the biological/chemical source (both common name and scientific
name) of the molecule
-
KEYWDS - contains a set of terms relevant to the entry that provide a means
of categorizing and generating index files
-
EXPDTA - shows experiment information such as the technique used (electron
diffraction, fiber diffraction, fluorescence transfer, neutron diffraction,
NMR, theoretical model, X-ray diffraction)
-
AUTHOR - people responsible for the contents
-
REVDAT - revision history since the release of the entry
-
REMARK - any experimental details, annotations, comments, and information
not included in the previous parts
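PDB records are fixed-width lines, so the HEADER fields listed above can be pulled out by column position. The sketch below assumes the column layout of the published PDB file format (classification in columns 11-50, deposition date in columns 51-59, idCode in columns 63-66); the sample record is made up for illustration and is not a real entry.

```python
def parse_header(line):
    """Split a HEADER record into its fixed-width fields."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    # Python slices are 0-based and end-exclusive, so columns 11-50
    # of the record become line[10:50], and so on.
    return {
        "classification": line[10:50].strip(),
        "deposition_date": line[50:59].strip(),
        "id_code": line[62:66].strip(),
    }


# Made-up record, assembled field by field so the columns line up.
sample = (
    "HEADER".ljust(10)
    + "EXAMPLE PROTEIN".ljust(40)
    + "01-JAN-00"
    + "   "
    + "1ABC"
)

print(parse_header(sample))
```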
Primary Structure Section
- enumerates the primary sequence, that is, the sequence of residues for each
chain of the protein. Also lists the non-standard residues such as prosthetic
groups, inhibitors, solvents and ions. Additional information includes the
name and formula of hetero groups in the macromolecule.
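The residue sequence is stored in SEQRES records, one chain at a time, with three-letter residue names. As a rough sketch, assuming the standard SEQRES layout (chain identifier in column 12, residue names starting at column 20), the sequence of each chain can be collected like this. The helper `make_seqres` and the short peptides are made up for the example.

```python
from collections import defaultdict


def read_seqres(lines):
    """Collect the three-letter residue names of each chain from SEQRES records."""
    chains = defaultdict(list)
    for line in lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]            # column 12
            residues = line[19:70].split() # columns 20-70
            chains[chain_id].extend(residues)
    return dict(chains)


def make_seqres(serial, chain_id, total, residues):
    """Hypothetical helper: build a SEQRES line with the fields in the right columns."""
    return f"SEQRES {serial:>3} {chain_id} {total:>4}  " + " ".join(residues)


# Made-up two-chain example, not taken from a real entry.
sample_lines = [
    make_seqres(1, "A", 4, ["GLY", "ILE", "VAL", "GLU"]),
    make_seqres(1, "B", 3, ["PHE", "VAL", "ASN"]),
]

for chain_id, residues in read_seqres(sample_lines).items():
    print(chain_id, "-".join(residues))
```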
Secondary Structure Section
- contains the data on the helices, sheets, and turns found in the protein.
The positions of the turns, helices and sheets are provided. These are also named
and numbered.
Connectivity Annotation Section
- contains information on the existence and location of
disulfide bonds and other linkages
Miscellaneous Features Section
- describes features such as the active site
Crystallographic and Coordinate Transformation Section
- contains the geometry of
the crystallographic experiment and the coordinate system transformations
Coordinate Section
- gives the atomic coordinates
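Each atom gets one ATOM (or HETATM) record, again with fixed columns. Here is a minimal sketch of reading one, assuming the standard layout: serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in columns 31-38, 39-46 and 47-54. The sample atom and its coordinates are made up.

```python
def parse_atom(line):
    """Extract the identity and coordinates from an ATOM or HETATM record."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Made-up alpha carbon, assembled field by field so the columns line up.
sample = (
    "ATOM  "            # record name, columns 1-6
    + f"{1:>5}"         # serial number, columns 7-11
    + " "
    + " CA "            # atom name, columns 13-16
    + " "
    + "GLY"             # residue name, columns 18-20
    + " "
    + "A"               # chain identifier, column 22
    + f"{1:>4}"         # residue number, columns 23-26
    + "    "
    + f"{11.5:8.3f}{-2.25:8.3f}{3.0:8.3f}"  # x, y, z in Angstroms
)

atom = parse_atom(sample)
print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Splitting the line on whitespace looks tempting but is not reliable here, since adjacent fields may have no space between them.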
Connectivity Section
- gives information on chemical connectivity or how the atoms are connected
to each other. Information here includes hydrogen bonds, salt bridges,
and links.
Bookkeeping Section
- gives final information about the file itself
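Since the first six columns of every line hold the record name, a file can be sorted into the sections above with a simple lookup. The sketch below uses a partial mapping that follows the grouping in this tutorial; it covers only common record names, and the sample lines are abbreviated, made-up stubs rather than a complete file.

```python
from collections import Counter

# Partial, illustrative mapping from record name to the section it belongs to.
SECTION_OF = {
    "HEADER": "Title", "TITLE": "Title", "CAVEAT": "Title",
    "COMPND": "Title", "SOURCE": "Title", "KEYWDS": "Title",
    "EXPDTA": "Title", "AUTHOR": "Title", "REVDAT": "Title",
    "REMARK": "Title",
    "SEQRES": "Primary Structure", "HET": "Primary Structure",
    "HETNAM": "Primary Structure", "FORMUL": "Primary Structure",
    "HELIX": "Secondary Structure", "SHEET": "Secondary Structure",
    "TURN": "Secondary Structure",
    "SSBOND": "Connectivity Annotation", "LINK": "Connectivity Annotation",
    "SITE": "Miscellaneous Features",
    "CRYST1": "Crystallographic and Coordinate Transformation",
    "ATOM": "Coordinate", "HETATM": "Coordinate", "TER": "Coordinate",
    "CONECT": "Connectivity",
    "MASTER": "Bookkeeping", "END": "Bookkeeping",
}


def record_name(line):
    """The record name occupies the first six columns of every line."""
    return line[:6].strip()


def count_sections(lines):
    """Count how many lines fall into each section."""
    counts = Counter()
    for line in lines:
        counts[SECTION_OF.get(record_name(line), "Unrecognized")] += 1
    return counts


# Abbreviated, made-up lines; only the record names matter here.
sample_lines = [
    "HEADER    EXAMPLE",
    "TITLE     MADE-UP ENTRY",
    "SEQRES   1 A    2  GLY ALA",
    "HELIX    1",
    "ATOM      1",
    "ATOM      2",
    "CONECT    1    2",
    "END",
]

for section, count in count_sections(sample_lines).items():
    print(f"{section}: {count}")
```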
Viewing the structure file
Although organized, a PDB file is difficult to understand because of its length
and the amount of data it contains. It is therefore not advisable
to go through the file contents using a simple text viewer. The PDB
offers a Structure Explorer, built into the website itself, that
displays the content in an easily comprehensible manner. Aside from this,
different kinds of software are available to interpret PDB files. These
include 3D structure rendering programs such as RasMol,
Chime, and Cn3D. The PDB website
itself contains links to these programs.
Website Features
The Structure Explorer built into the PDB website is worth a look
because it provides a lot of useful information:
-
View Structure - provides two ways of viewing the structure: 2D and
3D. The 2D viewer provides still images, usually in cylinder or
ribbon display. For 3D, the View Structure section offers several viewing
methods: VRML (the plug-in is usually built into the browser), RasMol (an
external viewer that is available at http://www.umass.edu/microbio/rasmol/), and
Chime (a browser plug-in available at the website of MDLI,
the makers of the plug-in). The molecular images displayed in this tutorial were made using Chime.
-
Summary Information - displays the general information about the molecule (usually that contained
in the Title Section of the PDB file)
-
Download/Display File - the interface where the user can choose
to display the structure file itself on screen (text format or html format)
or to download the structure file (in one of two formats, PDB or mmCIF;
zipped and unzipped files are available)
-
Structural Neighbors - contains links to websites that provide information on similarities of
the molecule to other molecules. The links include CATH, SCOP, FSSP, VAST,
and CE.
-
Geometry - shows geometry information on the molecule in two
formats: table and graphical. Bond lengths, bond angles, and the fold deviation
score are some of the information displayed
-
Other Sources - provides useful links that contain more information about the molecule
-
Sequence Details - displays
the sequence details (residues, molecular weight, secondary structures,
etc.)
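The Download/Display File feature in the list above can also be reached from a script. The sketch below is an assumption layered on top of this tutorial: it uses the address scheme of the present-day RCSB file server (files.rcsb.org), which is not described here and may change. The function names are made up for the example, and nothing is downloaded unless `fetch_structure` is called.

```python
from urllib.request import urlopen

# Assumption: address layout of the present-day RCSB file server.
BASE_URL = "https://files.rcsb.org/download"


def download_url(pdb_id, file_format="pdb", compressed=False):
    """Build the download address for an entry in PDB or mmCIF format."""
    if file_format not in ("pdb", "cif"):
        raise ValueError("file_format must be 'pdb' or 'cif'")
    suffix = ".gz" if compressed else ""
    return f"{BASE_URL}/{pdb_id.upper()}.{file_format}{suffix}"


def fetch_structure(pdb_id, file_format="pdb"):
    """Download an uncompressed structure file and return its text."""
    with urlopen(download_url(pdb_id, file_format)) as response:
        return response.read().decode("ascii", errors="replace")


print(download_url("1mnp"))
print(download_url("3INS", file_format="cif", compressed=True))
```

Calling `fetch_structure("1MNP")` would return the text of the file, which can then be read record by record.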