| 1. Proteomic Databases developed at IOB: |
|
Defining the 'Proteome' |
The term "proteome" denotes the complete protein complement of a genome,
and its study has become the key research
frontier in the post-genomic era. Recent advances in proteomic technologies,
beyond the basic applications of gel electrophoresis and mass spectrometry,
have helped in deciphering primary protein sequence and structure
as well as in identifying certain post-translational modifications and subtle
protein-protein interactions. Yet even a decade after the human genome was
decoded, we are nowhere close to answering many biological
questions. Genomic sequence information is becoming available in unprecedented
amounts, but the lack of corresponding functional correlation with
gene transcripts and proteins represents a significant roadblock.
To bridge this gap and improve the efficiency of biological discovery,
the Institute of Bioinformatics has developed various databases to identify
and analyze the
protein products in a cell or tissue. These resources are freely available
over the World Wide Web and can be easily accessed by the scientific
community. |
| |
| 1.1 NetPath (http://www.netpath.org): |
NetPath has been developed into the largest open-source repository of human signaling pathways
and is set to become a community standard for meeting the challenges in
functional genomics and systems biology. Signaling networks are the key
to deciphering many of the complex networks that govern the machinery inside
the cell. Several signaling molecules play an important role in disease
processes that are a direct result of their altered functioning and are
now recognized as potential therapeutic targets. Understanding how to restore
the proper functioning of pathways that have become deregulated in
disease is needed to accelerate biomedical research. This resource
aims to demystify biological pathways and to highlight the key
relationships and connections between them. In addition, pathways provide
a way of reducing the dimensionality of high-throughput data by grouping
thousands of genes, proteins and metabolites at the functional level into just
several hundred pathways for an experiment. Identifying the active
pathways that differ between two conditions can have more explanatory power
than just a simple list of differentially expressed genes and proteins.
The scientific literature was mined thoroughly to catalog
all significant molecular interactions in single ligand-stimulated,
receptor-mediated signaling pathways. Apart from protein-protein interactions,
enzyme-substrate reactions, protein translocation events and gene regulation
were also cataloged. Each pathway reaction is linked to its
experimental evidence in the form of the PubMed IDs of the articles
from which it was mined. Given the heterogeneity
inherent in the experimental validation of different pathway reactions and
the diversity of publicly available data, a set of very stringent criteria
was applied for curation and for the generation of the pathway maps.
These pathways are freely available for download in various formats such
as BioPAX, PSI-MI and SBML. The availability of data in different formats
allows interoperability between various pathway analysis software tools
such as Cytoscape and VISIBIOweb. In order to provide a better visual interface
of the molecular reactions in NetPath, pathway maps were generated using
PathVisio, which is an improved visualization tool incorporating features
of GenMAPP. These pathway maps are available through another resource called
NetSlim (http://www.netpath.org/netslim) that was also developed at the Institute.
The NetSlim versions of various pathways can be downloaded in .gpml, .GenMAPP,
.png and .pdf formats.
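
The idea of summarizing an experiment at the pathway level rather than as a gene list can be illustrated with a simple over-representation test. The sketch below is only an illustration and is not part of NetPath: the gene names, the two pathways and the function names `upper_tail` and `rank_pathways` are hypothetical, and the hypergeometric test is one common choice among several for scoring pathways.

```python
from math import comb


def upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def rank_pathways(pathways, hits, background):
    """Return (pathway, overlap, pathway size, p-value) tuples, smallest p-value first."""
    background = set(background)
    hits = set(hits) & background
    rows = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(members & hits)
        p = upper_tail(overlap, len(background), len(members), len(hits))
        rows.append((name, overlap, len(members), p))
    return sorted(rows, key=lambda row: row[3])


# Toy data (assumption): 20 hypothetical genes and two hypothetical pathways.
background = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_A": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],
}
hits = ["g0", "g1", "g2", "g3", "g10"]  # genes that differ between two conditions

for name, overlap, size, p in rank_pathways(pathways, hits, background):
    print(f"{name}: {overlap}/{size} genes, p = {p:.4f}")
```

In this toy case four of the five hits fall into one pathway, so that pathway ranks first. In practice the pathway membership lists would come from the downloaded pathway files, and the p-values would need correction for multiple testing.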
|
| |
| 1.2 Human Protein Reference Database (http://www.hprd.org/): |
The Human Protein Reference Database (HPRD) represents a centralized
platform to visually depict and integrate information pertaining to each
protein in the human proteome. It contains manually curated scientific
information pertaining to the biology of most human proteins. The HPRD
is a result of an international collaborative effort between the Institute
of Bioinformatics and the Pandey lab at Johns Hopkins University in Baltimore,
USA. The National Center for Biotechnology Information provides links
to HPRD from its databases pertaining to human genes and proteins
(e.g. Entrez Gene, RefSeq protein).
All the information in HPRD has been manually curated from the published
literature by expert biologists who critically read, interpret
and analyze the data. This resource depicts information on
human protein functions including protein–protein interactions,
post-translational modifications, enzyme-substrate relationships and
disease associations. The protein–protein interaction and subcellular
localization data from HPRD have been used to develop a human protein
interaction network. Information regarding proteins involved in human
diseases is also annotated and linked to the Online Mendelian Inheritance
in Man (OMIM) database.
HPRD was created using an object-oriented database in Zope, an open-source
web application server that provides versatility in query functions
and allows data to be displayed dynamically. As HPRD continues to evolve
with newer entries, the number of unannotated genes and proteins is
decreasing rapidly, which allows us to expand the scope of our
curation. The data in HPRD can be freely accessed and used by
academic users, while commercial entities are required to obtain a license
for use.
|
| |
| Goals: |
| 1. |
The main goal in creating HPRD was to curate the world's literature
on known and well-characterized proteins, which will in turn create a centralized
knowledgebase of protein data. |
| 2. |
Create a more robust curation system. Curation systems need to be continually
updated to include current research. |
| 3. |
Enable future discoveries and empower scientists in their work. As
we move into next-generation sequencing technologies, the focus is shifting
from individual genes to high-throughput
studies involving thousands of genes at a time. |
| 4. |
Study systems biology approaches and aid in biomarker discovery. |
| 5. |
Perform complex queries involving multiple features of proteins. |
| |
|
| Highlights of HPRD are as follows:
|
| • |
From 10,000 protein–protein interactions (PPIs) annotated for
3,000 proteins in 2003, HPRD has grown to over 39,194 unique PPIs annotated
for 30,047 proteins including more than 6,360 isoforms by the end of
2012. |
| • |
More than 50% of molecules annotated in HPRD have at least one PPI
and 10% have more than 10 PPIs. |
| • |
Experiments for PPIs are broadly grouped into three categories, namely
in vitro, in vivo and yeast two-hybrid (Y2H). Sixty percent of the PPIs annotated
in HPRD are supported by a single experiment, whereas 26% are
annotated with two of the three experimental methods. |
| • |
HPRD contains 18,000 manually curated post-translational modification
(PTM) annotations belonging to 26 different types of modification. |
| • |
All the phosphorylation-based motifs for any protein of interest can
be analyzed using PhosphoMotifFinder in HPRD. This tool connects the
proteomic data in HPRD to over 320 experimentally proven phosphorylation-based
motifs curated from the literature. Phosphorylation is the leading
type of protein modification, accounting for 63% of the PTM data annotated
in HPRD. |
| • |
HPRD data is available for download in tab-delimited and XML file formats. |
| • |
HPRD also integrates data from Human Proteinpedia, a community portal
for integrating human protein data. |
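
Figures such as the share of proteins with at least one PPI, or the share of PPIs supported by one, two or three experimental categories, can be recomputed from a tab-delimited interaction file. The sketch below is a minimal illustration under stated assumptions: the three-column layout (interactor A, interactor B, semicolon-separated experiment categories), the protein identifiers and the function name `summarize` are hypothetical and do not describe the actual HPRD download files, whose columns should be checked before use.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an assumed layout: protein A, protein B, experiment categories.
SAMPLE = (
    "P1\tP2\tin vitro;in vivo\n"
    "P1\tP3\tyeast 2-hybrid\n"
    "P2\tP3\tin vivo\n"
    "P1\tP4\tin vitro;in vivo;yeast 2-hybrid\n"
)


def summarize(handle):
    """Count interaction partners per protein and experiment categories per PPI."""
    degree = Counter()    # protein -> number of unique interaction partners
    evidence = Counter()  # number of experiment categories -> number of PPIs
    seen = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        a, b, methods = row[0], row[1], row[2]
        pair = frozenset((a, b))  # treat A-B and B-A as the same interaction
        if pair in seen:
            continue
        seen.add(pair)
        for protein in pair:
            degree[protein] += 1
        evidence[len(set(methods.split(";")))] += 1
    return degree, evidence


degree, evidence = summarize(io.StringIO(SAMPLE))
print(dict(degree))
print(dict(evidence))
```

To run this on a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle and the column indices adjusted to the file's actual layout.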
| |
| Milestones achieved and comparison with other publicly
available databases:
|
| 1 |
HPRD is currently one of the richest sources of various aspects of
PPI data as compared to other publicly available databases as shown in
a comparative study. |
| 2 |
This is the only completely manually curated database that assimilates
PPIs, PTMs, subcellular localization, tissue expression, biological motifs
and domains derived from a variety of experimental platforms. |
| 3 |
HPRD receives nearly 148,000 hits a year and about 400 visitors
per day. |
| 4 |
To date, it has been cited nearly 1,827 times in the scientific
literature. |
| 5 |
To the best of our knowledge, data from HPRD, Human Proteinpedia and
RAPID databases are the only datasets from India that have been incorporated
into NCBI databases such as Entrez Gene and RefSeq. |
| |
|
| 1.3 Human Proteinpedia (http://www.humanproteinpedia.org/): |
|
Human Proteinpedia was developed as a community portal for sharing and
integrating human proteomic data over the World Wide Web. Through this
portal, research labs all over the world can contribute and upload
their experimental data. This initiative is an effort to bring together
the entire biomedical community and will enable dissemination of valuable
proteomic data. This will empower scientists to take advantage of information
that is at present confined to particular research labs. Such a concerted
effort will help enrich this database and minimize the redundancy inherent
in most other publicly available databases.
Data pertaining to post-translational modifications, protein-protein
interactions, tissue expression, expression in cell lines, subcellular
localization and enzyme-substrate relationships can be submitted
to Human Proteinpedia. It also gives proteomic investigators an
effective means of sharing unpublished
data.
Human Proteinpedia currently contains over 4.8 million MS/MS spectra
and ~2 million peptides and is an important resource for cataloging
proteotypic peptides (which serve as a unique identifier of a given
protein or isoform in tandem MS experiments) that can be used for
biomarker analysis using MRM (Multiple Reaction Monitoring).
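
The uniqueness aspect of a proteotypic peptide can be sketched as follows. This is an illustration only, not the procedure used by Human Proteinpedia: the two isoform sequences are invented, the function names `tryptic_digest` and `proteotypic` are hypothetical, and the digest uses the commonly cited trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages. Whether a peptide is actually observed in tandem MS is not modeled.

```python
from collections import defaultdict


def tryptic_digest(seq, min_len=6):
    """In-silico trypsin digest: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        following = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and following != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without a cleavage site
    return [p for p in peptides if len(p) >= min_len]


def proteotypic(proteins, min_len=6):
    """Map each protein to the peptides that occur in that protein only."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for peptide in tryptic_digest(seq, min_len):
            owners[peptide].add(name)
    unique = defaultdict(list)
    for peptide, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(peptide)
    return dict(unique)


# Invented sequences for two hypothetical isoforms that differ at the C-terminus.
PROTEINS = {
    "iso1": "MASTEELKGLSDQWRAVTPLLEK",
    "iso2": "MASTEELKGLSDQWRTTNPLVEK",
}
print(proteotypic(PROTEINS))
```

Here the two shared N-terminal peptides are discarded and each isoform keeps only its distinguishing C-terminal peptide, which is the kind of peptide one would consider for an MRM assay.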
Human Proteinpedia also provides a list of phosphopeptides identified
in mass spectrometry-based phosphoproteomic studies, and the phosphorylation
or dephosphorylation data curated from the literature have been mapped
to the corresponding sites and residues of sequences in HPRD. This is useful
to investigators in the development of phospho-specific antibodies
and peptide arrays.
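
Mapping a phosphopeptide onto a protein sequence amounts to locating the peptide and translating the position of the modified residue into protein coordinates. The sketch below shows this under simple assumptions: the sequence is invented, the function name `map_phosphosite` is hypothetical, positions are 1-based, and only exact matches are considered.

```python
def map_phosphosite(protein_seq, peptide, offset):
    """Return (residue, protein position) for every exact occurrence of the peptide.

    offset is the 1-based position of the modified residue within the peptide.
    """
    if not 1 <= offset <= len(peptide):
        raise ValueError("offset must lie within the peptide")
    sites = []
    start = protein_seq.find(peptide)
    while start != -1:
        position = start + offset  # 1-based coordinate in the protein
        sites.append((protein_seq[position - 1], position))
        start = protein_seq.find(peptide, start + 1)
    return sites


# Invented protein sequence; the third residue of the peptide (S) is the modified one.
PROTEIN = "MASTEELKGLSDQWRAVTPLLEK"
print(map_phosphosite(PROTEIN, "GLSDQWR", 3))
```

Returning every occurrence rather than only the first makes ambiguous peptides visible, since a peptide that matches more than one location cannot be assigned to a single site without further evidence.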
Protein annotations present in Human Proteinpedia are derived from
a number of technology platforms, such as co-immunoprecipitation, fluorescence-based,
western blotting or mass spectrometry-based experiments,
immunohistochemical analysis, yeast two-hybrid, and protein and peptide
microarrays.
|
| |
| Statistics to date: |
| Annotations | HPRD | Human Proteinpedia |