1. Proteomic Databases developed at IOB:
Defining the 'Proteome'
The "proteome", the complete protein complement of a genome, has become the key research frontier in the post-genomic era. Recent advances in proteomic technologies, beyond the basic applications of gel electrophoresis and mass spectrometry, have helped in deciphering primary protein sequence and structure as well as in identifying certain post-translational modifications and subtle protein-protein interactions. Even a decade after the human genome was decoded, we are nowhere close to answering many biological questions. As genomic sequence information becomes available in unprecedented amounts, the lack of corresponding functional correlation with gene transcripts and proteins represents a significant roadblock. To bridge this gap and improve the efficiency of biological discovery, the Institute of Bioinformatics has developed various databases to identify and analyze the protein products in a cell or tissue. These resources are freely available over the World Wide Web and can be easily accessed by the scientific community.
 
1.1 NetPath (http://www.netpath.org):
NetPath has grown into the largest open-source repository of human signaling pathways and is set to become a community standard for meeting the challenges in functional genomics and systems biology. Signaling networks are the key to deciphering many of the complex networks that govern the machinery inside the cell. Several signaling molecules play an important role in disease processes that result directly from their altered functioning, and they are now recognized as potential therapeutic targets. Understanding how to restore the proper functioning of pathways that have become deregulated in disease is needed to accelerate biomedical research. This resource aims to demystify biological pathways and highlights the key relationships and connections between them. In addition, pathways provide a way of reducing the dimensionality of high-throughput data by grouping the thousands of genes, proteins and metabolites measured in an experiment, at the functional level, into just a few hundred pathways. Identifying the active pathways that differ between two conditions can have more explanatory power than a simple list of differentially expressed genes and proteins.
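One common way to identify pathways that differ between two conditions is an over-representation test. The sketch below is a minimal illustration of that idea using a hypergeometric tail probability. It is not a NetPath tool: the function names, the toy gene identifiers and the pathway memberships are all hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the pathway."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def enrich(pathways, hits, background):
    """Rank pathways by over-representation of `hits` (hypothetical helper)."""
    background = set(background)
    hits = set(hits) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(members & hits)
        p = hypergeom_sf(overlap, len(background), len(members), len(hits))
        results.append((name, overlap, p))
    # Smallest p-value first: most over-represented pathway at the top
    return sorted(results, key=lambda r: r[2])


# Toy data (made up): 20 background genes, two 5-gene pathways, 5 changed genes
background = [f"g{i}" for i in range(20)]
toy_pathways = {
    "A": ["g0", "g1", "g2", "g3", "g4"],
    "B": ["g10", "g11", "g12", "g13", "g14"],
}
changed = ["g0", "g1", "g2", "g3", "g10"]

for name, overlap, p in enrich(toy_pathways, changed, background):
    print(name, overlap, round(p, 4))
```

In practice the p-values would also need correction for multiple testing, since several hundred pathways are tested at once.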
The scientific literature was thoroughly mined to catalog all significant molecular interactions in single ligand-stimulated, receptor-mediated signaling pathways. Apart from protein-protein interactions, enzyme-substrate reactions, protein translocation events and gene regulation events were also cataloged. Each pathway reaction is linked to its experimental evidence in the form of the PubMed IDs of the articles from which it was mined. Because of the heterogeneity inherent in the experimental validation of different pathway reactions and the diversity of publicly available data, a set of very stringent criteria was applied for curation and for the generation of the pathway maps.
These pathways are freely available for download in various formats such as BioPAX, PSI-MI and SBML. The availability of data in different formats allows interoperability between pathway analysis software tools such as Cytoscape and VISIBIOweb. To provide a better visual interface to the molecular reactions in NetPath, pathway maps were generated using PathVisio, an improved visualization tool that incorporates features of GenMAPP. These pathway maps are available through another resource called NetSlim (http://www.netpath.org/netslim), which was also developed at the Institute. The NetSlim versions of various pathways can be downloaded in .gpml, .GenMAPP, .png and .pdf formats.
 
1.2 Human Protein Reference Database (http://www.hprd.org/):
The Human Protein Reference Database (HPRD) is a centralized platform that visually depicts and integrates information pertaining to each protein in the human proteome. It contains manually curated scientific information on the biology of most human proteins. HPRD is the result of an international collaborative effort between the Institute of Bioinformatics and the Pandey lab at Johns Hopkins University in Baltimore, USA. The National Center for Biotechnology Information links to HPRD from its gene and protein databases (e.g. Entrez Gene, RefSeq protein).

All information in HPRD has been manually curated by expert biologists who critically read, interpret and analyze the published literature. This resource depicts information on human protein functions including protein–protein interactions, post-translational modifications, enzyme-substrate relationships and disease associations. The protein–protein interaction and subcellular localization data from HPRD have been used to develop a human protein interaction network. Information regarding proteins involved in human diseases is also annotated and linked to the Online Mendelian Inheritance in Man (OMIM) database.

HPRD was created using an object-oriented database in Zope, an open source web application server that provides versatility in query functions and allows data to be displayed dynamically. As HPRD continues to evolve with newer entries, the number of unannotated genes and proteins is falling rapidly, which allows us to expand the scope of our curation. The data in HPRD can be freely accessed and used by academic users, while commercial entities are required to obtain a license for use.

 
Goals:
1. The main goal in creating HPRD was to curate the world's literature on known and well-characterized proteins, which will in turn create a centralized knowledgebase of protein data.
2. Create a more robust curation system. Curation systems need to be continually updated to include current research.
3. Enable future discoveries and empower scientists in their work. As we move into Next Generation Sequencing technologies, the world is focusing less on individual genes and more on high-throughput studies involving thousands of genes at a time.
4. Study systems biology approaches and aid in biomarker discovery.
5. Perform complex queries involving multiple features of proteins.
   
Highlights of HPRD are as follows:
From 10,000 protein–protein interactions (PPIs) annotated for 3,000 proteins in 2003, HPRD had grown to 39,194 unique PPIs annotated for 30,047 proteins, including more than 6,360 isoforms, by the end of 2012.
More than 50% of molecules annotated in HPRD have at least one PPI and 10% have more than 10 PPIs.
Experiments for PPIs are broadly grouped into three categories, namely in vitro, in vivo and yeast two-hybrid (Y2H). Sixty percent of the PPIs annotated in HPRD are supported by a single experiment, whereas 26% are supported by two of the three experimental methods.
HPRD contains 18,000 manually curated post-translational modification (PTM) entries belonging to 26 different types of modifications.
All the phosphorylation-based motifs for any protein of interest can be analyzed using PhosphoMotifFinder in HPRD. This tool connects the proteomic data in HPRD to over 320 experimentally proven phosphorylation-based motifs curated from the literature. Phosphorylation is the most common type of protein modification, accounting for 63% of the PTM data annotated in HPRD.
HPRD data is available for download in tab-delimited and XML file formats.
HPRD also integrates data from Human Proteinpedia, a community portal for integrating human protein data.
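
Statistics such as the share of proteins with at least one PPI, or the share of PPIs supported by a single experimental method, can be derived from a tab-delimited interaction file. The sketch below is only an illustration under stated assumptions: the three-column layout (interactor A, interactor B, experiment type), the sample rows and the function name `summarize_ppis` are hypothetical and do not describe the actual HPRD download format.

```python
import csv
import io
from collections import defaultdict


def summarize_ppis(handle):
    """Collect interaction partners per protein and methods per interacting pair.

    Assumes (hypothetically) three tab-separated columns per row:
    interactor A, interactor B, experiment type.
    """
    partners = defaultdict(set)
    methods = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        a, b, method = row[0], row[1], row[2]
        partners[a].add(b)
        partners[b].add(a)
        # Sort the pair so A-B and B-A count as the same interaction
        methods[tuple(sorted((a, b)))].add(method)
    return partners, methods


# Made-up sample rows standing in for a downloaded file
SAMPLE = (
    "P1\tP2\tin vivo\n"
    "P2\tP1\tyeast two-hybrid\n"
    "P2\tP3\tin vitro\n"
)

partners, methods = summarize_ppis(io.StringIO(SAMPLE))
single_method = sum(1 for m in methods.values() if len(m) == 1)
print("proteins with at least one PPI:", len(partners))
print("unique PPIs:", len(methods))
print("PPIs supported by a single method:", single_method)
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the column indices adjusted to the documented layout of that file.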
 
Milestones achieved and comparison with other publicly available databases:
1. HPRD is currently one of the richest sources of PPI data among publicly available databases, as shown in a comparative study.
2. This is the only completely manually curated database that assimilates PPIs, PTMs, subcellular localization, tissue expression, biological motifs and domains derived from a variety of experimental platforms.
3. The HPRD database gets nearly 148,000 hits a year and about 400 visitors per day.
4. To date, it has been cited nearly 1,827 times in the scientific literature.
5. To the best of our knowledge, data from HPRD, Human Proteinpedia and RAPID are the only datasets from India that have been incorporated into NCBI databases such as Entrez Gene and RefSeq.
   
1.3 Human Proteinpedia (http://www.humanproteinpedia.org/):
Human Proteinpedia was developed as a community portal for sharing and integrating human proteomic data over the World Wide Web. Through this portal, research labs all over the world can contribute and upload their experimental data. This initiative is an effort to bring together the entire biomedical community and will enable the dissemination of valuable proteomic data. It will empower scientists to take advantage of information that is at present confined to particular research labs. Such a concerted effort will help enrich this database and minimize the redundancy inherent in most other publicly available databases.

Data pertaining to post-translational modifications, protein-protein interactions, tissue expression, expression in cell lines, subcellular localization and enzyme-substrate relationships can be submitted to Human Proteinpedia. It also gives proteomic investigators an effective means of sharing unpublished data.

Human Proteinpedia currently contains over 4.8 million MS/MS spectra and ~2 million peptides and is an important resource for cataloging proteotypic peptides (which serve as a unique identifier of a given protein or isoform in tandem MS experiments) that can be used for biomarker analysis using MRM (Multiple Reaction Monitoring).

Human Proteinpedia also provides a list of phosphopeptides identified in mass spectrometry-based phosphoproteomic studies, and the phosphorylation and dephosphorylation data curated from the literature have been mapped to the corresponding sites and residues of sequences in HPRD. This is useful to investigators developing phospho-specific antibodies and peptide arrays.

Protein annotations in Human Proteinpedia are derived from a number of technology platforms, such as co-immunoprecipitation, fluorescence-based, western blotting or mass spectrometry-based experiments, immunohistochemical analysis, yeast two-hybrid, and protein and peptide microarrays.

 
Statistics to date:
[Table: annotations in HPRD and Human Proteinpedia; values missing from the source]