Human X chromosome analysis: The human X chromosome has been widely studied for its connection to diseases such as mental retardation, hemophilia and nervous disorders. Given this importance, a detailed annotation of the chromosome was essential. The human X chromosome spans 155 million base pairs and is riddled with repeat-rich regions, which makes it difficult to study. The team at IOB carried out a comprehensive analysis and annotation of the entire X chromosome, focusing not only on the genomic sequence but also on the transcriptome and proteome.
IOB’s analysis shows that up to 52% of the human X chromosome is covered by repetitive sequences. It also shows a gene density of about 5 genes/Mb, which is lower than that of the other chromosomes. Using comparative genomics, we found 43 novel protein-coding regions on the X, in addition to the 696 known genes.

One of the main features of the project was a detailed analysis of pseudogenes. We identified and documented both pseudogenes located on the X chromosome and pseudogenes derived from genes present on the X. Surprisingly, we found that almost 5% of pseudogenes are actively transcribed; this is the first study of its kind to span an entire chromosome. In total, we found 652 pseudogenes on the X chromosome, of which 26 had evidence of transcription. Because transcribed pseudogenes are highly similar to their protein-coding parent genes, they have the potential to interfere with experimental techniques such as microarrays.

Our analysis of the transcriptome included a study of alternative splicing of genes on the X, the process by which one gene codes for more than one type of protein product. Alternative splicing has been widely reported, and our analysis showed that close to 45% of the genes on the X show evidence of it.
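The gene density quoted above can be reproduced from the counts given in the text. The short Python sketch below is only an illustrative cross-check, not part of the original analysis. It assumes that the density counts the 696 known genes plus the 43 novel protein-coding regions over the full 155 Mb; the constant and function names are hypothetical.

```python
# Figures quoted in the text above.
CHROMOSOME_LENGTH_MB = 155   # 155 million base pairs
KNOWN_GENES = 696
NOVEL_CODING_REGIONS = 43


def gene_density(gene_count: int, length_mb: float) -> float:
    """Return the number of genes per megabase."""
    return gene_count / length_mb


# Assumption: the density counts known genes plus the novel coding regions.
total_genes = KNOWN_GENES + NOVEL_CODING_REGIONS
density = gene_density(total_genes, CHROMOSOME_LENGTH_MB)
print(f"{total_genes} genes / {CHROMOSOME_LENGTH_MB} Mb = {density:.2f} genes/Mb")
```

Under that assumption the result is a little under 5 genes/Mb, in line with the rounded figure in the text.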
Numerous RNA transcripts do not code for any protein at all; these are classified as non-coding RNAs (ncRNAs). Truncated RNA molecules of protein-coding genes are often wrongly annotated as ncRNAs. We showed that, of the 142 ncRNAs predicted for the X chromosome, 64 actually belonged to protein-coding genes, which indicates the need for careful manual curation of genomic data.

Our proteomic analysis of the X included a comprehensive analysis of protein sequences, post-translational modifications and interacting proteins. The protein annotations have been incorporated into the Human Protein Reference Database, a database resource that we developed previously.

This annotation of the human X was published in the April 2005 issue of the scientific journal 'Nature Genetics', one of the most highly cited journals. The study was led by Akhilesh Pandey, M.D., Ph.D., an Assistant Professor at the Johns Hopkins University, Baltimore, USA, who established the Institute of Bioinformatics in May 2002.