Sunday, December 22, 2013

Pseudogene distribution across the human genome

Pseudogenes are genes that have lost their ability to code for proteins, or that are no longer expressed because of other changes in the genome. The Ensembl Genes 74 annotation of the human genome (hg19) contains 15,605 annotated pseudogenes. Thanks to extensive manual curation and automated prediction, the number of known pseudogenes in the human genome has increased over time.

Figure: Distribution of pseudogenes across the human chromosomes (hg19)
The figure above (an ideogram generated with Idiographica) shows the distribution of the pseudogenes annotated in the latest build of Ensembl. The complete lack of pseudogenes on the short arms of chromosomes 13, 14, 15 and 22 is rather striking. A more comprehensive annotation dedicated to the identification and analysis of pseudogenes can be found at pseudogene.org. Its latest build consists of 17,172 records, and the same pattern can be seen even in this more extensive annotation.
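
For anyone who wants to reproduce the per-chromosome tally behind a figure like this, here is a minimal Python sketch. It is not the procedure used for the figure above. It assumes a GTF-style annotation with tab-separated columns in which each record carries `gene_id` and `gene_biotype` attributes, and it treats any biotype containing the string "pseudogene" as a pseudogene. Both the attribute names and that matching rule are assumptions, so the totals may not match the 15,605 quoted above. The function name `count_pseudogenes` and the example records are made up for illustration.

```python
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_pseudogenes(gtf_lines):
    """Count distinct pseudogene gene_ids per chromosome in GTF-style lines.

    Assumes each record has 9 tab-separated columns and that the attribute
    column carries gene_id and gene_biotype (an assumption about the file).
    """
    seen = set()        # gene_ids already counted (a gene spans many records)
    counts = Counter()  # chromosome name -> number of distinct pseudogenes
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        chrom = fields[0]
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        biotype = attrs.get("gene_biotype", "")
        if gene_id is None or "pseudogene" not in biotype:
            continue
        if gene_id not in seen:
            seen.add(gene_id)
            counts[chrom] += 1
    return counts


# Made-up records for illustration only; these are not real annotations.
example_lines = [
    '13\tpseudogene\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "pseudogene";',
    '13\tpseudogene\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_biotype "pseudogene";',
    '1\tprotein_coding\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; gene_biotype "protein_coding";',
    '1\tprocessed_pseudogene\texon\t700\t800\t.\t+\t.\tgene_id "G3"; transcript_id "T3"; gene_biotype "processed_pseudogene";',
]
counts = count_pseudogenes(example_lines)
for chrom, n in sorted(counts.items()):
    print(chrom, n)
```

To run it on a real annotation, pass an open file handle instead of `example_lines`. Splitting the counts by chromosome arm would additionally require centromere coordinates, which this sketch does not include.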


While the pattern is striking, it might be an artifact of changes between chromosome builds that affect the short arms of these chromosomes. A biological explanation would of course be very interesting: could the pattern correspond to chromatin type or some other genomic feature? Apparently it does correspond to the heterochromatic regions of the genome that have not been sequenced.