Wednesday, December 11, 2019

Correcting the nucleotide sequence of the tiger genome at the base-pair level

The tiger is the national animal not just of India but also of South Korea, Malaysia and Bangladesh. The importance accorded to this animal reflects its grandeur. Unfortunately, the historic range of tigers has shrunk drastically over the past century, and the tiger is now classified as an endangered species. Because it is an endangered large cat, considerable effort has been directed at its conservation. Recent conservation efforts have turned to genomic tools to answer new questions (for example, see: "Conservation priorities for endangered Indian tigers through a genomic lens").

Most conservation studies that use genetic tools have dealt with the magnitude of genetic diversity, demographic history and their interaction with geographic extent. These approaches have helped develop strategies to control illegal trade and the associated poaching. However, the use of expensive genomic tools to aid conservation is still not mainstream. Despite discussions of conservation genomics and its utility, concrete examples of genomics making a difference on the ground are still few and far between.

When the genomes of primates such as human and chimpanzee were sequenced almost two decades ago, the promise of comparative genomics for identifying human-specific traits attracted great interest. A compelling example of differences between human and chimp within an exonic region is exon 2 of the PRM1 gene. The alignment of the human and chimp genomes for this region is given below:

Human      GGTGCTGCCGCCCCAGGTACAGACCGCGATGTAGAAGACACTAATTGCACAAAATAGCACATC
Chimpanzee GGTGCTGCCGCCGCAGGTCCAGAATGAGACGTAGAAGACACTAATTGCACAGAATAGCACATC

Originally, this pattern of four amino-acid encoding differences within a single exon was reported by Sabeti et al. 2006 (Positive Natural Selection in the Human Lineage).
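To see where the two sequences differ, the human and chimp lines of the pairwise alignment can be compared column by column. The following is a minimal sketch; the function name `pairwise_differences` is made up for illustration. It only reports nucleotide mismatches. Translating them into amino-acid changes would additionally need the strand and reading frame, which are not given here.

```python
def pairwise_differences(seq_a, seq_b):
    """Return (1-based column, base_a, base_b) for every mismatched column.

    Both sequences are assumed to be already aligned (same length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (col, a, b)
        for col, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Sequences copied from the human/chimp alignment shown above.
human = "GGTGCTGCCGCCCCAGGTACAGACCGCGATGTAGAAGACACTAATTGCACAAAATAGCACATC"
chimp = "GGTGCTGCCGCCGCAGGTCCAGAATGAGACGTAGAAGACACTAATTGCACAGAATAGCACATC"

for col, h, c in pairwise_differences(human, chimp):
    print(f"column {col}: human {h} / chimp {c}")
```

A plain column-wise comparison is enough here because the two sequences have no gaps relative to each other in this region.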


Human                    CGCCCCAGGTACAGACCGCGATGTAGAAGACACTAATTGC
Bonobo                   CGCCGCAGGTCCAGACTGAGACGTAGAAGACACTAATTGC
Chimpanzee               CGCCGCAGGTCCAGAATGAGACGTAGAAGACACTAATTGC
Gorilla                  CGCCGCAGGAACAGACTGAGACGTAGAAAACACTAATTGC
Orangutan                CGCCGCAGGTACAGACTGAGATGTAGAAGACACTAATTGC
Gibbon                   CGCCCCAGGTACAGGCTGAGACGTAGAAGACACTAATTGC
Sooty mangabey           CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Drill                    CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Olive baboon             CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Gelada                   CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Crab-eating macaque      CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Macaque                  CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Pig-tailed macaque       CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Vervet-AGM               CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Angola colobus           CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Ugandan red Colobus      CGCCGCAGGTACAGGCGGAGGTGTAGAAGATACTAATTGC
Black snub-nosed monkey  CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Golden snub-nosed monkey CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAATTGC
Ma's night monkey        CGCCGCAGGTATAAGCCGCGGTGTAGAAGACACTAATTGC
Marmoset                 CGCCGCAGGTACAAGCTGCCATGTAGAAGATACTAATTGC
Capuchin                 CGCCGCAGGTACAGACTGAGGTGTAGAAGATACTAATTGC
Bolivian squirrel monkey CGCCGCAGGTACAAGCTGAGGTGTAGAAGATACTAATTGC
Tarsier                  CGCCGCTCCTTCCGGCTGAGGTGTAGAAGATACTGA-CGC
Mouse Lemur              CGCCGCAGGTACAGGTGTAGAAGAAGAAGATACTAAATGC
Greater bamboo lemur     CGCCGCAGGTACAGG------TGTAGAAGATACTAAATGC
Coquerel's sifaka        CGCCGCAGGTACAG---GTGTAGAAGAAGATACTAAATGC
Bushbaby                 CGCCGCAGGTACAGGCTGAGGTGTAGAAGATACTAAACGC

Using a multiple sequence alignment that spans 27 primate species, we can further delineate the changes that occurred in the human lineage from those that occurred in the chimp lineage. The PRM1 gene codes for a protamine, a protein that replaces histones in the chromatin of sperm during the haploid phase of spermatogenesis. Striking patterns of positive selection and associated changes in sperm morphology have been documented in various species. Identifying such amino-acid altering substitutions between species would contribute to a better understanding of each species and help define the entity that is the focus of conservation.
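The idea of assigning each human/chimp difference to one lineage can be sketched with a very simple parsimony rule: if all outgroup species agree with chimp, the change is placed on the human lineage, and vice versa. This is only an illustration under that assumption, not the method used in any of the cited papers. The function name `polarize` and the choice of gorilla and orangutan as outgroups are mine.

```python
def polarize(human, chimp, outgroups):
    """Assign each human/chimp difference to a lineage by simple parsimony.

    The ancestral base is taken from the outgroups, but only when all
    outgroups agree and one of human/chimp matches them. Everything else
    is reported as "ambiguous". Columns are 1-based.
    """
    seqs = [human, chimp, *outgroups]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must be aligned to the same length")

    calls = []
    for col, (h, c) in enumerate(zip(human, chimp), start=1):
        if h == c:
            continue
        out_bases = {seq[col - 1] for seq in outgroups}
        if out_bases == {c}:
            calls.append((col, "human"))      # chimp kept the outgroup base
        elif out_bases == {h}:
            calls.append((col, "chimp"))      # human kept the outgroup base
        else:
            calls.append((col, "ambiguous"))  # outgroups disagree or match neither
    return calls


# Rows copied from the multiple sequence alignment above.
human     = "CGCCCCAGGTACAGACCGCGATGTAGAAGACACTAATTGC"
chimp     = "CGCCGCAGGTCCAGAATGAGACGTAGAAGACACTAATTGC"
gorilla   = "CGCCGCAGGAACAGACTGAGACGTAGAAAACACTAATTGC"
orangutan = "CGCCGCAGGTACAGACTGAGATGTAGAAGACACTAATTGC"

for col, lineage in polarize(human, chimp, [gorilla, orangutan]):
    print(f"column {col}: change on {lineage} lineage")
```

With only these two outgroups, one column stays ambiguous because gorilla and orangutan carry different bases there. Adding more rows from the alignment as outgroups, or replacing the unanimity rule with a proper ancestral-state reconstruction on the species tree, would be the natural next step.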

Given such insights at the molecular level, sequencing the genome of any species has the potential to reveal new information about it. The genome of the tiger was first reported by Cho et al. 2013 (The tiger genome and comparative analysis with lion and snow leopard genomes). Subsequent studies have used the tiger genome for comparative analysis in many high-profile papers to identify patterns of protein evolution.

Mittal et al. 2019 (Comparative analysis of corrected tiger genome provides clues to its neuronal evolution) report corrections to the assembly sequence of the first tiger genome, published by Cho et al. 2013 and currently available as PanTig1.0 on Ensembl as part of release 98 (September 2019). In addition to support from the raw read data, the authors rely on ancestral states inferred from multiple sequence alignments and on re-sequencing data from other individuals to ensure that the corrections they make are right. Having been on bioRxiv for almost a year, this corrected tiger genome will hopefully motivate a speedy update of the tiger genome assembly on Ensembl. The underlying program used for genome correction is called SeqBug. It is freely available for download on its own GitHub page and is an improved version of BCD.
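The logic of combining these three lines of evidence can be illustrated with a small sketch. This is not SeqBug's algorithm, which is not described here. The function `propose_correction`, its parameters and the thresholds `min_depth` and `min_fraction` are all hypothetical, chosen only to show how read support, an inferred ancestral base and re-sequenced individuals might jointly gate a base-level correction.

```python
from collections import Counter


def propose_correction(ref_base, read_bases, ancestral_base=None,
                       resequenced_bases=(), min_depth=10, min_fraction=0.9):
    """Return a replacement base, or None if the assembly base is kept.

    Assumed, deliberately conservative rule:
      1. the raw reads must overwhelmingly support a base other than ref_base;
      2. if an ancestral base is given, it must equal the read-supported base;
      3. every re-sequenced individual must carry the read-supported base.
    """
    counts = Counter(b for b in read_bases if b in "ACGT")
    depth = sum(counts.values())
    if depth == 0 or depth < min_depth:
        return None  # too little read evidence to judge

    top_base, top_count = counts.most_common(1)[0]
    if top_base == ref_base or top_count / depth < min_fraction:
        return None  # reads agree with the assembly, or are not decisive

    if ancestral_base is not None and ancestral_base != top_base:
        return None  # alignment-based ancestral state does not back the change
    if any(b != top_base for b in resequenced_bases):
        return None  # other individuals do not back the change
    return top_base


# Toy example: assembly says A, twelve reads say G, other evidence agrees.
print(propose_correction("A", "G" * 12, ancestral_base="G",
                         resequenced_bases="GG"))
```

Requiring agreement from all three sources trades sensitivity for safety: a genuinely derived tiger-specific base would be left uncorrected by this rule, which is why a real tool would need a more careful treatment.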