Friday, March 13, 2015

Lack of Isolation By Distance pattern in Tiger mitochondrial data

A recent paper that used ancient DNA sequences from museum samples concluded that the Bali, Javan and Sumatran tigers are closely related to one another, based on a comparison with diagnostic mtDNA sequences from the other tiger subspecies.

Because precise geographic information about the sampling locations was lacking, the authors could not look at isolation by distance (IBD) patterns, despite having a matrix of genetic distances. The historical geographic ranges of these subspecies can be found on Wikipedia. One could simply assume a sampling location within each range and check whether an IBD pattern can be seen. Such assumptions might not be acceptable in a paper, but a blog provides some freedom to make them. Somebody familiar with tigers might be able to provide better assumptions.

The assumed sampling locations are:

ALT (Siberian tiger): Sikhote-Alin, 45°N 136°E
VIR (Caspian tiger): Gansu, 38°N 102°E
AMO (South China tiger): Qin mountains, 33°N 107°E
COR (Indo-Chinese tiger): Yunnan, 25°N 101°E
JAX (Malayan tiger): Kelantan, 5°N 102°E
SUM (Sumatran tiger): Bukit Barisan Selatan National Park, 5°S 104°E
SON (Javan tiger): Java, 7°S 110°E
BAL (Bali tiger): Bali, 8°S 115°E
TIG (Bengal tiger): Hazaribagh National Park, 24°N 85°E

The R package sp provides many useful functions for dealing with spatial data. We use the spDistsN1 function, which returns the Euclidean or great-circle distance between a set of points and a single point; with longlat=TRUE it gives great-circle distances in kilometres.

 library(sp)  
 library(reshape)  
 library("ecodist")  
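 # Longitude and latitude in decimal degrees; latitudes south of the equator (SUM, SON, BAL) are negative  
 someCoords <- data.frame(long=c(136,102,107,101,102,104,110,115,85), lat=c(45,38,33,25,5,-5,-7,-8,24))  
 apply(someCoords, 1, function(eachPoint) spDistsN1(as.matrix(someCoords), eachPoint, longlat=TRUE))->X  
 # Take the lower triangle so the column-wise order matches pop1/pop2 below (ALT-VIR, ALT-AMO, ...)  
 X[lower.tri(X)]->G  
 # The Mantel output reported further down was produced with positive latitudes and upper.tri(X), so a rerun may give somewhat different numbers  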
 pop1<-c("ALT","ALT","ALT","ALT","ALT","ALT","ALT","ALT","VIR","VIR","VIR","VIR","VIR","VIR","VIR","AMO","AMO","AMO","AMO","AMO","AMO","COR","COR","COR","COR","COR","JAX","JAX","JAX","JAX","SUM","SUM","SUM","SON","SON","BAL")  
 pop2<-c("VIR","AMO","COR","JAX","SUM","SON","BAL","TIG","AMO","COR","JAX","SUM","SON","BAL","TIG","COR","JAX","SUM","SON","BAL","TIG","JAX","SUM","SON","BAL","TIG","SUM","SON","BAL","TIG","SON","BAL","TIG","BAL","TIG","TIG")  
 data.frame(pop1,pop2,G)->Geodist  
 Geodist2 <- with(Geodist, G)  
 nams <- with(Geodist, unique(c(as.character(pop1), as.character(pop2))))  
 attributes(Geodist2) <- with(Geodist, list(Size = length(nams),Labels = nams,Diag = FALSE,Upper = FALSE,method = "user"))  
 class(Geodist2) <- "dist"  
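 # The Fst values must come out in the same pair order as pop1/pop2, which holds if the file stores only the lower triangle and NA elsewhere  
 read.table(file="Tiger_Fst.txt",header=TRUE,sep="\t",row.names = 1)->M  
 na.omit(melt(M))->N  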
 data.frame(pop1,pop2,N$value)->Gendist  
 Gendist2 <- with(Gendist, N$value)  
 nams <- with(Gendist, unique(c(as.character(pop1), as.character(pop2))))  
 attributes(Gendist2) <- with(Gendist, list(Size = length(nams),Labels = nams,Diag = FALSE,Upper = FALSE,method = "user"))  
 class(Gendist2) <- "dist"  
 mantel(Geodist2 ~ Gendist2, nperm=10000)  
 #    mantelr       pval1       pval2       pval3   llim.2.5%  ulim.97.5%  
 # -0.06313036  0.67800000  0.32230000  0.65900000 -0.25930292  0.12999925  
 jpeg("Tiger_isolation_by_distance.jpeg")  
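 # xlim leaves room for the longest pairs (Siberia to the Sunda islands), which are over 6,000 km apart  
 plot(G,N$value/(1-N$value),xlab="Distance in Km",ylab="Fst/(1-Fst) from mtDNA sequences",main="Isolation by Distance pattern",pch=16,col="blue",xlim=c(0,7000))  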
 text(G,N$value/(1-N$value),labels=paste(pop1,"  ",pop2))  
 dev.off()  
 jpeg("Tiger_Fst_cladogram.jpeg")  
 # Gendist2 is already a "dist" object, so dist() here measures Euclidean distance between rows of the Fst matrix  
 hc = hclust(dist(Gendist2))  
 op = par(bg = "#DDE3CA")  
 plot(hc, col = "#487AA1", col.main = "#45ADA8", col.lab = "#7C8071", col.axis = "#F38630", lwd = 3, lty = 3, sub = "", hang = -1, axes = FALSE,xlab="",main="Tiger Fst cladogram")  
 # Use the tick positions R computes for this plot rather than a fixed 0 to 400 scale  
 ticks = axTicks(2)  
 axis(side = 2, at = ticks, col = "#F38630", labels = FALSE,   lwd = 2)  
 mtext(ticks, side = 2, at = ticks, line = 1, col = "#A38630", las = 2)  
 dev.off()  
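
For readers without sp installed, the great-circle distance can be approximated in base R with the haversine formula. This is only a sketch: the function name haversine_km and the spherical Earth radius of 6371 km are my own assumptions, so the values will differ slightly from what spDistsN1 reports. It also shows how much the sign of a southern latitude matters for the Siberia to Bali pair.

```r
# Haversine great-circle distance in km between two (lon, lat) points in degrees.
# Assumes a spherical Earth of radius 6371 km (an approximation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) just above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Assumed sampling points from the post: Sikhote-Alin (ALT) and Bali (BAL).
d_south <- haversine_km(136, 45, 115, -8)  # Bali at 8 degrees south
d_north <- haversine_km(136, 45, 115, 8)   # same point with the sign dropped
print(c(south = d_south, north = d_north))
```

The function is vectorised, so whole columns of coordinates can be passed in at once.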

The Mantel test shows a very weak negative correlation (r = -0.06) between geographic and genetic distance, and none of the three p-values approaches significance, so there is no evidence of an IBD pattern. This is not entirely unexpected, given that the Fst values are high and almost saturated. Would using more markers with greater resolution capture an IBD pattern? How strong are the bottlenecks?
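
To make clear what the test is doing, here is a minimal base-R sketch of a Mantel permutation test. It is not the ecodist implementation: the function name mantel_sketch, the one-sided p-value and the toy data are all my own assumptions, and the toy points have nothing to do with tigers.

```r
# Minimal Mantel permutation test on two square, symmetric distance matrices.
mantel_sketch <- function(m1, m2, nperm = 999) {
  lower <- lower.tri(m1)
  r_obs <- cor(m1[lower], m2[lower])   # observed matrix correlation
  n <- nrow(m1)
  r_perm <- replicate(nperm, {
    idx <- sample(n)                   # shuffle the population labels of m1
    cor(m1[idx, idx][lower], m2[lower])
  })
  # One-sided p-value for r > 0 (the IBD expectation), counting the observed value
  p_upper <- (sum(r_perm >= r_obs) + 1) / (nperm + 1)
  list(r = r_obs, p = p_upper)
}

# Hypothetical toy data: 6 random points on a plane.
set.seed(1)
pts <- matrix(runif(12), ncol = 2)
geo <- as.matrix(dist(pts))
gen <- 2 * geo                         # "genetic" distance perfectly tied to geography
res <- mantel_sketch(geo, gen, nperm = 999)
print(res)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling whole populations keeps that structure intact.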

 
The authors have already used the data to build a phylogenetic tree and a network. To do something different, I built a cladogram from the Fst matrix. While this cladogram places SUM and BAL close to each other, it fails to group SON with them. Various other differences from the phylogenetic tree can also be seen.
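
One thing worth trying: the code above passes an object that is already of class "dist" through dist() again, so the clustering is based on Euclidean distances between rows of the Fst matrix rather than on Fst itself. A possible alternative is to hand the Fst distances straight to hclust. The sketch below uses four made-up populations with hypothetical Fst values, and average linkage is my own choice, not something taken from the paper.

```r
# Hypothetical pairwise Fst values for four made-up populations A-D (not tiger data).
fst <- matrix(c(0.0, 0.1, 0.8, 0.9,
                0.1, 0.0, 0.7, 0.8,
                0.8, 0.7, 0.0, 0.2,
                0.9, 0.8, 0.2, 0.0),
              nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

fst_dist <- as.dist(fst)                            # treat Fst itself as the dissimilarity
hc_direct <- hclust(fst_dist, method = "average")   # average linkage (UPGMA)
groups <- cutree(hc_direct, k = 2)                  # split the tree into two clusters
print(groups)
```

With the tiger data, the equivalent call would be hclust(Gendist2), and it would be interesting to see whether SON then joins SUM and BAL.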



