Thursday, December 4, 2014

openSNP MDS plot from plink

After downloading all the genotype data from the openSNP website, the first thing to do is to look at the population structure, of course!

Using AIMs (Ancestry Informative Markers) is a rather quick way to determine ancestry from a minimal set of markers. Various groups have used slightly different methods to come up with such markers. Instead of looking at these markers, we look at the lactose tolerance phenotype and the population structure in the genotypes of people who reported this trait.

The MDS plot (generated using plink after converting the data with opengwas) has one weird outlier that does not cluster with the rest. It is curious, but I hope they make the data available in an easier-to-use, standardised format so that I can dig into this.
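One way to put a name on that outlier, rather than eyeballing the plot, is to measure how far each sample sits from the bulk of the cloud. The sketch below is not what was used for the plot above. It assumes the coordinates are in a PLINK `.mds` file (whitespace-delimited, with a header containing `IID`, `C1` and `C2`), and the function names, the toy data and the cutoff `k=5.0` are made up for illustration.

```python
import math
import statistics

def parse_mds(text):
    """Parse the whitespace-delimited contents of a PLINK .mds file.

    Returns a list of (IID, C1, C2) tuples. Only the first two MDS
    dimensions are used, which is an assumption of this sketch.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    iid, c1, c2 = header.index("IID"), header.index("C1"), header.index("C2")
    return [(r[iid], float(r[c1]), float(r[c2])) for r in rows]

def flag_outliers(samples, k=5.0):
    """Return IDs lying more than k times the median distance from the median point.

    The cutoff k is arbitrary (hypothetical); tune it against the actual plot.
    """
    cx = statistics.median(s[1] for s in samples)
    cy = statistics.median(s[2] for s in samples)
    dist = {s[0]: math.hypot(s[1] - cx, s[2] - cy) for s in samples}
    typical = statistics.median(dist.values())
    return [iid for iid, d in dist.items() if d > k * typical]

# Toy data in .mds layout (invented values, not openSNP results).
SAMPLE = """
FID IID SOL C1     C2
1   s1  0   0.010  0.020
2   s2  0   0.012  0.018
3   s3  0   0.009  0.021
4   s4  0   0.011  0.019
5   s5  0   0.250 -0.300
"""

outliers = flag_outliers(parse_mds(SAMPLE))
print(outliers)
```

With a real file, the text would come from something like `open("plink.mds").read()`; the filename here is a guess at the default output name and may differ depending on how plink was run.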

The number of males who decided to reveal their gender [290/1379 (21.03%)] is roughly double that of females [149/1379 (10.8%)]. Date of birth was shared slightly more reluctantly, with 403/1379 (29.22%) sharing their DOB compared to the 439/1379 (31.83%) who shared their gender. Not surprisingly, most people who shared their gender also shared their DOB (116/149 females and 253/290 males). See the barplot below for the distribution of DOBs, with a mean age of ~40 (minimum is 19 and maximum is 114).
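The percentages above are easy to re-derive from the raw counts. Here is a small sketch that does so; the counts are the ones quoted in the text, while the variable and function names are just for illustration.

```python
TOTAL_USERS = 1379  # total number of openSNP users considered in the text

# Counts quoted in the paragraph above.
counts = {
    "males": 290,
    "females": 149,
    "shared_gender": 290 + 149,
    "shared_dob": 403,
    "shared_both": 116 + 253,  # females + males who also shared their DOB
}

def pct(count, total=TOTAL_USERS):
    """Percentage of `total` represented by `count`, rounded to 2 decimals."""
    return round(100 * count / total, 2)

for name in ("males", "females", "shared_gender", "shared_dob"):
    print(f"{name}: {counts[name]}/{TOTAL_USERS} = {pct(counts[name])}%")

# Share of gender-sharers who also shared a DOB.
both = pct(counts["shared_both"], counts["shared_gender"])
print(f"shared both, among those who shared gender: {both}%")
```

The last figure is not stated in the text; it follows from 116 + 253 = 369 of the 439 people who shared their gender.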

