This is just a simple set of commands to create a RAD-seq processing tool when the genome is available. Since, it accepts reads mapped to genome in a bam file as input, it is capable of handling RAD-seq, exome sequencing as well as low coverage re-sequencing data. Paired-end as well as single end reads are accepted.
module load samtools
module load BEDTools
#calculate per site coverage for each individual bam file
samtools depth *.sorted.bam > all.coverage
#sum the persite coverage across all individuals
awk '{for(i=3;i<=NF;i++) t+=$i; print t; t=0}'|sed 's/ /\t/g' all.coverage > persite.coverage
#make a histogram of persite coverage across all individuals
cut -f 3 persite.coverage |awk '{counts[$1] = counts[$1] + 1; } END { for (val in counts) print val, counts[val]; }'|sort -k1n,1 > persite.coverage.hist
#remove sites with more than 200 reads(as inferred from histogram from previous step) per site
awk '{for(i=3;i<=NF;i++) t+=$i; if(t<200)print $0; t=0}'|sed 's/ /\t/g' all.coverage > non.repeat.sites
#find sites that have atleast 5 reads in 10 individuals
awk '{for(i=3;i<=NF;i++) if($i > 5)t++; if(t>10)print $1,$2,$2,t; t=0}' non.repeat.sites|sed 's/ /\t/g' > wellcovered.sites
#collapse sites into loci
bedtools merge -i wellcovered.sites > wellcovered.sites.bed
#call SNP's for selected loci
samtools mpileup -l wellcovered.sites.bed -uf ref.fa *.sorted.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
No comments:
Post a Comment