Continuing with our previous exploits Analyzing in vitro toxicity and expression data, its time to combine SNP data tables using perl.
To do something bonus, lets calculate transitions and transversions to infer the Ti/Tv ratio.
Info: 1852527 Transistions
#!/usr/bin/perl
open CANFAM2, $ARGV[0] or die $!;
open CANFAM3, $ARGV[1] or die $!;
%can2=();
%can3=();
while($line = <CANFAM2>){
chomp $line;
@tabs=split(/\s+/,$line);$tabs[9]=~s/[\;\"]//g;$tabs[11]=~s/[\;\"]//g;
$key=$tabs[9];
$value=$tabs[0] . "_" . $tabs[3] . "_" . $tabs[11];
if(exists $can2{$key}){print "Error:Duplicate record for SNP_ID $tabs[9]. Skipping second occurence.\n";}
else{$can2{$key}=$value;}
}#end of file while loop
while($line = <CANFAM3>){
chomp $line;
@tabs=split(/\s+/,$line);$tabs[9]=~s/[\;\"]//g;$tabs[11]=~s/[\;\"]//g;
$key=$tabs[9];
$value=$tabs[0] . "_" . $tabs[3] . "_" . $tabs[11];
if(exists $can3{$key}){print "Error:Duplicate record for SNP_ID $tabs[9]. Skipping second occurence.\n";}
else{$can3{$key}=$value;}
}#end of file while loop
print "SNP_ID\tbases\tchr_canfam2\tpos_canfam2\tchr_canfam3\tpos_canfam3\tdistance_between_versions\n";
$transitions=0;
$transversions=0;
foreach $mykey(sort keys %can2){
@can2tabs=split(/\_/,$can2{$mykey});
if($can2tabs[2]=~m/(AG|GA|CT|TC)/){$transitions++;}
elsif($can2tabs[2]=~m/(AC|CA|GT|TG)/){$transversions++;}
if(exists $can3{$mykey}){
@can3tabs=split(/\_/,$can3{$mykey});
if($can2tabs[0]=~m/$can3tabs[0]/){$distance=abs($can2tabs[1]-$can3tabs[1]);}else{$distance="NA";}
print "$mykey\t$can2tabs[2]\t$can2tabs[0]\t$can2tabs[1]\t$can3tabs[0]\t$can3tabs[1]\t$distance\n";
}
else {
print "Error:SNP_ID $mykey missing in canfam3.\n";
print "$mykey\t$can2tabs[2]\t$can2tabs[0]\t$can2tabs[1]\tNA\tNA\tNA\n";
}
}
foreach $mykey(sort keys %can3){
if(!(exists $can2{$mykey})){
print "Error:SNP_ID $mykey missing in canfam2.\n";
@can3tabs=split(/\_/,$can3{$mykey});
print "$mykey\t$can3tabs[2]\tNA\tNA\t$can3tabs[0]\t$can3tabs[1]\tNA\n";
if($can3tabs[2]=~m/(AG|GA|CT|TC)/){$transitions++;}
elsif($can3tabs[2]=~m/(AC|CA|GT|TG)/){$transversions++;}
}
}
$titv=$transitions/$transversions;
print "Info: $transitions Transistions\n";
print "Info: $transversions Transversions\n";
print "Info: $titv Ti/Tv ratio\n";
close CANFAM2;
close CANFAM3;
To do something bonus, lets calculate transitions and transversions to infer the Ti/Tv ratio.
Info: 1852527 Transistions
Info: 504318 Transversions
Info: 3.67333111251234 Ti/Tv ratio
Apart from this, 39,352 SNP's were found in only one of the two versions of the assembly. These could be attributed to gaps in the assembly that were filled at a later stage by Illumina sequencing.
No comments:
Post a Comment