Tuesday, November 18, 2014

Optical map dot plot

Using Optical Maps to scaffold genomes of bacteria seems to be a fairly common practice. While it is yet to find widespread use in Eukaryotic species, it has found some success in species such as the cow, rice and parrot.

Here we use the assembled Optical map provided as an example with the SOMA package to demonstrate Optical map dot plot creation.

First, we perform an all vs all search between the optical map and in-silico digest of the genome using below script. The script has the parameter called search radius which decides how much flexibility is to be allowed while searching.
 #!/usr/bin/perl   
 use warnings;   
 use Math::Round qw(nearest);  
 use Set::IntSpan;  
 #  
 #perl Map_Optical.pl scaffold.silico soma_format_maps/1.map  
 ## Input parameters  
 %cutlens=();  
 my $roundoffto=10;  
 my $searchradius=200;  
 open CUTSITES, $ARGV[0] or die $!;  
 while($line = <CUTSITES>){  
 chomp $line;  
 my @header = split / /, $line;  
 my $scaffold=$header[0];  
 my $scaflen=$header[1];  
 my $totalcutsites= $header[2];  
 $line = <CUTSITES>;chomp $line;  
 my @cutsites = split / /, $line;  
 my $iso="";  
 my $cutcount=0;  
 my $previous=0;  
   foreach $iso (@cutsites){  
    my $cutlength=$iso-$previous;  
    if($previous>0){  
    $cutcount++;  
    $key=$scaffold . "#" . $cutcount;  
    $cutlens{$cutlength}{$key}=$previous . "_" . $iso;  
    }  
    $previous=$iso;  
    }  
          }#end of while loop  
 close CUTSITES;  
 #print "Done reading Silico-digest provided in file: $ARGV[0]\n";  
 my $mapsite=0;  
 my %mapmaps=();  
 open SOMAP, $ARGV[1] or die $!;   
 while($line = <SOMAP>){   
 chomp $line;  
 my @header2 = split / /, $line;  
 $mapsite++;  
 my $testvalue=$header2[0];  
 my $teststart=$testvalue-$searchradius;  
 my $testend=$testvalue+$searchradius;  
 #print "Searching for $testvalue lengths with a search radius of $searchradius.\n";  
 my $i="";  
 for($i=$teststart;$i<$testend;$i++){  
    foreach my $key (keys %{$cutlens{$i}}) {  
    my @keysp = split /\#/, $key;  
      my @beds= split /\_/, $cutlens{$i}{$key};  
 # scaffold     start     end     map_position     search_radius_hit     map_frag_len     cut_frag_len  
 #   print "$keysp[0]\t$beds[0]\t$mapsite\t$i\t$testvalue\t$keysp[1]\n";  
 #   print "$keysp[0]\t$beds[1]\t$mapsite\t$i\t$testvalue\t$keysp[1]\n";  
 #   print "$keysp[0]\tNA\tNA\n"  
      print "$keysp[0]\t$beds[0]\t$beds[1]\t$mapsite\t$i\t$testvalue\t$keysp[1]\n";  
    }  
 }  
 }#end of while loop  
 close SOMAP;  
After performing a search of the Optical map, the output bed file can be converted into a dotplot file using below script:
 #!/usr/bin/perl  
 use warnings;  
 ##  
 ##perl getfull.pl genome.fa.fai assembled.map.bed   
 ### Input parameters  
 #  
 my %cumlen=();  
 my $cumcount=0;  
 open CUTSITES, $ARGV[0] or die $!;  
 while($line = <CUTSITES>){  
 chomp $line;  
 my @header = split /\t/, $line;  
 $cumlen{$header[0]}=$cumcount;  
 $cumcount=$cumcount+$header[1];  
 }  
 close CUTSITES;  
 open MAPSITES, $ARGV[1] or die $!;  
 while($line = <MAPSITES>){  
 chomp $line;  
 my @header = split /\t/, $line;  
 my $start=$header[1]+$cumlen{$header[0]};  
 my $end=$header[2]+$cumlen{$header[0]};  
 print "Map\t$start\t$header[3]\n";  
 print "Map\t$end\t$header[3]\n";  
 print "Map\tNA\tNA\n";  
 }  
 close MAPSITES;  
The figure below shows the dotplot of the Optical map created using above scripts with a search radius of 200 base pairs. Increasing the search radius saturates the plot while allowing for more distant hits.

No comments: