Tuesday, January 18, 2011

Blat instead of Nucmer in Minimus2

Minimus2 is a good utility to merge assemblies based on the AMOS architecture. However, it has a limitation on the size of the input datasets. This is due to the Nucmer program is its pipline. Apart from Nucmer other steps are able to handle much bigger chunks of data. So instead of using Nucmer i tried using Blat. Although this gave me assemblies, they were not always correct and complete.

May be someday somebody will use it.In the meantime i try to use Nucmer iteratively on smaller chunks and then merge the output before the next step. This is possible as Nucmer is run with the option -maxmatch in Minimus, making it independent of the other sequences in the dataset.

1)The changes to Minimu2.acf in amos/src/Pipeline/:

BINDIR = /usr/local/bin
NUCMER = /usr/local/nucmer
DELTAFILTER = /usr/local/delta-filter
SHOWCOORDS = /usr/local/show-coords
BLAT = blat
BTON = perl psltoNucmer.pl

#-----------------------------
-------------------------------------------------#

## Building AMOS bank & Dumping reads
10: rm -fr $(BANK)
11: $(BINDIR)/bank-transact -c -z -b $(BANK) -m $(TGT)
12: $(BINDIR)/dumpreads $(BANK) -M $(REFCOUNT) > $(REFSEQ)
13: $(BINDIR)/dumpreads $(BANK) -m $(REFCOUNT) > $(QRYSEQ)

## Getting overlaps
20: $(BLAT) $(REFSEQ) $(QRYSEQ) $(BOUT) -fastMap -noHead -minIdentity=98
21: $(BTON) $(BOUT) | $(BINDIR)/nucmerAnnotate | egrep 'BEGIN|END|CONTAIN|IDENTITY' > $(COORDS)
22: $(BINDIR)/nucmer2ovl -ignore $(MAXTRIM) -tab $(COORDS) | $(BINDIR)/sort2 > $(OVLTAB)


2) The code for psltoNucmer.pl is given below:

#!/usr/bin/perl

open FILE2, "<",$ARGV[0] or die $!;

while($z = ){
@values=split(/\t/,$z);
print $values[15]."\t".$values[16]."\t|\t".$values[11]."\t".$values[12]."\t|\t".abs($values[16]-$values[15])."\t".abs($values[12]-$values[11])."\t|\t".(sprintf "%.2f",(($values[0]+$values[2]+$values[3])*100)/abs($values[12]-$values[11]))."\t|\t".$values[14]."\t".$values[10]."\t|\t".(sprintf "%.2f",(abs($values[16]-$values[15])*100/$values[14]))."\t".(sprintf "%.2f",(abs($values[12]-$values[11])*100/$values[10]))."\t|\t".$values[13]."\t".$values[9]."\n";
}#end of File while loop

3) blat output format details can be found here : http://genome.ucsc.edu/goldenPath/help/blatSpec.html

No comments: