Tuesday, February 22, 2011

Convert FASTA files into a TIGR .contig file

While converting FASTA files with gaps to AFG format, one step requires a file in TIGR .contig format. Given below is a simple script that generates a dummy .contig file in which each contig has just one read, and that read is exactly the same as the contig. This is done because SOAP, SSPACE etc. do not track reads, and because retaining all the reads would make the file huge.

 #!/usr/bin/perl  
 open FASTA, "< $ARGV[0]" or die "Can't open $ARGV[0] ($!)\n";  
 while($z=<FASTA>){  
  if($z=~ /^>/)  # header lines start with ">"
  {  
  chomp $z;  
  $header=(split(/>/,$z))[1];  
  $readid=(split(/_/,$header))[1];  
  $seq=<FASTA>;  # the sequence is read from the line right after the header
  chomp $seq;  
  $seqlen=length $seq;  
  print "##$header 1 $seqlen bases, 00000000 checksum.\n";  
  print "$seq\n";  
  print "#$readid(0) [] $seqlen bases, 00000000 checksum. {$seqlen 0} <1>\n";  
  print "$seq\n";  
  }  
 }  

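Save it as fasta2contig.pl and run it as "perl fasta2contig.pl contig.fa". The result is printed to standard output, so redirect it to a file, e.g. "perl fasta2contig.pl contig.fa > contig.contig". Note that the script reads only the single line following each header, so every sequence in the FASTA file must be on one line.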
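
If the FASTA file has sequences wrapped over several lines, the script above would pick up only the first line of each sequence. Here is a rough sketch of a variant that joins the wrapped lines before printing the same two records per contig. The sub name fasta_to_contig and the fallback to the full header when it contains no "_" are my own assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build dummy TIGR .contig records from an open FASTA filehandle,
# joining wrapped sequence lines. Returns all records as one string.
# (fasta_to_contig is a hypothetical name used for this sketch.)
sub fasta_to_contig {
    my ($fh) = @_;
    my ($header, $seq, $out) = (undef, '', '');

    # Emit the contig record plus one read record identical to the contig.
    my $flush = sub {
        return unless defined $header;
        my $readid = (split /_/, $header)[1];
        $readid = $header unless defined $readid;  # assumption: no "_" in header
        my $len = length $seq;
        $out .= "##$header 1 $len bases, 00000000 checksum.\n$seq\n";
        $out .= "#$readid(0) [] $len bases, 00000000 checksum. {$len 0} <1>\n$seq\n";
    };

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip Unix or Windows line endings
        if ($line =~ /^>(.*)/) {
            $flush->();                    # finish the previous contig, if any
            ($header, $seq) = ($1, '');
        } elsif (defined $header) {
            $seq .= $line;                 # append a wrapped sequence line
        }
    }
    $flush->();                            # finish the last contig
    return $out;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0] ($!)\n";
    print fasta_to_contig($fh);
    close $fh;
}
```

It is used the same way as the original, e.g. "perl fasta2contig_multiline.pl contig.fa > contig.contig" (the file name is just an example).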