While converting FASTA files with gaps to the AMOS AFG format, one step required a file in TIGR .contig format. Below is a simple script that generates a dummy .contig file in which each contig has exactly one read, and that read is identical to the contig itself. This workaround is needed because SOAP, SSPACE, etc. do not track which reads were placed in each contig. It also avoids the huge file size that keeping all the reads would produce.
Save it as fasta2contig.pl and run it as "perl fasta2contig.pl contig.fa > contig.contig". The records are printed to standard output, hence the redirect.
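#!/usr/bin/perl
# fasta2contig.pl: write a dummy TIGR .contig record for each sequence in a
# FASTA file, with a single read identical to the contig.
use strict;
use warnings;

die "Usage: perl fasta2contig.pl contig.fa > contig.contig\n" unless @ARGV == 1;
open(my $fasta, '<', $ARGV[0]) or die "Can't open $ARGV[0] ($!)\n";

my ($header, $seq) = (undef, '');
while (my $line = <$fasta>) {
    chomp $line;
    $line =~ s/\r$//;                    # tolerate Windows line endings
    if ($line =~ /^>/) {
        print_contig($header, $seq) if defined $header;
        ($header) = $line =~ /^>(\S*)/;  # contig ID = first word of the header
        $seq = '';
    } else {
        $seq .= $line;                   # sequences may be wrapped over several lines
    }
}
print_contig($header, $seq) if defined $header;   # flush the last record
close $fasta;

sub print_contig {
    my ($id, $s) = @_;
    # Read ID: the second "_"-separated field of the contig ID (e.g. "contig_12" -> "12"),
    # or the whole contig ID if it contains no "_".
    my $readid = (split /_/, $id)[1] // $id;
    my $len = length $s;
    print "##$id 1 $len bases, 00000000 checksum.\n$s\n";
    # One forward read spanning the contig: clear range {1 len}, contig position <1 len>
    print "#$readid(0) [] $len bases, 00000000 checksum. {1 $len} <1 $len>\n$s\n";
}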
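For reference, here is a sketch of a single record, assuming a hypothetical one-line input ">contig_1" followed by "ACGTACGTACGTAC" (14 bases). This follows the TIGR .contig layout as I understand AMOS's toAmos reads it. The "##" line gives the contig ID, the number of reads, and the contig length. The "#" line gives the read ID with its 0-based offset in the contig, an empty "[]" for forward orientation ("[RC]" would mark a reverse-complemented read), the read's clear range {start end}, and its position in the contig <start end>. The checksum field is only a placeholder of zeros here.

```text
##contig_1 1 14 bases, 00000000 checksum.
ACGTACGTACGTAC
#1(0) [] 14 bases, 00000000 checksum. {1 14} <1 14>
ACGTACGTACGTAC
```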