Many programs, such as GATK, require FASTA files to be sorted before use. More elegant solutions that use BioPerl can be found at the Wolf/Takebayashi lab, but here is a rather simple script for the job:
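#!/usr/bin/perl
# Sort the records of a FASTA file by header line and print them to STDOUT.
use strict;
use warnings;

my $file = shift @ARGV or die "Usage: $0 input.fasta\n";
open my $fasta, '<', $file or die "Cannot open $file: $!";

my %seqs;          # header line => { SEQ => concatenated sequence }
my $header = "";   # header of the record currently being read
my $temp   = "";   # sequence accumulated for that record

while (my $line = <$fasta>) {
    if ($line =~ /^>/) {
        # A new record starts: store the previous one first.
        $seqs{$header}{SEQ} = $temp if $header;
        $line =~ s/\s+$//;   # drop the line ending (also handles \r\n)
        $line =~ s/\s/_/g;   # replace whitespace in the header with underscores
        $header = $line;
        $temp   = "";
    }
    else {
        # Strip whitespace, digits and underscores from sequence lines.
        $line =~ s/[\s_0-9]//g;
        $temp .= $line;
    }
}
# Store the last record.
$seqs{$header}{SEQ} = $temp if $header;
close $fasta;

# Plain string sort of the headers, so ">chr10" comes before ">chr2".
# Records with identical headers overwrite each other; only the last is kept.
foreach my $name (sort keys %seqs) {
    print "$name\n";
    print "$seqs{$name}{SEQ}\n";
}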