A previous attempt at merging two kmer count hashes was neither memory efficient nor capable of merging more than two hashes at a time. Here, we use the age-old trick of sorting to write a more memory-efficient script that can handle any number of hashes.
cat *_"$kmer"_counts.fa|sort > sorted_"$kmer"_all.fa
The above command concatenates all the hashes and sorts the combined lines. The sorted file can then be passed to the Perl script below to merge the hashes. Since all kmers that need to be merged are on adjacent lines, the script only has to hold one kmer and its running count in memory at a time, so the memory needed for merging is drastically reduced compared to the previous script.
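As a small illustration of why sorting helps, the sketch below runs the same concatenate-and-sort step on two toy count files. The file names, kmers and counts are made up, and the one "kmer, tab, count" pair per line layout is an assumption based on how the merge script splits each line.

```shell
# Toy illustration only: file names, kmers and counts below are made up.
kmer=3
workdir=$(mktemp -d)

# Two small count files, one "kmer<TAB>count" pair per line (assumed format)
printf 'AAA\t2\nTTT\t1\n' > "$workdir/sampleA_${kmer}_counts.fa"
printf 'ACG\t5\nAAA\t3\n' > "$workdir/sampleB_${kmer}_counts.fa"

# Same concatenate-and-sort step as above, run inside the scratch directory
(cd "$workdir" && cat *_"$kmer"_counts.fa | sort > sorted_"$kmer"_all.fa)

# Show the sorted result
cat "$workdir/sorted_${kmer}_all.fa"
```

In the sorted file the two AAA lines, which came from different input files, sit next to each other, so the merge step only ever needs to compare each line with the one before it.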
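#!/usr/bin/perl
use strict;
use warnings;

# Input parameter: the sorted, concatenated count file
@ARGV == 1 or die "Usage: $0 sorted_counts_file\n";
open(my $fasta, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

# Seeded with header text, so the first line printed is "Kmer<TAB>Kmercount"
my $previous      = "Kmer";
my $previousCount = "Kmercount";

while (my $line = <$fasta>) {
    chomp $line;
    my @jelly = split ' ', $line;    # $jelly[0] = kmer, $jelly[1] = count
    next if @jelly < 2;              # skip blank or malformed lines
    if ($previous eq $jelly[0]) {
        # Same kmer as the previous line: add to the running count
        $previousCount += $jelly[1];
    }
    else {
        # New kmer: print the finished one and start a new running count
        print "$previous\t$previousCount\n";
        $previous      = $jelly[0];
        $previousCount = $jelly[1];
    }
}
# The last kmer is still pending when the loop ends, so always print it
print "$previous\t$previousCount\n";
close $fasta;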
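For a quick cross-check of the merged counts, the same adjacent-line merge can be sketched in awk. This is a swapped-in alternative, not the script above: `merge_sorted_counts` is a hypothetical helper name, the input is made up, and unlike the Perl script it does not print the "Kmer / Kmercount" header line.

```shell
# Hypothetical helper, not part of the original post: sums the counts over
# runs of identical kmers in already-sorted "kmer count" input.
merge_sorted_counts() {
    awk '
        NF < 2 { next }                      # skip blank or malformed lines
        $1 != prev {                         # a new kmer starts here
            if (seen) print prev "\t" total  # flush the finished kmer
            prev = $1; total = 0; seen = 1
        }
        { total += $2 }                      # add this line to the running count
        END { if (seen) print prev "\t" total }
    ' "$@"
}

# Made-up sorted input: AAA appears twice and should be merged into one line
printf 'AAA\t2\nAAA\t3\nACG\t5\nTTT\t1\n' | merge_sorted_counts
```

On a real run, the helper would be given the sorted file as its argument, and its output should match the Perl script's output apart from the header line.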