Monday, June 10, 2013

Randomise order of lines in a file

Different programming languages are good at different things. R has many powerful statistical functions, while Perl is good at data handling.

Drawing N random numbers from a given range without re-sampling is easy in R with the "sample" function. To do the same thing in Perl, a loop (or at least some form of iteration) has to be used, along with storing the results and checking them to avoid re-sampling.
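To make the comparison concrete, here is a minimal Perl sketch of the loop-store-check approach described above. The function name sample_without_replacement and the range 1..10 are made up for illustration; in R the corresponding call would be along the lines of sample(1:10, 10).

```perl
use strict;
use warnings;

# Draw $n distinct integers from $lo..$hi by repeated sampling:
# every accepted value is stored in %seen and a repeat triggers a redraw.
# sample_without_replacement is a hypothetical name used for illustration.
sub sample_without_replacement {
    my ($lo, $hi, $n) = @_;
    my $range = $hi - $lo + 1;
    die "cannot draw $n distinct values from $range\n" if $n > $range;

    my (%seen, @picked);
    while (@picked < $n) {
        my $candidate = $lo + int(rand($range));  # uniform integer in lo..hi
        next if $seen{$candidate}++;              # already drawn, try again
        push @picked, $candidate;
    }
    return @picked;
}

# A random ordering of the line numbers 1..10, one per line
print "$_\n" for sample_without_replacement(1, 10, 10);
```

The redraw-on-repeat loop gets slower as the drawn set fills up, which is part of why the one-line R call is more convenient here.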

The R script (sampleit.r) that does this can be run with the line below:

 Rscript sampleit.r $linecount $iterationcount  

Once the file with the new order of lines has been generated, it can be used by the Perl script to write the file out in the new order. We also keep the first two columns of the file unchanged and randomise only the remaining parts of each line.

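 use strict;
 use warnings;
 my %rhash = ();
 my $count = 1;
#read in the file created by the R script in previous step
#(assumed here to hold one new line number per line)
 open my $rands, '<', $ARGV[1] or die $!;
 while (my $lines = <$rands>) {
 chomp $lines;
 $rhash{$count} = $lines;
 $count++;
 }
 close $rands;
#read the file that needs to be randomised and store it in hash with new order
 open my $stats, '<', $ARGV[0] or die $!;
 my %hash = ();
 my $mycount = 1;
 while (my $line = <$stats>) {
 chomp $line;
#split into column 1, column 2 and the untouched rest of the line
 my @tabs = split(/[ \t]+/, $line, 3);
#the first two columns stay on their original line
 $hash{$mycount}{CHR} = $tabs[0];
 $hash{$mycount}{POS} = $tabs[1];
#the rest of the line moves to the line number given by the R output (assumed direction)
 $hash{$rhash{$mycount}}{RESTATS} = $tabs[2];
 $mycount++;
 }#end of file while loop
 close $stats;
#check if number of lines match in old and new file
 if ($mycount != $count) { print "mismatch in counts\n"; }
#print the file out in new order
 foreach my $contigs (sort { $a <=> $b } keys %hash) {
 print "$hash{$contigs}{CHR}\t$hash{$contigs}{POS}\t$hash{$contigs}{RESTATS}\n";
 }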

This Perl script simply reads in the input file, stores it in a hash keyed by the new line order and then prints it out.
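As a rough illustration of the effect, the sketch below applies the same idea to three in-memory rows instead of files. The helper name reorder_rest, the sample rows and the permutation (3, 1, 2) are all made up, it uses an array rather than a hash keyed by line number, and it assumes each entry of the order list gives the new line number for the corresponding original line.

```perl
use strict;
use warnings;

# reorder_rest is a hypothetical helper for illustration.
# $rows  : array ref of input lines
# $order : array ref with the new 1-based line number for each original line
#          (assumed to be a permutation of 1..number of rows)
sub reorder_rest {
    my ($rows, $order) = @_;
    die "mismatch in counts\n" unless @$rows == @$order;

    my @out;
    for my $i (0 .. $#$rows) {
        # column 1, column 2 and the untouched remainder of the line
        my ($chr, $pos, $rest) = split /[ \t]+/, $rows->[$i], 3;
        $out[$i]{CHR} = $chr;                       # stays on its own line
        $out[$i]{POS} = $pos;                       # stays on its own line
        $out[ $order->[$i] - 1 ]{RESTATS} = $rest;  # moves to the new line
    }
    return map { join "\t", @{$_}{qw(CHR POS RESTATS)} } @out;
}

my @rows  = ("chr1\t100\ta\tb", "chr1\t200\tc\td", "chr2\t300\te\tf");
my @order = (3, 1, 2);   # made-up permutation, standing in for the R output
print "$_\n" for reorder_rest(\@rows, \@order);
```

With this order, the first two columns of every row stay where they were, while the remaining columns of row 1 end up on row 3, those of row 2 on row 1, and those of row 3 on row 2.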
