Wednesday, January 22, 2014

Nth order Markov chain in perl given stationary distribution and transition probability matrix

A Markhov chain can be simulated given a transition probability matrix.

 #!/usr/bin/perl  
 use warnings;  
 use strict;  
 open MODELS, $ARGV[0] or die $!;  
 my $cyclestorun=20;  
 my %model=();  
 my $line = <MODELS>;chomp $line;  
 my @headers=split(/\t/,$line);  
 my @values="";  
 my @stationVals;  
 my @stationDist;  
 my $check=0;  
 my $rnd=0;  
 my $test=0;  
 my $index=0;  
 my $startval="";  
 my $preval="";  
 my $nextval="";  
 my $i=0;  
 my $j=0;  
 my $stat=0;  
 my $presub="";  
 while($line = <MODELS>){  
      chomp $line;@values=split(/\t/,$line);$check=0;  
      push(@stationVals,$values[0]);  
      push(@stationDist,$values[1]);  
      for($i=2;$i<scalar @headers;$i++){  
      $check+=$values[$i];  
      if($check>1){print "Error in Transition probability matrix in line: $line\n";exit;}  
      $model{$values[0]}{$headers[$i]}=$values[$i];  
 #     print "$values[0]\t$headers[$i]\t$model{$values[0]}{$headers[$i]}\n";  
      }  
 }#end of while loop  
 close MODELS;  
      print "\t\t\tInitialising stationary distribution\n\n\n";  
 $check=0;  
 foreach $stat(@stationDist){  
      $check+=$stat;  
      if($check>1){print "Error in Stationary distribution.Probability greater than 1!\n";exit;}  
 }  
 $rnd=rand(1);  
 $test=0;  
 foreach $stat(@stationDist){  
      $test+=$stat;  
      if($rnd<$test){  
      $startval=$stationVals[$index];  
      #exit the foreach loop  
      $test=-100;  
      }  
 $index++;  
 }  
      print "\t\tInitial step\t$rnd\t$startval\n\n";  
 $preval=$startval; $test=0;  
 for($j=1;$j<=$cyclestorun;$j++){  
      $test=0;$rnd=rand(1);  
      for($i=2;$i<scalar @headers;$i++){  
      $test+=$model{$preval}{$headers[$i]};  
           if($rnd<$test){  
           $nextval=$headers[$i];  
           #exit the foreach loop  
           $test=-100;  
           }  
      }  
      if((length $preval)>1){$presub=substr $preval,1,1;}  
      $preval=$presub . $nextval;  
      print "\t\t$j\t$rnd\t$preval\n";  
 }#end of cyclestorun loop  

Example cases:

First order Markhov chain:
The "stat" column corresponds to the stationary matrix.

Model:
      stat     A     G     T  
 A     0.1     0.1     0.5     0.4  
 G     0.5     0.9     0.1     0  
 T     0.4     0.3     0.3     0.4  

Output:

 perl markhov.pl Markhov1.model  
                Initialising stationary distribution  
           Initial step     0.566160679402859     G  
           1     0.726538222140761     A  
           2     0.307280103481514     G  
           3     0.892182052225458     A  
           4     0.336793610721617     G  
           5     0.205083725028075     A  
           6     0.523360180106774     G  
           7     0.376126474509043     A  
           8     0.385466656638901     G  
           9     0.357110639627308     A  
           10     0.940985077542603     T  
           11     0.567528252945955     G  
           12     0.215130112396754     A  
           13     0.737550170708133     T  
           14     0.372513088005231     G  
           15     0.87686981502349     A  
           16     0.534856061888437     G  
           17     0.94194553431501     G  
           18     0.800215938891096     A  
           19     0.990931721477871     T  
           20     0.371922711443307     G  

Second order Markhov chain:

Model:
      stat     A     G     T  
 AA     0.1     0.1     0.5     0.4  
 AG     0.05     0.9     0.1     0  
 AT     0.04     0.3     0.3     0.4  
 GA     0.2     0.1     0.5     0.4  
 GG     0.1     0.9     0.1     0  
 GT     0.1     0.3     0.3     0.4  
 TA     0.1     0.1     0.5     0.4  
 TG     0.1     0.9     0.1     0  
 TT     0.2     0.3     0.3     0.4  

Output:

 perl markhov.pl Markhov2.model  
                Initialising stationary distribution  
           Initial step     0.625368699737923     TA  
           1     0.590942442973272     AG  
           2     0.0483144526753101     GA  
           3     0.0103876021193585     AA  
           4     0.367650618439331     AG  
           5     0.973362473523579     GG  
           6     0.310508404063821     GA  
           7     0.890260746277896     AT  
           8     0.4738624084188     TG  
           9     0.158245684870735     GA  
           10     0.295448727223441     AG  
           11     0.33900585792933     GA  
           12     0.988197918079994     AT  
           13     0.366480732651112     TG  
           14     0.229562890090616     GA  
           15     0.543711924314373     AG  
           16     0.114082525688733     GA  
           17     0.850092575768059     AT  
           18     0.546652122930436     TG  
           19     0.715004225102355     GA  
           20     0.20491958253287     AG  

Any number of states can be added, as well as any order can be simulated.

Zero-order:

Model:
      stat     A     G     T  
 A     0.33     0.33     0.33     0.33  
 G     0.33     0.33     0.33     0.33  
 T     0.33     0.33     0.33     0.33  

Output:
 perl markhov.pl Markhov0.model  
                Initialising stationary distribution  
           Initial step     0.804352114516409     T  
           1     0.967022104046436     T  
           2     0.146048431090218     A  
           3     0.168446080849158     A  
           4     0.806730437919288     T  
           5     0.0541887571889532     A  
           6     0.90109841189982     T  
           7     0.815255933249396     T  
           8     0.647698135106484     G  
           9     0.93008693266961     T  
           10     0.721468943653047     T  
           11     0.311067960681136     A  
           12     0.231977046730197     A  
           13     0.251340034965111     A  
           14     0.140686836632536     A  
           15     0.0760768098699529     A  
           16     0.182739903656344     A  
           17     0.496555550139448     G  
           18     0.219248000363631     A  
           19     0.163342643602853     A  
           20     0.994710285749438     A  
And now to use it in a real world application....any suggestions?

No comments: