Wednesday, April 27, 2016

Visualizing isoforms of a gene as a undirected graph using sna package

The sna (social network analysis) package in R provides an easy to use interface for handling network data structures. Apart from numerous statistics that can be calculated on the graph, it is possible to visualize the graph using the "gplot" command.

Here, we use a perl script to convert the information regarding exon positions along the gene and transcript structure into the nos format. We are then able to see that the gene glutamate-cysteine ligase, catalytic subunit (GCLC) is actually made up of 5 different components.

Perl script to write the gene in nos format: (GeneRanked_exons contains list of exons with their positional rank in the gene. test.exon.order contains the list of exons in each transcript ordered by the exon positional rank in the transcript)
 #!/usr/bin/perl  
 #  
 ##perl printNetwork.pl GeneRanked_exons.txt test.exon.order > ENSG00000001084_graph.txt  
 #  
 my %exons=();  
 open(FILE, $ARGV[0]);  
 while (my $row = <FILE>) {  
  chomp $row;  
  @values=split(' ',$row);  
  $exons{$values[0]}=$values[2];  
 }  
 close FILE;  
 my %matrix;  
 my $maxexon=0;  
 open(FILE2, $ARGV[1]);  
 my $row = <FILE2>;  
 chomp $row;@values=split('\t',$row);$exoncount1=$exons{$values[0]};$trans1=$values[2];$trancount1=$values[3];  
 while (my $row = <FILE2>) {  
  if($exoncount1>$maxexon){$maxexon=$exoncount1;}  
  chomp $row;@values=split('\t',$row);$exoncount2=$exons{$values[0]};$trans2=$values[2];$trancount2=$values[3];  
  #print "$exoncount1\t$trans1\t$exoncount2\t$trans2\n";  
  if($exoncount2>$maxexon){$maxexon=$exoncount2;}  
  $ftrancount2=$trancount2-1;  
  #print "$exoncount1\t$trans1\t$exoncount2\t$trans2\t$trancount1\t$trancount2\n";  
  if(($trans1 =~ m/^$trans2$/i)&&($trancount1==$ftrancount2)){ $matrix{$exoncount1}{$exoncount2}=1;}  
  $exoncount1=$exoncount2;$trans1=$trans2;$trancount1=$trancount2;  
 }  
 print "1\n";  
 print "$maxexon $maxexon\n";  
 for ($i=0;$i<$maxexon;$i++){  
 print "0 0 ";  
 }  
 print "\n";  
 for ($k=1;$k<=$maxexon;$k++){  
      for ($i=1;$i<=$maxexon;$i++){  
      if(exists $matrix{$k}{$i}){$j=$matrix{$k}{$i};}  
  else{$j=0;}  
      print "$j ";  
      }  
      print "\n";  
 }  
 #     #1  
 #     #4 4  
 #     #0 0 0 0 0 0 0 0  
 #     #0 1 0 0  
 #     #0 0 1 1  
 #     #0 1 0 0  
 #     #0 0 1 0  
 #  
 #  
The output from the printNetwork command can be read into R using the read.nos function of the sna package. The graph can then be visualized using below lines in R.

 file1<-"ENSG00000001084_graph.txt"  
 library(sna)  
 read.nos(file=file1)->g  
 jpeg(paste(file1,".jpeg",sep=""))  
 gplot(g,gmode="graph",displaylabels=TRUE,main=file1)  
 dev.off()  
This produces a graph that looks like this: (each red dot is an exon with the number beside it being its positional rank).
While the actual gene on ensemble looks like this: