The Drosophila transcriptome has been analysed under different conditions to understand its diversity. Here, we re-analyse the supplementary materials (supplementary table 3 only) and find "interesting" patterns in splicing diversity.
# read supplementary table 3 (one row per gene)
S <- read.table(file = "supp3", header = TRUE)
# calculate percentage of genes with more than 1 isoform
length(S$transcripts[S$transcripts > 1]) * 100 / length(S$transcripts)
# calculate percentage of genes with more than 1 protein
length(S$proteins[S$proteins > 1]) * 100 / length(S$proteins)
summary(S)
Over half of the 17,564 annotated genes (10,136; 57.7%) have more than one transcript isoform. However, only 37.5% (6,584) of the genes encode more than one protein, suggesting that most of the transcript diversity lies outside the protein-coding regions. While genes have 17.39 transcripts on average, they have only 2.59 proteins on average.
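The counts, percentages and per-gene averages quoted above can be gathered in one helper. The following is a minimal sketch, assuming a table with one row per gene and numeric columns named `transcripts` and `proteins`; the function name `isoform_summary` and the toy data frame are hypothetical and are not taken from supplementary table 3.

```r
# Summarise isoform diversity for a table with one row per gene.
# Assumes numeric columns named `transcripts` and `proteins`.
isoform_summary <- function(S) {
  multi_tx   <- sum(S$transcripts > 1)  # genes with more than 1 transcript
  multi_prot <- sum(S$proteins > 1)     # genes with more than 1 protein
  c(
    genes            = nrow(S),
    multi_transcript = multi_tx,
    pct_multi_tx     = 100 * multi_tx / nrow(S),
    multi_protein    = multi_prot,
    pct_multi_prot   = 100 * multi_prot / nrow(S),
    mean_transcripts = mean(S$transcripts),
    mean_proteins    = mean(S$proteins)
  )
}

# Toy data, invented for illustration only (not from supplementary table 3).
toy <- data.frame(
  gene        = c("g1", "g2", "g3", "g4"),
  transcripts = c(1, 3, 8, 1),
  proteins    = c(1, 1, 2, 1)
)
toy_summary <- isoform_summary(toy)
print(round(toy_summary, 2))
```

On the real table, `isoform_summary(S)` would be called on the data frame read from the supplementary file.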
This contrast between the number of transcripts and proteins is especially pronounced in genes with a very large number of transcripts. In fact, the gene "gish", which has 18,972 transcripts, has only 142 proteins. The plot below shows the number of records containing the words "exon" and "CDS" for the "gish" gene in the new annotation.
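Counts of "exon" and "CDS" records for a single gene could be obtained roughly as follows. This is a sketch under stated assumptions: the annotation is in GTF format (nine tab-separated columns, feature type in column 3), and the gene is identified by a `gene_symbol "..."` attribute; other annotations may use a different key such as `gene_name`. The function `count_gene_features` and the toy records are hypothetical.

```r
# Count records of selected feature types for one gene in a GTF-like table.
# Assumption: the attributes column contains gene_symbol "<name>".
count_gene_features <- function(gtf, gene, features = c("exon", "CDS")) {
  # Include the closing quote so that "gish" does not match e.g. "gish2".
  pattern <- paste0('gene_symbol "', gene, '"')
  hits <- gtf[grepl(pattern, gtf$attributes, fixed = TRUE), ]
  table(factor(hits$feature, levels = features))
}

# Toy GTF records, invented for illustration only.
toy_lines <- c(
  '2R\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_symbol "gish"; transcript_id "T1";',
  '2R\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; gene_symbol "gish"; transcript_id "T1";',
  '2R\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "G1"; gene_symbol "gish"; transcript_id "T2";',
  '2R\ttoy\texon\t500\t600\t.\t+\t.\tgene_id "G1"; gene_symbol "gish"; transcript_id "T3";',
  '3L\ttoy\texon\t10\t90\t.\t-\t.\tgene_id "G2"; gene_symbol "other"; transcript_id "T4";'
)
toy_gtf <- read.table(
  text = toy_lines, sep = "\t", quote = "", stringsAsFactors = FALSE,
  col.names = c("seqname", "source", "feature", "start", "end",
                "score", "strand", "frame", "attributes")
)

counts <- count_gene_features(toy_gtf, "gish")
print(counts)
```

Passing the resulting table to `barplot(counts)` would give a simple bar chart of the two record types.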