Sunday, May 19, 2013

Toxicity and expression data project

One possible solution to Analyzing in vitro toxicity and expression data perl project. Example table that is used as input to the perl script is given in below table.


"source_name_sid""casrn""chemical_name""assay_component_name""conc""value""value_type""plate_id""rep""row""col""target""time_hr""reference""media_reference""donor"
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.0041.08"fold_change"11310"E"5"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.0040.96"fold_change"11310"F"5"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.0040.94"fold_change"11310"G"5"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.0040.93"fold_change"11310"H"5"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.040.94"fold_change"1138"G"4"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.041.13"fold_change"1138"E"4"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.040.96"fold_change"1138"H"4"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.041.07"fold_change"1138"F"4"ABCB1"6300.5345.75778
"DSSTOX_40294""135158-54-2""Acibenzolar-S-Methyl""CLZD_ABCB1_6"0.41.02"fold_change"1136"F"3"ABCB1"6300.5345.75778

Actual perl code:
#!/usr/bin/perl

open TOXTABLE, $ARGV[0] or die $!;

%gcct=();

while($line = <TOXTABLE>){
chomp $line;
@tabs=split(/\t/,$line);

$key=$tabs[2] . "_" . $tabs[4] . "_" . $tabs[11] . "_" . $tabs[12];
$foldchange=$tabs[5];

if(exists $gcct{$key}){if($gcct{$key}<$foldchange){$gcct{$key}=$foldchange;}}
else{$gcct{$key}=$foldchange;}

}#end of file while loop

print "chemical_name\tconctarget\ttime_hr\tvalue\n";

foreach $mykey(sort keys %gcct){
@mytabs=split(/\_/,$mykey);
print "$mytabs[0]\t$mytabs[1]\t$mytabs[2]\t$mytabs[3]\t$gcct{$mykey}\n";
}


Just run the perl script with the filename of the table as first argument and it will print "which chemical compound and at what concentration causes the highest fold change of transcription in each gene at each timepoint".

The same can be done with one line in R using :



 aggregate(M$value,by=list(M$chemical_name,M$conc,M$target,M$time_hr),max)->N  

No comments: