One possible solution to Analyzing in vitro toxicity and expression data perl project. Example table that is used as input to the perl script is given in below table.
Actual perl code:
Just run the perl script with the filename of the table as first argument and it will print "which chemical compound and at what concentration causes the highest fold change of transcription in each gene at each timepoint".
The same can be done with one line in R using :
"source_name_sid" | "casrn" | "chemical_name" | "assay_component_name" | "conc" | "value" | "value_type" | "plate_id" | "rep" | "row" | "col" | "target" | "time_hr" | "reference" | "media_reference" | "donor" |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.004 | 1.08 | "fold_change" | 113 | 10 | "E" | 5 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.004 | 0.96 | "fold_change" | 113 | 10 | "F" | 5 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.004 | 0.94 | "fold_change" | 113 | 10 | "G" | 5 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.004 | 0.93 | "fold_change" | 113 | 10 | "H" | 5 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.04 | 0.94 | "fold_change" | 113 | 8 | "G" | 4 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.04 | 1.13 | "fold_change" | 113 | 8 | "E" | 4 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.04 | 0.96 | "fold_change" | 113 | 8 | "H" | 4 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.04 | 1.07 | "fold_change" | 113 | 8 | "F" | 4 | "ABCB1" | 6 | 300.5345.75 | 778 | |
"DSSTOX_40294" | "135158-54-2" | "Acibenzolar-S-Methyl" | "CLZD_ABCB1_6" | 0.4 | 1.02 | "fold_change" | 113 | 6 | "F" | 3 | "ABCB1" | 6 | 300.5345.75 | 778 |
Actual perl code:
#!/usr/bin/perl
open TOXTABLE, $ARGV[0] or die $!;
%gcct=();
while($line = <TOXTABLE>){
chomp $line;
@tabs=split(/\t/,$line);
$key=$tabs[2] . "_" . $tabs[4] . "_" . $tabs[11] . "_" . $tabs[12];
$foldchange=$tabs[5];
if(exists $gcct{$key}){if($gcct{$key}<$foldchange){$gcct{$key}=$foldchange;}}
else{$gcct{$key}=$foldchange;}
}#end of file while loop
print "chemical_name\tconctarget\ttime_hr\tvalue\n";
foreach $mykey(sort keys %gcct){
@mytabs=split(/\_/,$mykey);
print "$mytabs[0]\t$mytabs[1]\t$mytabs[2]\t$mytabs[3]\t$gcct{$mykey}\n";
}
Just run the perl script with the filename of the table as first argument and it will print "which chemical compound and at what concentration causes the highest fold change of transcription in each gene at each timepoint".
The same can be done with one line in R using :
aggregate(M$value,by=list(M$chemical_name,M$conc,M$target,M$time_hr),max)->N
No comments:
Post a Comment