Repeat Experiments

RNA Secondary Structure Prediction

make delta

Run the command make delta. This command starts an infinite loop. Each run of the loop uses the folding simulation program to predict the secondary structures of all 252 sequences in the data set and collects the result files PKB?????.delta.hx into a zip file mmddHHMM.delta.zip, where the eight digits mmddHHMM record the month, day, hour, and minute at which the run started. Each run takes 20-30 hours to complete. During each run, hundreds of temporary files PKB?????.rna, PKB?????.seq, PKB?????.db, PKB?????.pkb.hx, and PKB?????.delta.bp are generated.
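The mmddHHMM naming scheme can be illustrated with a minimal Python sketch. The function names archive_name and parse_archive_name are hypothetical helpers introduced here for illustration; they are not part of the Makefile or of the folding program, and the date used in the example is made up.

```python
from datetime import datetime


def archive_name(start: datetime) -> str:
    """Build the mmddHHMM.delta.zip name for a run that started at `start`."""
    return start.strftime("%m%d%H%M") + ".delta.zip"


def parse_archive_name(name: str) -> tuple:
    """Split an mmddHHMM.delta.zip name into (month, day, hour, minute)."""
    stamp = name.split(".", 1)[0]
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"not an mmddHHMM name: {name!r}")
    month, day, hour, minute = (int(stamp[i:i + 2]) for i in (0, 2, 4, 6))
    return month, day, hour, minute


# Hypothetical run started on March 7 at 14:05 (the year is not recorded).
example = archive_name(datetime(2010, 3, 7, 14, 5))
```

Because the name does not include the year, sorting the zip files by name gives chronological order only within a single calendar year.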

make stop

Once you have collected enough zip files mmddHHMM.delta.zip, stop the experiment. Do NOT press Ctrl-C. Instead, run the command make stop in the same directory from another shell. This command uses the program killall to terminate all processes of the infinite loop.

make txt

Next, run the command make txt. This command generates a text file mmddHHMM.summary.txt for each zip file mmddHHMM.delta.zip. Creating each text file takes about two minutes. Each text file mmddHHMM.summary.txt contains a summary of the prediction results for each sequence, as in the following example:

PKB00001  29   9   5  0.62   9   0   0  1.00  1.00  1.00 DeltaIS
PKB00001  29   9   5  0.62   6   3   0  0.67  1.00  0.67 HotKnot
AGGGGGGACUUAGCGCCCCCCAAACCGUA
:[[[[[[:::::(((]]]]]]::::))):
 ______        ______
       __                 __
          __               __
             __          __
>
 ______        ______
            ___          ___
>
 ______        ______
>
 ______        ______
            ___          ___

The example above is the summary for the sequence PKB00001. Each of the first two lines contains the following fields: sequence ID, sequence length, number of base pairs, maximum gap (maximum number of consecutive unpaired bases), density (fraction) of paired bases, true positives, false negatives, false positives, sensitivity, selectivity, accuracy, and the name of the predictor (DeltaIS or HotKnot). The third line contains the RNA sequence. The fourth line contains the known secondary structure from PseudoBase in dot-bracket format. The remaining lines are separated by > into four groups of helices in 2-interval format: the first and second groups are the predictions of DeltaIS before and after the selection; the third group is the prediction of HotKnot; the fourth group is the known secondary structure from PseudoBase.
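The numeric fields in the example can be recomputed with a short Python sketch. The formulas below are assumptions inferred from the sample numbers, not taken from the summary program: density is taken as 2 x base pairs / length, sensitivity as TP / (TP + FN), selectivity as TP / (TP + FP), and accuracy as TP / (TP + FN + FP). The function names are hypothetical, and any character that is not one of the bracket characters listed in the code is treated as an unpaired base.

```python
OPENERS = "([{<"
CLOSERS = ")]}>"


def structure_stats(dot_bracket: str):
    """Return (length, base pairs, maximum gap, density) of a structure.

    Assumes each opening bracket corresponds to exactly one base pair and
    that every non-bracket character (such as ':') is an unpaired base.
    """
    pairs = sum(dot_bracket.count(c) for c in OPENERS)
    max_gap = gap = 0
    for ch in dot_bracket:
        if ch in OPENERS or ch in CLOSERS:
            gap = 0  # a paired base ends the current unpaired run
        else:
            gap += 1
            max_gap = max(max_gap, gap)
    density = 2 * pairs / len(dot_bracket)
    return len(dot_bracket), pairs, max_gap, density


def prediction_scores(tp: int, fn: int, fp: int):
    """Return (sensitivity, selectivity, accuracy) under the assumed formulas."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tp / (tp + fp) if tp + fp else 0.0
    accuracy = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return sensitivity, selectivity, accuracy


# Known structure of PKB00001, copied from the example above.
known = ":[[[[[[:::::(((]]]]]]::::))):"
length, pairs, max_gap, density = structure_stats(known)

# HotKnot row of the example: 6 true positives, 3 false negatives, 0 false positives.
hotknot = prediction_scores(6, 3, 0)
```

Under these assumptions the sketch gives length 29, 9 base pairs, maximum gap 5, and density 0.62 after rounding, which agrees with the first five numeric fields of the example. The HotKnot scores round to 0.67, 1.00, and 0.67, as in the second line.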

make sts and scatter.pdf

Next, run the command make -s sts. This command computes overall performance statistics for DeltaIS and HotKnot. The output contains seven lines, as follows:

0.802037 0.783753 0.656712
0.716855 0.784694 0.59903
0.7911  0.0082  0.7739  0.0083  0.6426  0.0109
0.7169  0.0000  0.7847  0.0000  0.5990  0.0000
      82
      47
      36

The 1st line is for DeltaIS: the best of 66 runs has sensitivity 80.20%, selectivity 78.38%, and accuracy 65.67%. The 2nd line gives the same three values for HotKnot. The 3rd line is for DeltaIS: the averages and standard deviations over 66 runs are sensitivity 79.11% ± 0.82%, selectivity 77.39% ± 0.83%, and accuracy 64.26% ± 1.09%. The 4th line gives the corresponding values for HotKnot. The numbers in the 5th and 6th lines say that DeltaIS predicted 82 secondary structures perfectly in at least one run and 47 perfectly in all runs. The number in the 7th line says that HotKnot predicted 36 secondary structures perfectly.
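For readers who want to post-process this output, the seven-line layout can be read into named fields with a small Python sketch. The function parse_sts and the dictionary keys are hypothetical names chosen for illustration; the sketch assumes the output has been captured as text and that the fields are separated by whitespace, as in the sample above.

```python
def parse_sts(text: str) -> dict:
    """Parse the seven-line output of make -s sts into named fields."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 7:
        raise ValueError(f"expected 7 non-empty lines, got {len(rows)}")

    def floats(row):
        return tuple(float(x) for x in row)

    delta_avg = floats(rows[2])  # mean, std, mean, std, mean, std
    hot_avg = floats(rows[3])
    return {
        # (sensitivity, selectivity, accuracy) triples
        "deltais_best": floats(rows[0]),
        "hotknot_best": floats(rows[1]),
        "deltais_mean": delta_avg[0::2],
        "deltais_std": delta_avg[1::2],
        "hotknot_mean": hot_avg[0::2],
        "hotknot_std": hot_avg[1::2],
        # counts of perfectly predicted secondary structures
        "deltais_perfect_any_run": int(rows[4][0]),
        "deltais_perfect_all_runs": int(rows[5][0]),
        "hotknot_perfect": int(rows[6][0]),
    }


# The sample output shown above.
sample = """\
0.802037 0.783753 0.656712
0.716855 0.784694 0.59903
0.7911  0.0082  0.7739  0.0083  0.6426  0.0109
0.7169  0.0000  0.7847  0.0000  0.5990  0.0000
      82
      47
      36
"""
stats = parse_sts(sample)
```

Values can then be formatted as percentages, for example with f"{stats['deltais_mean'][0]:.2%}".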

Finally, run the command make scatter.pdf. This command generates a scatter plot of the prediction accuracies of DeltaIS and HotKnot on individual sequences.

RNA Tertiary Structure Reconstruction

Run the command make reconstruct. This command takes about 15 minutes to complete and generates a file reconstruct.txt. Each line of the file has four fields: sequence ID, sequence length, 1 or 0 indicating whether the reconstruction was successful, and the number of iterations actually used for the reconstruction.
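The four-field layout of reconstruct.txt can be read with a short Python sketch like the one below. It assumes that the fields are separated by whitespace, which the description above does not state. The names Reconstruction, parse_reconstruct, and success_rate are hypothetical, and the two sample lines are made up for illustration; they are not actual results.

```python
from typing import NamedTuple


class Reconstruction(NamedTuple):
    seq_id: str
    length: int
    success: bool
    iterations: int


def parse_reconstruct(lines):
    """Parse lines of reconstruct.txt (assumed whitespace-separated)."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        seq_id, length, flag, iterations = fields
        records.append(
            Reconstruction(seq_id, int(length), flag == "1", int(iterations))
        )
    return records


def success_rate(records) -> float:
    """Fraction of sequences whose reconstruction succeeded."""
    if not records:
        return 0.0
    return sum(r.success for r in records) / len(records)


# Made-up sample lines, for illustration only.
sample_lines = ["PKB00001 29 1 120", "PKB00002 45 0 500"]
records = parse_reconstruct(sample_lines)
rate = success_rate(records)
```

To process the real file, pass an open file object, for example parse_reconstruct(open("reconstruct.txt")).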

Results of Our Experiments
