Results of Testing
Daniel H. Wagner Associates subjected the ATQA scoring model to extensive
testing and validation. First, the model was tested to ensure that it
was properly calibrated, that is, that the assigned scores corresponded
to comparable error rates. Second, we considered classifiers based on
the scorer and evaluated them by the number of misclassifications
that occurred. In both cases the model performed exceptionally
well. The section below summarizes the calibration portion of the
testing phase.
Calibration Testing Results
The ATQA scoring model was tested on an original reserved test
set of ~100,000 elements. These elements were generated through
the same process as the larger training set, but were held back
and not used in the training, so as to provide independent confirmation
of the model's validity.
The elements of the test set were then run through the scoring
model, and statistics pertaining to each of the three quality
scores were compiled. In particular, for each of the three quality
scores, the testing elements were grouped by the score assigned
by the model, and the error rate implied by the quality score
was compared to the actual rate of incidence of the respective
error type. The results of these tests are summarized in the following
three plots, in which actual error rates are plotted against the
model-predicted rates.
[Plot: Actual Substitution Rates vs. Substitution Scores]
[Plot: Actual Insertion Rates vs. Insertion Scores]
[Plot: Observed Deletion Rates vs. Deletion Scores]
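The comparison behind these plots (group the test elements by assigned score, then set the rate implied by the score against the observed rate) can be sketched as follows. This is a minimal illustration, not the ATQA implementation. The names `implied_error_rate` and `calibration_table` are hypothetical, and the mapping from score to error rate is an assumption: the sketch uses a Phred-style scale, p = 10^(-score/10), which the text above does not confirm.

```python
from collections import defaultdict


def implied_error_rate(score):
    # ASSUMPTION: Phred-style scale, p = 10^(-score/10).
    # The actual ATQA score-to-rate mapping is not given in the text.
    return 10 ** (-score / 10)


def calibration_table(records):
    """Group (score, is_error) pairs by score.

    Returns one row per distinct score, sorted by score:
    (score, group_size, implied_rate, observed_rate).
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [group size, error count]
    for score, is_error in records:
        counts[score][0] += 1
        counts[score][1] += int(is_error)

    rows = []
    for score in sorted(counts):
        n, errors = counts[score]
        rows.append((score, n, implied_error_rate(score), errors / n))
    return rows
```

One such table would be built for each of the three error types (substitution, insertion, deletion). Plotting the observed rate against the implied rate for each row gives points that lie near the diagonal when the scores are well calibrated.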
Note that in these graphs, the asterisks indicate the observed
error rates. The distance of these asterisks from the diagonal
line indicates the amount of deviation from the predicted error
rates. The vertical error bars indicate the range of true error
rates that would be consistent with the observed error rate, taking
into account the size of the sample. The horizontal line
in each graph indicates the overall error rate for the testing
data. A scoring model which randomly assigned scores to basecalls
would result in a graph with asterisks randomly distributed about
this line.
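The text does not say how the error bars were computed. One common way to obtain a range of true error rates consistent with an observed rate and sample size is the Wilson score interval for a binomial proportion. The sketch below uses that method as an assumption; the function name `wilson_interval` and the default of roughly 95% confidence (z = 1.96) are illustrative choices, not details taken from the ATQA tests.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    errors: number of observed errors in the score group
    n:      number of elements in the score group
    z:      normal quantile (1.96 is roughly 95% confidence; an assumption)
    Returns (low, high) bounds on the true error rate.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = errors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)
```

Under this sketch, a score group is consistent with good calibration when the rate implied by its score falls inside the interval. Groups with few elements produce wide intervals, which matches the description of error bars that account for sample size.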