DANIEL H. WAGNER ASSOCIATES, INC

A Leader in Applying Mathematics and Computer Science to Industry

Operations Research - Mathematics - Software Development

Results of Testing

Daniel H. Wagner Associates subjected the ATQA scoring model to extensive testing and validation. First, the model was tested to ensure that it was properly calibrated, that is, that the assigned scores corresponded to the error rates actually observed. Second, we considered classifiers based on the scorer and evaluated them by the number of misclassifications they produced. In both tests the model performed exceptionally well. The section below summarizes the calibration portion of the testing phase.
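The page does not say how the score-based classifiers were built. One natural reading, sketched here purely as an assumption, is a threshold rule: flag an element as an error when the error probability implied by its score exceeds a cutoff, then count how often that prediction disagrees with the truth. The function `count_misclassifications` and its inputs are hypothetical.

```python
from typing import Sequence


def count_misclassifications(
    error_probs: Sequence[float],  # hypothetical: error probability implied by each element's score
    is_error: Sequence[bool],      # ground truth: did the error actually occur?
    threshold: float,              # hypothetical cutoff for flagging an element as an error
) -> int:
    """Count misclassifications of a threshold classifier built on the scores.

    An element is predicted to be an error when its implied error
    probability exceeds `threshold`; a misclassification is any element
    whose prediction disagrees with the ground truth.
    """
    if len(error_probs) != len(is_error):
        raise ValueError("inputs must have the same length")
    misses = 0
    for p, actual in zip(error_probs, is_error):
        predicted = p > threshold
        if predicted != actual:
            misses += 1
    return misses
```

Sweeping `threshold` over a range of values and recording the count at each one gives a simple way to compare such classifiers.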

Calibration Testing Results

The ATQA scoring model was tested on a reserved test set of approximately 100,000 elements. These elements were generated by the same process as the larger training set but were held back from training, so that they could provide independent confirmation of the model's validity.
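The hold-out idea can be sketched as follows. The page says only that the test elements were generated like the training set and held back; the random selection, the fixed seed, and the `holdout_split` helper are illustrative assumptions, not the procedure actually used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], test_size: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly reserve `test_size` items as a test set (hypothetical helper).

    The remaining items form the training set; no item appears in both,
    so the test set can serve as an independent check of the trained model.
    """
    if not 0 <= test_size <= len(items):
        raise ValueError("test_size out of range")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(items)))
    rng.shuffle(indices)
    test_idx = set(indices[:test_size])
    train = [x for i, x in enumerate(items) if i not in test_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, test
```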

The elements of the test set were then run through the scoring model, and statistics were compiled for each of the three quality scores (substitution, insertion, and deletion). For each score type, the test elements were grouped by the score the model assigned, and the error rate implied by that score was compared with the actual incidence of the corresponding error type. The results are summarized in the following three plots, in which actual error rates are plotted against the model-predicted rates.
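A minimal sketch of the grouping step, assuming each test element carries one assigned score and a flag recording whether the corresponding error actually occurred. The page does not state how a score maps to an error rate; the Phred-style mapping 10^(-q/10) used here is an assumption, as are the function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def implied_error_rate(score: int) -> float:
    """ASSUMPTION: Phred-style mapping from a quality score to an error probability."""
    return 10 ** (-score / 10)


def calibration_table(
    records: Iterable[Tuple[int, bool]],  # (assigned score, did the error occur?)
) -> Dict[int, Tuple[float, float, int]]:
    """Group elements by assigned score and compare implied vs. observed error rates.

    Returns {score: (implied_rate, observed_rate, n_elements)}, ordered by score.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # score -> [n, n_errors]
    for score, is_error in records:
        counts[score][0] += 1
        counts[score][1] += int(is_error)
    return {
        score: (implied_error_rate(score), n_err / n, n)
        for score, (n, n_err) in sorted(counts.items())
    }
```

Each entry of the resulting table corresponds to one point in a plot: the observed rate against the rate implied by the score. A well-calibrated scorer puts these points near the diagonal.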

[Figure: three plots of actual vs. predicted error rates: Actual Substitution Rates vs. Substitution Scores; Actual Insertion Rates vs. Insertion Scores; Observed Deletion Rates vs. Deletion Scores]

In these graphs, the asterisks indicate the observed error rates, and their distance from the diagonal line shows how far the observed rates deviate from the predicted rates. The vertical error bars indicate the range of true error rates that would be consistent with the observed error rate; they take the size of each sample into account. The horizontal line in each graph indicates the overall error rate for the test data. A scoring model that assigned scores to basecalls at random would produce a graph with asterisks scattered randomly about this line.
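The error bars and the horizontal reference line could be computed roughly as sketched below. The page does not say which interval was used; the Wilson score interval at about 95% confidence is swapped in here as one standard choice for a binomial proportion, and the helper names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def wilson_interval(n_errors: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a true error rate, given n_errors out of n.

    With z = 1.96 this is an approximate 95% interval. Its width shrinks as
    n grows, so small score groups get wide error bars.
    """
    if n <= 0 or not 0 <= n_errors <= n:
        raise ValueError("need n > 0 and 0 <= n_errors <= n")
    p_hat = n_errors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def overall_error_rate(groups: Iterable[Tuple[int, int]]) -> float:
    """Pooled error rate (the horizontal line) from (n, n_errors) pairs per score group."""
    groups = list(groups)
    total = sum(n for n, _ in groups)
    errors = sum(k for _, k in groups)
    return errors / total
```

A scorer that assigned scores at random would give every score group roughly this pooled rate, which is why its points would scatter about the horizontal line rather than follow the diagonal.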

© 2005 Daniel H. Wagner Associates, Inc.  - All rights reserved.