DANIEL H. WAGNER ASSOCIATES, INC

A Leader in Applying Mathematics and Computer Science to Industry

 

Operations Research - Mathematics - Software Development


Statistical Pattern Recognition of DNA Mixtures

Description

Heteroplasmy occurs when a mutation in a mitochondrial DNA (mtDNA) molecule within a cell leads to a mixture of mutant and normal mtDNA molecules. Heterozygosity occurs when a diploid organism or cell has inherited different alleles at a particular locus from each parent. Both cases result in mixtures of DNA sequences that have important applications in forensics, pathology, and evolutionary genetics. The automated detection of DNA mixtures is an important tool in harnessing these genetic features.

The Problem

Detecting mixtures in DNA sequence data by visually inspecting processed electrophoretic traces is a time-consuming process. Our research has focused on applying the pattern recognition techniques we had already used successfully in automated trace quality work to the automatic detection of DNA mixtures.

Training Data

Wagner Associates was provided with specially formulated training data by Mark Wilson, Program Manager of mtDNA analysis at DNA Unit II of the FBI Laboratory. He prepared mixtures of pairs of known mtDNA sequences that differed at 1 to 10 nucleotide positions, at controlled ratios between 1:1 and 1:20. The differences between the mixed sequences consisted of 'substitutions' of one base for another. Consequently, positions at which the underlying sequences differed had the appearance of heteroplasmies or heterozygotes. Insertion/deletion mutations were considered sufficiently different, and simpler in character, to warrant focusing solely on substitution mutations.

The training data set contained forward and reverse sequences for each mixed sample. We considered each basecall, sequenced in both the forward and reverse directions, together with its associated local trace data, to be a single training element. In all, we had approximately 100,000 training elements. Of these, 217 were mixed calls and the rest were pure calls. The pictures below show examples of two training elements, one a 'mixed' and the other a 'pure' basecall.

[Figure panels: Mixed | Pure]

Methodology

Wagner Associates considered hundreds of numerical features associated with each training element and evaluated their ability to distinguish mixed basecalls from pure calls. Some features were natural contenders, such as the ratio of the highest to the second-highest trace at the basecall location. Others were suggested by DNA analysts with years of experience in distinguishing true mixtures from pure basecalls with a high level of background noise.

[Figure panels: 1:8 Mixture | Background Noise]
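As an illustration of the kind of feature described above, the following is a minimal sketch of a peak-ratio feature. It is not Wagner Associates' implementation. The function names, the dictionary layout of the trace heights, the choice to report second-highest over highest (so the value lies between 0 and 1), and the rule for combining the forward and reverse reads are all assumptions made for this example.

```python
def peak_ratio(trace_heights):
    """Ratio of the second-highest to the highest trace at one basecall.

    trace_heights: dict mapping each base ('A', 'C', 'G', 'T') to its
    trace intensity at the basecall location (hypothetical layout).
    Values near 0 suggest a pure call; values near 1 suggest a mixture.
    """
    ranked = sorted(trace_heights.values(), reverse=True)
    highest, second = ranked[0], ranked[1]
    if highest <= 0:
        return 0.0  # no usable signal at this location
    return second / highest


def element_peak_ratio(forward, reverse):
    """Combine both read directions of one training element.

    Assumption: a true mixture should show a secondary peak in both
    directions, so the smaller of the two ratios is used.
    """
    return min(peak_ratio(forward), peak_ratio(reverse))


# Made-up intensities for illustration only.
pure = {"A": 1200, "C": 30, "G": 45, "T": 20}
mixed = {"A": 800, "C": 20, "G": 400, "T": 25}
print(peak_ratio(pure), peak_ratio(mixed))
```

A single ratio like this cannot separate a low-ratio mixture such as 1:20 from background noise on its own, which is consistent with the text's point that many features were evaluated together.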

With the most important features identified, we studied the resulting feature space, whose coordinates are the individual features. We used statistical classification algorithms to partition the feature space into subsets and assigned a probability of mixture to each subset in accordance with the historical (training) data. Each probability (P) was then converted into a log-scaled integer score (S) via the relation

S = -10 * log10(P)
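The partition-and-score steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the classification algorithm used in the work: the partition here is a set of fixed cut points on one feature, the toy data are invented, and the source does not say how scores were made integers or how a subset with no observed mixtures was scored, so the rounding and the `max_score` cap are hypothetical choices.

```python
import math
from collections import defaultdict


def cell_index(value, edges):
    """Index of the partition cell containing value, given sorted cut points."""
    return sum(1 for edge in edges if value >= edge)


def cell_probabilities(features, labels, edges):
    """Empirical probability of mixture in each cell of the partition."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [mixed count, total count]
    for x, is_mixed in zip(features, labels):
        cell = cell_index(x, edges)
        counts[cell][0] += int(is_mixed)
        counts[cell][1] += 1
    return {cell: mixed / total for cell, (mixed, total) in counts.items()}


def score(p, max_score=99):
    """S = -10 * log10(P), rounded to an integer (rounding is an assumption)."""
    if p <= 0:
        return max_score  # hypothetical cap: log10(0) is undefined
    return round(-10 * math.log10(p))


# Invented toy data: one feature value per training element, 1 = mixed call.
features = [0.02, 0.03, 0.05, 0.40, 0.55, 0.60, 0.04, 0.45]
labels = [0, 0, 0, 1, 1, 0, 0, 1]
probs = cell_probabilities(features, labels, edges=[0.25])
scores = {cell: score(p) for cell, p in probs.items()}
print(probs, scores)
```

Under this relation a low score means a high probability of mixture: P = 1 gives S = 0, and each additional 10 points corresponds to a tenfold drop in P.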

Results

The model was tested by resubstitution, that is, by scoring the training data with the fitted model. The following table presents each score, the predicted probability of a mixture for that score, and the observed proportion of mixture calls among the training elements that received that score. The results show good agreement between predictions and observations, especially given the small amount of data available for training.

Score   Predicted Probability   Observed Proportion
        of Mixture              of Mixtures
0       1.00000                 0.44134
11      0.06723                 0.03101
18      0.01569                 0.04762
19      0.01100                 0.03101
21      0.00751                 0.01639
23      0.00500                 0.00285
24      0.00414                 0.00266
25      0.00311                 0.00178
27      0.00201                 0.01961
29      0.00107                 0.00133
34      0.00040                 0.00000
35      0.00028                 0.00018
36      0.00024                 0.00000
39      0.00013                 0.00003
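A comparison of this kind can be sketched as below: group the scored elements by score, compute the fraction that were truly mixed, and set it beside the probability implied by the score. The function names and the toy data are assumptions for illustration. Inverting the score relation gives only a nominal probability, which need not equal the predicted values listed in the table.

```python
from collections import defaultdict


def observed_proportions(scores, labels):
    """Fraction of mixed calls among elements sharing each score."""
    tally = defaultdict(lambda: [0, 0])  # score -> [mixed count, total count]
    for s, is_mixed in zip(scores, labels):
        tally[s][0] += int(is_mixed)
        tally[s][1] += 1
    return {s: mixed / total for s, (mixed, total) in sorted(tally.items())}


def nominal_probability(s):
    """Invert S = -10 * log10(P) to get the probability implied by a score."""
    return 10 ** (-s / 10)


# Invented toy data: one score and one true label (1 = mixed) per element.
scores = [0, 0, 20, 20, 20, 20]
labels = [1, 0, 0, 0, 0, 1]
observed = observed_proportions(scores, labels)
for s, proportion in observed.items():
    print(s, nominal_probability(s), proportion)
```

Because resubstitution scores the same data that were used to fit the model, agreement measured this way tends to be optimistic compared with a test on held-out data.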

Contact Us

We are actively seeking clients, commercialization partners, and sources of additional training data for our work on mixture detection.

Please contact atqa@pa.wagner.com for further information.


 


© 2005 Daniel H. Wagner Associates, Inc.  - All rights reserved.