Statistical Pattern Recognition of DNA Mixtures
Description
Heteroplasmy occurs when a mutation in a mitochondrial DNA
(mtDNA) molecule within a cell leads to a mixture of mutant and
normal mtDNA molecules. Heterozygosity occurs when a diploid organism
or cell has inherited different alleles at a particular locus
from each parent. Both cases result in mixtures of DNA sequences
that have important applications in
forensics, pathology, and evolutionary genetics.
The automated detection of DNA mixtures is an important tool in
harnessing these genetic features.
The Problem
Detecting mixtures in DNA sequence data by hand, from the
appearance of processed electrophoretic trace data, is a
time-consuming process. Our research has focused on applying our already
successful pattern recognition techniques from automated trace
quality work to the automatic detection of DNA mixtures.
Training Data
Wagner Associates was provided with specially formulated training
data by Mark Wilson, Program Manager of mtDNA analysis at the
DNA Unit II of the FBI Laboratory. Mark prepared mixtures of pairs
of known mtDNA sequences that differed at 1 to 10 nucleotide positions,
at controlled ratios between 1:1 and 1:20. The differences between
the mixed sequences consisted of 'substitutions' of one basecall
for another. Consequently, locations at which the underlying sequences
differed had the appearance of heteroplasmies or heterozygotes.
The problem of insertion/deletion mutations was considered sufficiently
different and simpler in character to warrant focusing solely
on substitution mutations.
The training data set contained forward and reverse sequences
for each mixed sample. We considered each basecall -- sequenced
in the forward and reverse directions -- together with associated
local trace data to be a single training element. In all, we had
approximately 100,000 separate training elements. Of these, 217
were mixed calls, while the others were pure calls. The pictures
below show examples of two training elements, one a 'mixed' and
the other a 'pure' basecall.
[Figure: two example training elements, captioned "Mixed" and "Pure"]
Methodology
Wagner Associates considered hundreds of numerical features
associated with each training element and evaluated them for their
ability to distinguish mixed basecalls from pure calls. Some features
were natural contenders, such as the ratio of the highest and
second-highest traces at the basecall location. Others were suggested
by DNA analysts with years of experience in distinguishing true
mixture instances from pure basecalls with a high level of background
noise.
[Figure: two example traces, captioned "1:8 Mixture" and "Background Noise"]
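As a rough illustration of the peak-ratio feature mentioned above, the
sketch below computes the ratio of the second-highest to the highest
trace height at a single basecall position. The function names, the
dictionary layout, the sample heights, the orientation of the ratio, and
the rule for combining forward and reverse reads are all assumptions made
for this example; they are not taken from the Wagner Associates feature set.

```python
def peak_ratio(trace_heights):
    """Second-highest over highest trace height at one basecall position.

    trace_heights maps each base ('A', 'C', 'G', 'T') to its signal height.
    The orientation (second / highest) is an assumption for this sketch.
    """
    ranked = sorted(trace_heights.values(), reverse=True)
    highest, second = ranked[0], ranked[1]
    if highest <= 0:
        raise ValueError("no positive signal at this position")
    return second / highest


def element_peak_ratio(forward, reverse):
    """Hypothetical way to combine both strands into one feature value.

    Taking the minimum means a secondary peak must show up in both the
    forward and the reverse read before the feature becomes large.
    """
    return min(peak_ratio(forward), peak_ratio(reverse))


# Made-up heights: a clean call, and a call with a secondary 'G' peak
# at one eighth of the primary peak.
pure_call = {"A": 1200.0, "C": 30.0, "G": 25.0, "T": 40.0}
mixed_call = {"A": 1200.0, "C": 30.0, "G": 150.0, "T": 40.0}

pure_value = element_peak_ratio(pure_call, pure_call)
mixed_value = element_peak_ratio(mixed_call, mixed_call)
```

A single ratio of this kind cannot by itself separate a low-ratio mixture
from a noisy pure call, which is presumably why many features were
evaluated together.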
With the most important features identified, we studied the
resulting feature space, whose coordinates are the individual
features. We used statistical classification algorithms to partition
the feature space into subsets and assigned each subset a probability
of mixture in accordance with the historical (training) data. Each
probability (P) was then converted into a log-scaled integer score (S).
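The exact relation between P and S does not survive in this text. The
sketch below shows the general idea on a one-dimensional feature:
partition the feature axis into bins, estimate the probability of mixture
in each bin from labelled training data, and convert each probability to
an integer score. The Phred-style conversion S = round(-10 log10 P) is an
assumption, as are the bin edges, the toy data, and the function names;
the actual classification algorithms used are not described in enough
detail to reproduce.

```python
import math
from bisect import bisect_right


def fit_bin_probabilities(features, labels, edges):
    """Estimate P(mixture) per bin of a 1-D feature from labelled data.

    edges are the sorted bin boundaries; len(edges) + 1 bins result.
    Bins with no training elements get None.
    """
    totals = [0] * (len(edges) + 1)
    mixed = [0] * (len(edges) + 1)
    for value, is_mixed in zip(features, labels):
        b = bisect_right(edges, value)
        totals[b] += 1
        mixed[b] += int(is_mixed)
    return [m / n if n else None for m, n in zip(mixed, totals)]


def to_score(p):
    """Assumed log-scaled integer score: S = round(-10 * log10(P))."""
    if p is None or p <= 0:
        return None  # the score is undefined when no mixtures were seen
    return round(-10 * math.log10(p))


# Toy training data: feature values with 1 = mixed call, 0 = pure call.
features = [0.02, 0.03, 0.04, 0.12, 0.15, 0.30, 0.45, 0.50]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
edges = [0.10, 0.25]

probabilities = fit_bin_probabilities(features, labels, edges)
scores = [to_score(p) for p in probabilities]
```

Under this assumed convention a low score means a high probability of
mixture, which matches the direction of the scores reported in the Results
section, where score 0 corresponds to a predicted probability of 1.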
Results
The resulting model was tested by resubstituting the training
data into the scoring model. The following table presents the
resulting scores, the corresponding predicted proportion of mixture
calls for that score, and the observed proportion of mixture calls
for that score. The results show good agreement between predictions
and observations, especially given the small amount of data available
for training.
| Score | Predicted Probability of Mixture | Observed Proportion of Mixtures |
| 0     | 1.00000                          | 0.44134                         |
| 11    | 0.06723                          | 0.03101                         |
| 18    | 0.01569                          | 0.04762                         |
| 19    | 0.01100                          | 0.03101                         |
| 21    | 0.00751                          | 0.01639                         |
| 23    | 0.00500                          | 0.00285                         |
| 24    | 0.00414                          | 0.00266                         |
| 25    | 0.00311                          | 0.00178                         |
| 27    | 0.00201                          | 0.01961                         |
| 29    | 0.00107                          | 0.00133                         |
| 34    | 0.00040                          | 0.00000                         |
| 35    | 0.00028                          | 0.00018                         |
| 36    | 0.00024                          | 0.00000                         |
| 39    | 0.00013                          | 0.00003                         |
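The tabulated values can also be inspected programmatically. The sketch
below copies the rows from the table and computes two things: how far each
score lies from -10 log10 of its predicted probability (a Phred-style
relation that is assumed here, not confirmed by the source), and the ratio
of observed to predicted proportion for each score. The variable and
function names are illustrative only.

```python
import math

# (score, predicted probability of mixture, observed proportion of mixtures)
ROWS = [
    (0, 1.00000, 0.44134),
    (11, 0.06723, 0.03101),
    (18, 0.01569, 0.04762),
    (19, 0.01100, 0.03101),
    (21, 0.00751, 0.01639),
    (23, 0.00500, 0.00285),
    (24, 0.00414, 0.00266),
    (25, 0.00311, 0.00178),
    (27, 0.00201, 0.01961),
    (29, 0.00107, 0.00133),
    (34, 0.00040, 0.00000),
    (35, 0.00028, 0.00018),
    (36, 0.00024, 0.00000),
    (39, 0.00013, 0.00003),
]


def score_gap(score, predicted):
    """Distance between a score and the assumed value -10 * log10(P)."""
    return abs(score - (-10 * math.log10(predicted)))


def observed_over_predicted(predicted, observed):
    """Calibration ratio; 1.0 would mean perfect agreement."""
    return observed / predicted


max_gap = max(score_gap(s, p) for s, p, _ in ROWS)
ratios = {s: observed_over_predicted(p, o) for s, p, o in ROWS}
largest_excess_score = max(ratios, key=ratios.get)

for s, p, o in ROWS:
    print(f"score {s:2d}: predicted {p:.5f}, observed {o:.5f}, "
          f"ratio {ratios[s]:.2f}")
```

Because the model was tested by resubstitution, these ratios describe fit
to the training data rather than performance on unseen samples.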
Contact Us
We are actively seeking clients, commercialization partners,
and sources of additional training data for our work on mixture
detection.
Please contact atqa@pa.wagner.com for further information.