Under two related Phase I SBIR efforts, one for the U.S. Air
Force and another for the U.S. Army Armament Research, Development,
and Engineering Center (ARDEC), Daniel H. Wagner Associates successfully
demonstrated the feasibility of using automated speech recognition
tools in high-noise environments. We developed preliminary
designs for two prototype systems that emphasized modularity,
flexibility, and isolation of external interfaces. We developed
and analyzed several algorithms for noise mitigation that are
quite effective at reducing noise levels in recorded signals.
The noise mitigation techniques we used fall into two classes:
acoustic compensation, and filtering with respect to a variety
of Fourier and wavelet bases. Acoustic compensation simulates
or augments (digitally) the operation of a noise-canceling microphone,
using a separate sample of the noise field. Filtering selectively
de-emphasizes components of the data, with respect to a given
basis, based on real-time estimates of the signal and noise powers
in those components.
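
The filtering class of techniques can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a Haar wavelet basis, a Wiener-style gain per coefficient, and a noise-power estimate averaged over noise-only frames, none of which the summary above specifies. The function names (haar_forward, haar_inverse, estimate_noise_power, filter_frame) are hypothetical, and frame lengths are assumed to be powers of two.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Orthonormal Haar transform; len(x) is assumed to be a power of two."""
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = avg + dif          # averages first, then details
        n = half                     # recurse on the averages only
    return out


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def estimate_noise_power(noise_frames):
    """Mean squared Haar coefficient over noise-only frames (an assumption:
    the source only says the estimates are made in real time)."""
    coeffs = [haar_forward(f) for f in noise_frames]
    size = len(coeffs[0])
    return [sum(c[k] ** 2 for c in coeffs) / len(coeffs) for k in range(size)]


def filter_frame(frame, noise_power):
    """De-emphasize each basis component by a Wiener-style gain."""
    filtered = []
    for c, n_pow in zip(haar_forward(frame), noise_power):
        s_pow = max(c * c - n_pow, 0.0)      # crude signal-power estimate
        denom = s_pow + n_pow
        gain = s_pow / denom if denom > 0.0 else 0.0
        filtered.append(gain * c)
    return haar_inverse(filtered)
```

Components whose power is close to the estimated noise power receive a gain near zero, while components dominated by the signal pass almost unchanged. Substituting a short-time Fourier transform for the Haar transform would give the Fourier-basis variant of the same idea.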
We evaluated the performance of our noise mitigation algorithms
with a variety of voice and noise samples. Performance was measured
by the accuracy of the speech recognition results from a COTS
speech recognition system. In our recorded noise experiments,
we used noise recordings provided by the sponsor and a third-party
sound-effects vendor. We took two separate, but not identical,
recordings of these noise sources and added speech samples, recorded
in a relatively noise-free background, to one of them. We also
performed some two-channel recording experiments that involved
recordings of individuals speaking in a noisy background.
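
The recorded-noise setup described above (two non-identical recordings of one noise source, with speech added to one of them) has the same shape as the primary and reference inputs of an adaptive noise canceller, which is one common way to realize acoustic compensation. The summary does not say which algorithm was used, so the sketch below substitutes a normalized LMS filter as an assumption. The synthetic_pair helper is a hypothetical stand-in for the real recordings: white noise as the reference, a made-up two-tap acoustic path for the second recording, and a sine wave in place of speech.

```python
import math
import random


def synthetic_pair(n=4000, seed=0):
    """Hypothetical test data: returns (speech, primary, reference)."""
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Second, non-identical recording of the same noise source, modeled as
    # the reference seen through a different (invented) acoustic path.
    noise = [0.8 * reference[i] + (0.3 * reference[i - 1] if i else 0.0)
             for i in range(n)]
    speech = [0.1 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
    primary = [s + v for s, v in zip(speech, noise)]
    return speech, primary, reference


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Remove from `primary` whatever is linearly predictable from
    `reference`, using a normalized LMS adaptive filter."""
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_est = sum(w * h for w, h in zip(weights, history))
        err = d - noise_est           # error signal doubles as speech estimate
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        cleaned.append(err)
    return cleaned
```

Calling nlms_cancel(primary, reference) on the pair returned by synthetic_pair() yields a speech estimate of the same length as the input. The step size mu trades convergence speed against residual distortion of the speech, and taps must be long enough to cover the difference between the two acoustic paths; both values here are illustrative.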
For one of our noise mitigation algorithms in the recorded
noise experiments, the recognition rate was above 99% for
the sponsor-supplied noise data (with a recognition rate
of 9% before any noise mitigation algorithms were applied to
the signals). When we included the additional noise sources,
the recognition rate was 90% (with a "before"
rate of 7.5%). For the two-channel recording experiments, the
results were mixed. For one noise source the recognition rate
was 52.5% for one of our algorithms (with a "before"
rate of 2.5%). When we included the other noise source, the
recognition rate was 36% for this filter (with a "before"
rate of 6%). The exceptional performance of our noise mitigation
algorithms demonstrates the feasibility of using voice-driven
technologies in applications operating in high-noise situations.
The Air Force work supported aircraft maintenance and repair
activities. We achieved similar dramatic improvements in the
ARDEC research, where the environment of concern was inside an
Army tank.
The figures below illustrate the reduction in noise obtained
by our algorithms. Figure 1 contains a waveform plot of the sample
utterance "fix left wing" recorded in a relatively
noise-free environment. One can easily see when each word was
spoken.