A speech recognition system typically uses an acoustic model and a language model in the decoding phase.
The acoustic model provides a statistical representation of the acoustic features extracted from the speech signal, typically at the level of basic sound units (phones). The most popular framework is based on Hidden Markov Models.
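As an illustration of the HMM framework, the following sketch (not any specific toolkit's implementation) computes the log-likelihood of a sequence of acoustic frames with the standard forward algorithm; the per-frame state log-probabilities `log_obs` would in practice come from a GMM or neural network trained on the acoustic features:

```python
import numpy as np

def forward_log_likelihood(log_A, log_pi, log_obs):
    """Forward algorithm for an HMM, in the log domain.

    log_A[i, j]  : log-probability of a transition from state i to state j
    log_pi[s]    : log-probability of starting in state s
    log_obs[t,s] : log-probability of acoustic frame t under state s
    Returns the total log-likelihood of the frame sequence."""
    T, S = log_obs.shape
    alpha = log_pi + log_obs[0]              # initialise with the first frame
    for t in range(1, T):
        # log-sum-exp over all predecessor states for each current state
        alpha = log_obs[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)        # marginalise over the final state
```

In decoding, this likelihood is combined with the language model score to rank competing word hypotheses; Viterbi decoding replaces the log-sum-exp with a max to recover the best state sequence.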
For distant-talking speech recognition it is crucial to train acoustic models so that the possible acoustic mismatch between training and testing conditions is reduced. An effective solution is based on contaminated speech: the target environment is simulated during training, which provides a partial compensation of reverberation and background-noise effects. Recent activities are devoted to investigating methods for rapidly creating acoustic models tailored to different rooms and to a variety of noise conditions.
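The contamination step can be sketched as follows. In this minimal illustration (the function name and signal names are assumptions, not from the original), clean speech is convolved with a room impulse response measured or simulated for the target room, and background noise is added at a chosen signal-to-noise ratio:

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate a distant-talking recording from close-talking speech.

    clean  : clean speech samples
    rir    : room impulse response of the target environment
    noise  : background noise recording, at least as long as the speech
    snr_db : target signal-to-noise ratio in dB"""
    # convolve with the impulse response to add reverberation
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # scale the noise so that the mixture reaches the requested SNR
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

Acoustic models are then trained on the contaminated signals (or on features extracted from them) so that the statistics seen in training match those of the target environment.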
Attention must also be devoted to counter-balancing any effects introduced by the specific acoustic front-end, which, besides the desired reduction of noise or reverberation, can introduce unwanted and uncompensated distortions in the resulting acoustic features.
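One common, simple way to reduce such front-end and channel effects is feature normalisation; the original does not name a specific method, so as an illustrative example here is cepstral mean and variance normalisation (CMVN), which removes per-utterance shifts and scalings of the feature distribution:

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalisation.

    features : (frames, coefficients) matrix of acoustic features.
    Returns features with zero mean and unit variance per coefficient,
    which removes constant channel offsets and gain differences."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-10   # guard against constant dimensions
    return (features - mean) / std
```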
This motivates the development of large simulated corpora, based on acoustic measurements taken in very different acoustic environments (e.g. household, office); such corpora represent a way to easily train acoustic models without the need for time-consuming real data collections.