An important part of our research is the collection and annotation of acoustic and speech corpora. Over the past 20 years we have created a large number of data sets. Below you can find information about the facilities at our disposal for high-quality recording, together with a list of the most representative corpora we have produced. Some of them are publicly available free of charge or are distributed by external agencies, as specified in the related pages.

To train speech recognition components for distant-speech interaction, we also have experience in creating simulated (also referred to as "contaminated") data that realistically represent recordings in noisy and reverberant environments. More information is available in the DIRHA project pages as well as in related pages of this site.
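
As a rough illustration of what data contamination generally involves, the sketch below convolves a clean (close-talking) signal with a room impulse response and then adds background noise at a chosen signal-to-noise ratio. It is a minimal example under stated assumptions, not the procedure used for the corpora described on this site: the function name `contaminate`, its parameters, and the choice to trim the reverberant signal to the original length are all hypothetical.

```python
import numpy as np

def contaminate(clean, impulse_response, noise, snr_db):
    """Return a reverberant, noisy version of `clean`.

    All inputs are 1-D float arrays at the same sampling rate (assumption).
    `noise` must not be silent, otherwise its power is zero.
    """
    # Reverberation: convolve with the room impulse response,
    # then trim the tail so the output keeps the original length.
    reverberant = np.convolve(clean, impulse_response)[: len(clean)]

    # Repeat or cut the noise so it covers the whole signal.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so that the reverberant-signal-to-noise ratio
    # equals snr_db.
    signal_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))

    return reverberant + gain * noise
```

In practice, `impulse_response` would be measured in, or simulated for, the target room, and `noise` would be recorded in the same environment; here both are simply arrays supplied by the caller.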