The collection and annotation of acoustic and speech corpora is a central activity of our research. Over the past 20 years we have created a large number of data sets. Below you can find information about the facilities we have at our disposal for high-quality recordings, together with a list of the most representative corpora we have produced. Some of them are publicly available for free or distributed by external agencies, as specified in the related pages.

In order to train speech recognition components for distant-speech interaction, we also have a background in the creation of simulated (also referred to as "contaminated") data that represent recordings in noisy and reverberant environments in a fairly realistic way. In this regard, you can find more information on the DIRHA project pages as well as in related pages of this site.
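Contaminated data of this kind are typically obtained by convolving clean speech with a room impulse response and then adding background noise at a chosen signal-to-noise ratio. The sketch below illustrates that general idea only; it is not the actual DIRHA pipeline, and the function name and the synthetic signals are purely illustrative.

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate a distant-talking recording: convolve clean speech with a
    room impulse response (reverberation), then add noise at a given SNR.
    All arguments are 1-D float arrays at the same sampling rate."""
    # Reverberate the clean signal; truncate to the original length
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise so that 10*log10(speech power / noise power) == snr_db
    speech_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2)
    gain = np.sqrt(speech_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return reverberant + gain * noise

# Illustrative usage with synthetic signals (a real pipeline would use
# recorded speech, a measured impulse response and real noise):
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)          # 1 s of "speech" at 16 kHz
rir = np.exp(-np.arange(256) / 40.0)        # toy exponentially decaying RIR
noise = rng.standard_normal(16000)
contaminated = contaminate(clean, rir, noise, snr_db=10.0)
```

In practice one impulse response per microphone position is used, so a single clean utterance can be "replayed" at many points in the room, which is what makes simulated multi-microphone corpora inexpensive to scale.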

The phonetically-rich part of the DIRHA English Dataset is a multi-microphone acoustic corpus developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA). The corpus is composed of real phonetically-rich sentences recorded with 32 sample-synchronized microphones in a domestic environment.

For access to this data set after 2020, please refer to the instructions reported in

The SHINE research unit has its own audio room in which to conduct recordings and experiments.

The DIRHA II Simulated Corpus is a multi-microphone, multi-room and multi-language database generated in the context of the DIRHA project.

The overall corpus, now available in 4 different languages (Italian, German, Portuguese and Greek), includes 675 acoustic sequences of 60 seconds each, observed by 40 microphones distributed over 5 different rooms (living-room, kitchen, bedroom, bathroom and corridor) of a real apartment available under the DIRHA project.

The sampling rate is 48 kHz.

The corpus includes acoustic sequences of 60 seconds each, at a sampling rate of 48 kHz, for a subset of 21 microphones (13 in the living-room and 8 in the kitchen).

The recordings consist of spoken commands mixed with other acoustic events occurring in different rooms of a real apartment. The corpus contains a set of simulated acoustic scenes of 60 seconds each, observed by 40 microphone channels distributed over 5 different rooms. Each acoustic scene is composed of short English commands and non-speech acoustic sources.

For access to this data set after 2020, please refer to the instructions reported in

FBK-irst has conducted Wizard of Oz experiments with the aim of collecting useful data for testing signal processing algorithms in the scenario foreseen by the DICIT project.

The Acoustic Event Detection (AED) task aims to detect and classify noisy events that can occur in an environment. Correct classification is important for all the CHIL services and for the automatic transcription of seminars. This identification is particularly useful in the Connector service scenario, since it is not possible to rely on visual analysis there.

This database consists of a set of sentences reproduced by a loudspeaker at different orientations and positions. It was recorded to analyze and evaluate algorithms for estimating a talker's position and orientation.
The use of a loudspeaker guarantees a precise reference for both position and orientation.

In the CHIL project, ITC-irst collected some seminars to be used as training material. Interactive meetings were composed of 3-4 persons sitting around a table, each wearing a close-talk microphone. There was strong interaction between the presenter and the audience, with numerous questions. In addition, a coffee break was always scheduled, during which a significant number of acoustic events were generated.

In the CHIL project, ITC-irst collected some seminars to be used as training material. Seminars were regular presentations given by a speaker and followed by a group of listeners. Presenters used a projector and a white screen, while listeners sat in the room and could interrupt the speaker using their microphones.

Scripted meetings (SM) are short real meetings lasting less than 10 minutes. Their aim is to collect a variety of acoustic events in a compressed slot of time. In these meetings the vocal part is kept in the background, with most of the attention given to typical noisy events.

The nine languages covered by the SpeechDat-Car (SDC) project are: Danish, English, Finnish, Flemish/Dutch, French, German, Greek, Italian and Spanish.
Each collected database comprises 300 speakers. Each speaker is recorded in two environments, and each session consists of 125 items.

APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech.


SPK is an Italian speech database of isolated and connected digits. It was designed and collected at the Istituto per la Ricerca Scientifica e Tecnologica (ITC/IRST), Trento, Italy, and was conceived for speaker recognition and verification purposes. This CD-ROM releases speech material corresponding to isolated digits acquired from 100 speakers (30 females and 70 males, aged 23 to 50). Most of the speakers are from the North-East of Italy.