DIRHA II Simulated Corpus

The DIRHA II Simulated Corpus is a multi-microphone, multi-room and multi-language database generated in the context of the DIRHA project.

The overall corpus, which is now available in 4 different languages (Italian, German, Portuguese and Greek), includes 675 acoustic sequences, each 60 seconds long, observed by 40 microphones distributed over 5 rooms (living room, kitchen, bedroom, bathroom and corridor) of the real apartment available under the DIRHA project.

The sampling rate is 48 kHz.
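As a minimal sketch of how such a sequence could be loaded, the Python snippet below reads all microphone channels of one simulation into a single array and checks the sampling rate and duration stated above. The directory layout, file names, and one-WAV-per-channel packaging are assumptions for illustration, not the official corpus structure.

import glob
import numpy as np
import soundfile as sf

EXPECTED_FS = 48000        # sampling rate stated for the corpus
EXPECTED_LEN_S = 60        # each sequence is 60 seconds long

def load_sequence(seq_dir):
    """Load all microphone channels of one sequence into a (n_mics, n_samples) array."""
    channels = []
    for path in sorted(glob.glob(f"{seq_dir}/*.wav")):   # hypothetical per-channel layout
        x, fs = sf.read(path)
        assert fs == EXPECTED_FS, f"unexpected sampling rate in {path}: {fs}"
        channels.append(x)
    data = np.stack(channels)                            # (40, 60 * 48000) if all mics are present
    assert data.shape[1] == EXPECTED_LEN_S * EXPECTED_FS  # assumes exactly 60 s per sequence
    return data

# Example (hypothetical path):
# mics = load_sequence("DIRHA_sim2/ITA/sim001")
# print(mics.shape)   # e.g. (40, 2880000)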

Each sequence consists of real background noise (recorded in the target apartment) with various localized acoustic events (speech and noise) superimposed on it.

Acoustic events occur randomly (and roughly uniformly) in time and in space (within a set of predefined positions), with various amplification gains.

The effect of propagation to each microphone is accounted for by convolving the dry/close-talk signals with the corresponding impulse response (IR). Different sources may occasionally overlap in time.
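The snippet below is a minimal sketch of this simulation principle: a dry event is convolved with the IR from its position to a given microphone, scaled by a gain, and added to the background noise at a chosen onset. All names and the toy IR are illustrative; the actual simulations were produced with the MMSS Matlab tool mentioned later.

import numpy as np
from scipy.signal import fftconvolve

def add_event(mic_signal, dry_event, ir, gain, onset_sample):
    """Superimpose one localized event onto a single-microphone background signal."""
    wet = gain * fftconvolve(dry_event, ir)               # propagation: dry signal convolved with IR
    end = min(onset_sample + len(wet), len(mic_signal))
    out = mic_signal.copy()
    out[onset_sample:end] += wet[: end - onset_sample]    # events from different sources may overlap
    return out

# Example with synthetic data (48 kHz, 60 s background):
fs = 48000
rng = np.random.default_rng(0)
background = 0.01 * rng.standard_normal(60 * fs)
dry = rng.standard_normal(fs)                  # 1 s dry event
ir = np.exp(-np.linspace(0, 8, fs // 2))       # toy exponentially decaying IR
mix = add_event(background, dry, ir, gain=0.5, onset_sample=10 * fs)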

Acoustic events fall into two categories: speech and noise.

Noise events consist of excerpts from collections of sounds and represent typical acoustic events within a home environment (e.g., appliances, knocks, ringing, squeaking, and many other sounds). A total of 327 different noise events can occur in the simulations.

For each acoustic event, an isolated version for each channel is also provided ("isolated" meaning that the event is extracted with all the other sources occurring in the simulation removed).

All the occurrences are completely documented and fully annotated in specific text files, which also report information such as reverberation time (T60), SNR, speaker gender, speaker ID, and acoustic segmentation.
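Purely for illustration, the sketch below parses a hypothetical one-event-per-line annotation format carrying the fields listed above and cuts an annotated segment out of a loaded sequence; the real DIRHA annotation files have their own format, which is not reproduced here.

def parse_annotations(path):
    """Parse a hypothetical annotation file: 'label begin_sample end_sample T60 SNR speaker_id gender' per line."""
    events = []
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            label, begin, end, t60, snr, spk, gender = line.split()
            events.append({
                "label": label,
                "begin": int(begin),
                "end": int(end),
                "t60": float(t60),        # reverberation time (s)
                "snr": float(snr),        # dB
                "speaker_id": spk,
                "gender": gender,
            })
    return events

# Usage (hypothetical file and sequence from the earlier sketch):
# events = parse_annotations("sim001_annotations.txt")
# segment = mics[0, events[0]["begin"]:events[0]["end"]]   # one annotated event on channel 0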

Additional information describing the overall geometric configuration and a pictorial representation of the geometry of the acoustic scene complete the documentation of the corpus.

Data were generated by means of a multi-microphone simulation tool (MMSS) implemented in Matlab.

Thanks to its realism and the large number of microphones and positions, this corpus is suitable for:

  • Distant-talking Speech Recognition

  • Acoustic Localization

  • Multi-microphone signal processing

  • Acoustic Echo Cancellation (AEC)

  • Blind and Semi-Blind Source Separation

  • Speaker ID/Speaker verification

  • Acoustic Event Detection and Classification

  • Speech/non-speech discrimination

Several tests have demonstrated the suitability of the corpus for experiments within the DIRHA project.

We plan to extend the corpus by including more languages such as English.

DOWNLOAD:

  • You can download a short example from here.
  • You can download 6 full sequences (40 channels) from here.

MAIN PAPER:

  1. L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos, "The DIRHA simulated corpus", in Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), 2014, pp. 2629-2634.

RELATED PAPERS:

  1. A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo, "A speech event detection and localization task for multiroom environments", in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), Nancy, France, 12-14 May 2014, pp. 157-161.
  2. M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, "Impulse response estimation for robust speech recognition in a reverberant environment", in Proceedings of EUSIPCO 2012.

Contact us: 
mravanelli@fbk.eu