
CHIL Acoustic Event Detection

Introduction

The Acoustic Event Detection (AED) task aims to detect and classify the acoustic events that can occur in an environment. Correct classification is important for all the CHIL services and for the automatic transcription of seminars. This identification is particularly useful in the Connector service scenario, where visual analysis cannot be relied upon.
Acoustic events differ from scenario to scenario: in this database ITC-irst focuses on events that can happen in small environments, such as lecture rooms and small meeting rooms. We recorded isolated acoustic events, meaning that no interfering noise was present in the room.

Description of the acoustic events

The database contains 16 semantic classes of events:

  • Door knock
  • Door open
  • Door slam
  • Steps
  • Chair moving
  • Cough
  • Paper wrapping
  • Falling object
  • Laugh
  • Keyboard clicking
  • Key jingle
  • Spoon, cup jingle
  • Phone ring
  • Phone vibration
  • MIMIO pen buzz
  • Applause
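
For reference, here is the same inventory as a minimal Python structure, using snake_case labels in the style of the annotation files. Only keyboard_clicking and phone_ring are confirmed by the annotation excerpt further below; the remaining identifiers are assumptions following the same convention:

	# 16 semantic classes; only "keyboard_clicking" and "phone_ring" are
	# confirmed by the annotation excerpt, the rest are assumed labels.
	AED_CLASSES = [
	    "door_knock", "door_open", "door_slam", "steps",
	    "chair_moving", "cough", "paper_wrapping", "falling_object",
	    "laugh", "keyboard_clicking", "key_jingle", "spoon_cup_jingle",
	    "phone_ring", "phone_vibration", "mimio_pen_buzz", "applause",
	]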

Description of the recording setup

The acoustic events were reproduced in our CHIL-room and recorded with 32 microphones: 28 mounted in 7 T-shaped arrays (4 microphones each) and 4 placed on the table. A picture of the CHIL-room with the microphone setup can be seen below.
For each experiment, 4 positions in the room were defined. Each person moved to a different position after every session. The exact position of each participant can be seen in the pictures at the bottom of the page.
Data were recorded at 44.1 kHz with 16-bit precision. All channels were synchronized through a clock on a BNC connector. The signal format is RAW, little-endian.
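
Since the signal format is RAW, the files can be loaded by specifying the sample layout explicitly. A minimal NumPy sketch, assuming headerless mono 16-bit little-endian files (the file name here is hypothetical):

	import numpy as np

	SAMPLE_RATE = 44_100  # Hz, as recorded

	def load_raw(path):
	    """Load a headerless 16-bit little-endian mono RAW file and
	    scale it to floats in [-1.0, 1.0]."""
	    samples = np.fromfile(path, dtype="<i2")  # little-endian int16
	    return samples.astype(np.float32) / 32768.0

	audio = load_raw("Table-1.raw")  # hypothetical file name
	print(f"{len(audio) / SAMPLE_RATE:.1f} seconds of audio")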

Description of the recording procedure

Nine people participated in the recordings. We recorded 3 experiments on different days, each composed of 4 sessions and carried out by 4 persons. During each session, every person reproduced a complete set of acoustic events. After every session, the participants swapped positions.
A script containing the order of the events to be executed was given to each participant.
Across the whole database each event was therefore repeated 48 times (3 experiments × 4 sessions × 4 participants), in 12 different positions (4 positions × 3 experiments). The exception is the Applause, which was performed in common and is repeated 12 times (once per session).
In almost all occurrences, every event was preceded and followed by some silence, which helped the annotation procedure.

Description of the database

The recorded data consist of 3 experiments, each divided into 4 sessions. To have adequate coverage for training and development purposes, we decided to select the first, second and third session of each experiment as training material and to leave the three remaining fourth sessions as test material. We assume that AED technology will eventually be evaluated on real (or scripted) meetings.
In total there are 3 DVDs, each containing 3 sessions. Every session comprises 32 audio files plus one text file containing the segmentation.
For the moment we intend to use the development set for preliminary test purposes.
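
As a sketch, the split can be enumerated as follows (the experiment/session numbering is ours; the actual directory layout of the DVDs may differ):

	# Sessions 1-3 of each experiment -> training; session 4 -> test.
	train_sessions = [(exp, ses) for exp in range(1, 4) for ses in range(1, 4)]
	test_sessions = [(exp, 4) for exp in range(1, 4)]

	assert len(train_sessions) == 9 and len(test_sessions) == 3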

Annotation of the database

The signals were annotated by analysing a single channel, namely Table-1. Annotation was done in a semi-automatic way: an end-point detector was run first, and the resulting markers were then corrected manually. The annotation file is always named Table-1.seg.
An extract of an annotation file is the following:

	9728899 10051900 keyboard_clicking
	10228900 10615899 phone_ring

The two numeric values are the starting and ending samples of the segment in which the event occurs. The last element is the label of the detected event.
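
A small sketch of a parser for this format, converting sample offsets to seconds at the 44.1 kHz sampling rate (the function name and return layout are our own):

	SAMPLE_RATE = 44_100  # Hz

	def parse_seg(path):
	    """Parse a Table-1.seg file into (start_s, end_s, label) tuples."""
	    events = []
	    with open(path) as f:
	        for line in f:
	            parts = line.split()
	            if len(parts) != 3:
	                continue  # skip blank or malformed lines
	            start, end, label = parts
	            events.append((int(start) / SAMPLE_RATE,
	                           int(end) / SAMPLE_RATE, label))
	    return events

With the excerpt above, the keyboard_clicking segment spans samples 9728899 to 10051900, i.e. roughly 220.6 s to 227.9 s of the recording.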