The corpus contains a set of simulated acoustic scenes, each 60 seconds long (sampled at 16 kHz with 16-bit resolution) and observed by 40 microphone channels distributed over 5 rooms of a real apartment. Each acoustic scene is composed of both speech (i.e., short English commands) and non-speech acoustic sources (i.e., typical home noises such as radio, TV, appliances, knocking, ringing, creaking and many others). The commands are based on the Grid corpus.
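To give a sense of scale, the figures above imply the following raw data volume per scene. This is only a back-of-the-envelope sketch: it assumes each channel is stored as uncompressed mono PCM and ignores any file headers, neither of which is stated in the corpus description. The function and constant names are illustrative.

```python
# Parameters taken from the corpus description.
SAMPLE_RATE_HZ = 16_000   # 16 kHz sampling frequency
BITS_PER_SAMPLE = 16      # 16-bit resolution
DURATION_S = 60           # length of one acoustic scene
NUM_CHANNELS = 40         # microphone channels across the 5 rooms


def scene_size(sample_rate_hz, bits_per_sample, duration_s, num_channels):
    """Return (samples per channel, bytes per channel, bytes per scene).

    Assumption: uncompressed mono PCM per channel, no container overhead.
    """
    samples_per_channel = sample_rate_hz * duration_s
    bytes_per_channel = samples_per_channel * bits_per_sample // 8
    bytes_per_scene = bytes_per_channel * num_channels
    return samples_per_channel, bytes_per_channel, bytes_per_scene


samples, per_channel, per_scene = scene_size(
    SAMPLE_RATE_HZ, BITS_PER_SAMPLE, DURATION_S, NUM_CHANNELS
)
print(samples)      # 960000 samples per channel
print(per_channel)  # 1920000 bytes per channel
print(per_scene)    # 76800000 bytes per scene (about 76.8 MB)
```

Under these assumptions, one scene amounts to roughly 76.8 MB of raw audio across all 40 channels.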