
Multi-source localization and tracking


The goal of the localization and tracking task is to identify the spatial position of active acoustic sources in an enclosure, using audio signals captured by a multi-microphone front-end. The main challenges for the task are

  • the environmental noise,
  • the reverberation,
  • the presence of concurrent sources.

State of the Art

The problem is typically addressed by considering the Time Difference of Arrival (TDOA) at one or more microphone pairs. As shown in the figure below, the TDOA is related to the point of sound emission: the locus of points satisfying a given TDOA is a hyperboloid.
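
As a minimal sketch of this relation, the function below computes the TDOA that a point source would produce at one microphone pair. The name `tdoa`, the sign convention (positive when the source is farther from the first microphone) and the speed of sound of 343 m/s are assumptions made for illustration, not details taken from the SHINE implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Return the TDOA in seconds of `source` at the pair (mic_a, mic_b).

    The value is positive when the source is farther from mic_a than
    from mic_b, i.e. when the wavefront reaches mic_b first.
    """
    source = np.asarray(source, dtype=float)
    dist_a = np.linalg.norm(source - np.asarray(mic_a, dtype=float))
    dist_b = np.linalg.norm(source - np.asarray(mic_b, dtype=float))
    return (dist_a - dist_b) / c
```

All positions for which this function returns the same value lie on one sheet of a hyperboloid whose foci are the two microphones, which is why a single pair cannot pinpoint the source on its own.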


When multiple microphone pairs are available, the position of sound emission can be estimated by combining the TDOA observations, for instance by simple triangulation.
The TDOA analysis is typically performed using the Generalized Cross-Correlation with PHAse Transform (GCC-PHAT), also known as Crosspower Spectrum Phase (CSP). For each candidate time delay, GCC-PHAT produces a measure of the coherence between the signals captured by the two microphones, relying on the phase information only.
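
The following is a minimal NumPy sketch of GCC-PHAT for one frame and one microphone pair. The function name `gcc_phat`, the `max_lag` parameter (which must be at least 1) and the small constant added to the denominator are assumptions for illustration; a practical front-end would also apply framing and windowing, which are omitted here.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """Return (lags, coherence) for integer lags in [-max_lag, max_lag].

    A peak at lag d > 0 means that x1 is delayed by d samples
    with respect to x2.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)      # back to the lag domain
    # Negative lags sit at the end of the inverse FFT output; move them in front.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc
```

The TDOA estimate in samples is then `lags[np.argmax(cc)]`; dividing by the sampling rate gives the delay in seconds.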

SHINE approach

The SHINE approach to this task is based on the use of distributed microphone networks. Considering the linear combination of the coherence observed by a set of microphone pairs for a given set of coordinates (x,y,z), and repeating the same process for a grid of spatial points, a 3D acoustic map, called the Global Coherence Field (GCF), is obtained. The GCF can be interpreted as an "acoustic generation map" that provides, for each position in a given room, the plausibility that an emitting acoustic source is present at that point. The position of the source can be estimated by picking the peak of the acoustic map. The figure below shows an example of an acoustic map based on 7 microphone pairs in gray scale: bright areas correspond to high values of the map, while dark areas represent low map values.
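
The construction of the map can be sketched as follows. This is an illustrative outline under stated assumptions, not the SHINE code: the linear combination is taken to be a plain average over pairs, expected delays are rounded to the nearest sample, and the names `gcc_phat`, `global_coherence_field`, `pairs`, `grid`, `fs` and `max_lag` are all hypothetical.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC-PHAT coherence for integer lags in [-max_lag, max_lag]."""
    n = len(x1) + len(x2)
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12          # keep the phase only
    cc = np.fft.irfft(cross, n=n)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def global_coherence_field(pairs, grid, fs, max_lag, c=343.0):
    """Average, over all pairs, the coherence at the TDOA expected for
    each grid point.

    pairs: list of (pos_a, pos_b, sig_a, sig_b), positions in metres
    grid:  array of shape (P, 3) with the candidate source positions
    fs:    sampling rate in Hz; c: assumed speed of sound in m/s
    """
    grid = np.asarray(grid, dtype=float)
    gcf = np.zeros(len(grid))
    for pos_a, pos_b, sig_a, sig_b in pairs:
        cc = gcc_phat(sig_a, sig_b, max_lag)
        dist_a = np.linalg.norm(grid - np.asarray(pos_a, dtype=float), axis=1)
        dist_b = np.linalg.norm(grid - np.asarray(pos_b, dtype=float), axis=1)
        # Expected delay of sig_a relative to sig_b, in samples.
        lag = np.rint((dist_a - dist_b) / c * fs).astype(int)
        lag = np.clip(lag, -max_lag, max_lag)
        gcf += cc[lag + max_lag]            # index 0 corresponds to -max_lag
    return gcf / len(pairs)
```

The source position estimate is the grid point with the highest value, for example `grid[np.argmax(gcf)]`. The grid resolution trades localization accuracy against computation, since every pair is evaluated at every point.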


Head Orientation Estimation

In a multi-channel scenario with a distributed microphone network, knowledge of the head pose plays an important role. For example, given a reasonably accurate estimate of the head orientation, one can select an optimal subset of microphones for distant-talking ASR or sound source localization. We have developed an extension of the GCF maps, called Oriented GCF (OGCF), for jointly estimating the source position and orientation. Basically, since reverberation and background noise predominate at microphone pairs located behind the source, resulting in lower coherence, our approach compares the GCC-PHAT computed at different microphone pairs to derive an estimate of the source orientation [1,2].

Visit our demo page for more details and to see the algorithm working:

Multiple sources

With multiple active sources, the acoustic map hardly ever shows a distinct peak for each of them, and some of the peaks that do appear may be related to ghost sources. However, information about the individual sources can still be obtained by manipulating the acoustic map: a de-emphasis of the GCC-PHAT function brings out the weaker peaks [4]. Bayesian frameworks, such as particle filtering, then enforce temporal and spatial continuity on the speaker trajectories [3].
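
To illustrate the idea of de-emphasis, the sketch below attenuates a GCC-PHAT function around the lag of an already detected peak so that a weaker peak becomes the maximum. The exact de-emphasis function used in [4] is not reproduced here: the Gaussian notch, its `width` and the name `deemphasize` are assumptions chosen for illustration only.

```python
import numpy as np


def deemphasize(cc, peak_lag, max_lag, width=2.0):
    """Attenuate `cc` around `peak_lag` with a Gaussian notch (assumed shape).

    cc:       coherence values for integer lags -max_lag..max_lag
    peak_lag: lag (in samples) associated with the source already found
    width:    standard deviation of the notch, in samples
    """
    lags = np.arange(-max_lag, max_lag + 1)
    notch = 1.0 - np.exp(-0.5 * ((lags - peak_lag) / width) ** 2)
    return np.asarray(cc, dtype=float) * notch
```

Applying such a function to every microphone pair, with `peak_lag` set to the delay expected for the dominant source, and then rebuilding the acoustic map would leave the contribution of the remaining sources largely untouched.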

On our demo page, a video clip is available in which the SHINE approach tracks up to 3 simultaneous sources:

New Trends

Environment-aware localization
Recently, improved localization strategies have been introduced that try to turn reverberation and image sources from issues into assets. More details here.

BSS based localization
Finally, new localization strategies have been developed by applying techniques from the blind source separation (BSS) framework, in particular for the case in which several acoustic sources are active simultaneously. Basically, the GCC-PHAT function is replaced with the more advanced GSCT, in combination with high-order norms, while tracking is performed in the TDOA domain [5]. The estimated TDOAs are in turn used to drive a subsequent BSS module.
Have a look at the video clip for an example:

  1. A. Brutti, M. Omologo, P.G. Svaizer, "Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays", Eurospeech 2005, Lisboa.
  2. A. Brutti, M. Omologo, P.G. Svaizer, "Speaker Localization based on Oriented Global Coherence Field", Interspeech 2006, September 17-21, 2006, Pittsburgh, Pennsylvania, USA.
  3. A. Brutti, M. Omologo, P. Svaizer, "A Sequential Monte Carlo Approach for Tracking of Overlapping Acoustic Sources", EUSIPCO, August 2009, Glasgow.
  4. A. Brutti, M. Omologo, P. Svaizer, "Multiple Source Localization based on Acoustic Map De-Emphasis", EURASIP Journal on Audio, Speech, and Music Processing.
  5. A. Brutti, F. Nesta, "Tracking of multidimensional TDOA for multiple sources with distributed microphone pairs", Computer Speech & Language, 2012.