Maurizio Omologo

  • Phone: +39 0461314563
  • FBK Povo
Short bio

Maurizio Omologo obtained his degree in Electrical Engineering at the University of Padova (Italy) in 1984. From 1984 to 1987, he was a researcher on speech coding at CSELT (Torino, Italy). In 1988, he joined ITC-irst (now FBK-irst), where he is head of the SHINE (Speech-acoustic scene analysis and interpretation) research unit.

His main research interests currently include microphone arrays, speaker localization, speech analysis, source separation, speech enhancement, speaker identification, acoustic event detection, ASR for distant-speech interaction, and audio signal processing for music information retrieval.

From 2001 to 2014, he taught "Audio Signal Processing and Coding" at the University of Trento.

He served as Project Manager of the DICIT European Project from 2006 to 2009 and of the DIRHA European Project from 2012 to 2014.

He is the author of about 200 papers in major international conferences and journals in the field, and of 3 international patents.

Research interests
digital signal processing, speech coding, music signal processing, multi-microphone systems, speaker localization, acoustic event detection, speech enhancement, source separation, acoustic feature extraction, distant-speech recognition, deep learning, audio and speech databases
Organization of scientific international events

▪ Special Session Chairman of European Signal Processing Conference - EUSIPCO Conference (Kos, Greece) 2017.

▪ Technical Area Chairman of IEEE Automatic Speech Recognition and Understanding – ASRU Workshop (Scottsdale, USA) 2015.

▪ Technical Area Chairman of Analysis of Speech, Audio Signals, Speech Coding, Speech Enhancement - ISCA Interspeech Conference (Lyon, France), August 2013.

▪ Tutorial Chairman - ISCA Interspeech Conference (Florence, Italy), August 2011.

▪ Technical Area Chairman of ASR Robustness and adaptation - ISCA Interspeech Conference (Florence, Italy), August 2011.

▪ Local Chairman - IEEE Automatic Speech Recognition and Understanding – ASRU Workshop (Merano, Italy), December 2009.

▪ General Co-Chairman - IEEE Hands-free Speech Communication and Microphone Arrays - HSCMA Workshop (Trento, Italy), May 2008.

▪ Technical Area Chairman of Audio and Electroacoustics – European Signal Processing Conference - EUSIPCO Conference (Lausanne, Switzerland), August 2008.

▪ Demo Chairman – International Conference on Multimodal Interfaces - ICMI (Trento, Italy), October 2005.

▪ General Co-Chairman - IEEE Automatic Speech Recognition and Understanding – ASRU Workshop (Madonna di Campiglio, Italy), December 2001.

Editorial activities

▪ Editorial Board Member of Acoustics, since 2017.

▪ Associate Editor of IEEE Transactions on Speech and Audio Processing, from 2003 to 2005.

▪ Guest Editor of a special issue on "Speech Processing for Natural Interaction with Intelligent Environments", IEEE Journal of Selected Topics in Signal Processing, 2010.

▪ Editor of Language Resources and Evaluation journal – Springer - from 2012 to 2016.

Other activities

▪ Member of IEEE James L. Flanagan Speech and Audio Processing Award Committee, since 2017.

▪ Elected member of IEEE SPS Speech-Language Technical Committee, from 2014 to 2016.

▪ Member of the steering committee of Associazione Italiana Scienze Vocali (AISV) from 2006 to 2009.

▪ Member of IEEE SPS Speech Technical Committee, from 2003 to 2005.

Journal papers



▪ Ravanelli M., Brakel P., Omologo M., Bengio Y. (2017), Light Gated Recurrent Units for speech recognition, in «IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE», accepted for publication.

▪ Guerrero C., Tryfou G., Omologo M. (2017), Cepstral distance based channel selection for distant speech recognition, in «COMPUTER SPEECH AND LANGUAGE», accepted for publication.

▪ Fakhry M., Svaizer P., Omologo M. (2017), Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization, in «IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING», vol. 25, n. 7, 2017, pp. 1462-1476.

▪ Khadkevich M., Omologo M. (2013). Reassigned spectrum-based feature extraction for GMM-based automatic chord recognition, in «EURASIP JOURNAL ON AUDIO, SPEECH AND MUSIC PROCESSING», 2013, pp. 1 – 12.

▪ A. Brutti, M. Omologo, P. Svaizer (2013). An environment aware ML estimation of acoustic radiation pattern with distributed microphone pairs, in «SIGNAL PROCESSING», vol. 93, n. 4, 2013, pp. 784-796.

▪ Nesta F, Omologo M (2012). Generalized State Coherence Transform for Multidimensional TDOA Estimation of Multiple Sources. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 20, p. 246-260, ISSN: 1558-7916.

▪ F. Nesta, P. Svaizer, M. Omologo (2011). Convolutive BSS of short mixtures by ICA recursively regularized across frequencies. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 19, p. 624-639, ISSN: 1558-7916, doi: 10.1109/TASL.2010.2053027.

▪ A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, M. Omologo (2010). WOZ acoustic data collection for interactive TV. LANGUAGE RESOURCES AND EVALUATION, vol. 44, p. 205-219, ISSN: 1574-020X, doi: 10.1007/s10579-010-9116-x.

▪ A. Brutti, M. Omologo, P. Svaizer (2010). Multiple Source Localization Based on Acoustic Map De-Emphasis. EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, vol. 2010, ISSN: 1687-4714, doi: 10.1155/2010/147495.

▪ Matassoni M, Omologo M, Giuliani D, Svaizer P (2002). Hidden Markov model training with contaminated speech material for distant-talking speech recognition. COMPUTER SPEECH AND LANGUAGE, vol. 16, p. 205-223, ISSN: 0885-2308.

▪ Van Den Heuvel H, Boves L, Moreno A, Omologo M, Richard G, Sanders E (2001). Annotation in the SpeechDat Projects. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, vol. 4, p. 127-143, ISSN: 1381-2416.

▪ Omologo M, Svaizer P, Matassoni M (1998). Environmental conditions and acoustic transduction in hands-free speech recognition. SPEECH COMMUNICATION, vol. 25, p. 75-95, ISSN: 0167-6393.

▪ Omologo M, Svaizer P (1997). Use of the crosspower-spectrum phase in acoustic event location. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 5, p. 288-292, ISSN: 1063-6676.

▪ Brugnara F, Falavigna D, Omologo M (1993). Automatic segmentation and labeling of speech based on Hidden Markov Models. SPEECH COMMUNICATION, vol. 12, p. 357-370, ISSN: 0167-6393.

Recent book chapters and conference papers

▪ Pertilä P., Brutti A., Svaizer P., Omologo M., Multichannel source activity detection, localization, and tracking. Chapter 4 of the book: Vincent, E., Virtanen, T., & Gannot, S. (Eds.). (2017). Audio Source Separation and Speech Enhancement. Wiley.

▪ Tryfou G., Omologo M., A reassigned front-end for speech recognition, Proc. of EUSIPCO 2017.

▪ Ravanelli M., Brakel P., Omologo M., Bengio Y., Improving speech recognition by revising gated recurrent units, Proc. of Interspeech 2017.

▪ Ravanelli M., Brakel P., Omologo M., Bengio Y., A network of deep neural networks for distant speech recognition, ICASSP 2017, BEST IBM STUDENT PAPER AWARD.

▪ Fakhry M., Svaizer P., Omologo M., Estimation of the spatial information in Gaussian model based audio source separation using weighted spectral bases, Proceedings of EUSIPCO 2016.

▪ Ravanelli M., Svaizer P., Omologo M., Realistic Multi-Microphone Data Simulation for Distant Speech Recognition, Proceedings of INTERSPEECH 2016.

▪ Guerrero C., Tryfou G., Omologo, M., Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance, Proceedings of INTERSPEECH 2016, pp. 1986-1990.

▪ Ravanelli M., Brakel P., Omologo M., Bengio Y., Batch-normalized joint training for DNN-based distant speech recognition, IEEE Workshop on Spoken Language Technology 2016. 

▪ M. Fakhry, P. Svaizer, M. Omologo. Audio source separation using a redundant library of source spectral bases for nonnegative tensor factorization, Proceedings of ICASSP 2015, pp. 251-255.

▪ E. Zwyssig, M. Ravanelli, P. Svaizer, M. Omologo. A multi-channel corpus for distant-speech interaction in presence of known interferences, Proceedings of ICASSP 2015, pp. 4480-4484.

▪ M. Ravanelli, M. Omologo. Contaminated speech training methods for robust DNN-HMM distant speech recognition, Proceedings of INTERSPEECH 2015, pp. 756-760.

▪ M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi, M. Omologo. The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments, Proceedings of IEEE-ASRU, 2015, pp. 275-282.

▪ S. Jalalvand, D. Falavigna, M. Matassoni, P. Svaizer, M. Omologo. Boosted acoustic model learning and hypotheses rescoring on the CHiME-3 task, Proceedings of IEEE-ASRU, 2015, pp. 409-415.

▪ M. Ravanelli, M. Omologo. On the selection of the impulse responses for distant-speech recognition based on contaminated speech training, Proceedings of INTERSPEECH 2014, pp. 1028-1032.

▪ L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos. The DIRHA simulated corpus, Proceedings of LREC 2014, pp. 2629-2634.

▪ A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo. A speech event detection and localization task for multiroom environments, Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014, pp. 157-161.

▪ C. Guerrero, M. Omologo. Word boundary agreement to combine multi-microphone hypotheses in distant speech recognition, Proceedings of HSCMA, 2014.

▪ C. Guerrero, M. Omologo. Exploiting inter-microphone agreement for hypothesis combination in distant speech recognition, Proceedings of EUSIPCO, 2014, pp. 2385-2389.

▪ G. Tryfou, M. Pellin, M. Omologo. Time-Frequency Reassigned Cepstral Coefficients for Phone-Level Speech Segmentation, Proceedings of EUSIPCO, 2014, pp. 2060-2064.

▪ M. Khadkevich, M. Omologo. Large scale cover song identification based on chord profiles. In: Proc. of 14th International Society for Music Information Retrieval Conference, November 2013.

▪ A. Brutti, M. Omologo. Geometric contamination for GMM/UBM speaker verification in reverberant environments. In: Proc. of Interspeech, 2013, Lyon, France, pp. 791 - 794.

▪ A.W. Mohammed, M. Matassoni, H.K. Maganti, M. Omologo. Semi-Blind Model Adaptation using Piece-wise Energy Decay Curve for Large Reverberant Environments. In: Proc. of Interspeech, 2012, Portland, USA.

▪ P. Svaizer, A. Brutti, M. Omologo. Environment-aware estimation of the orientation of acoustic sources using a line array. In: EUSIPCO 2012, Bucharest, Romania, pp. 1024-1028.

▪ A.W. Mohammed, M. Matassoni, H.K. Maganti, M. Omologo. Acoustic Model Adaptation Using Piece-wise Energy Decay Curve for Reverberant Environments. In: EUSIPCO 2012, Bucharest, Romania, pp. 365-369.

▪ F. Nesta, M. Omologo. Convolutive underdetermined source separation through weighted interleaved ICA and spatio-temporal source correlation. In: LVA/ICA'12 - 10th international conference on Latent Variable Analysis and Signal Separation, Tel Aviv, Israel, pp. 222 - 230.

▪ F. Nesta, M. Omologo. Enhanced multidimensional spatial functions for unambiguous localization of multiple sparse acoustic sources. In: ICASSP 2012, Kyoto, Japan, pp. 213-216.

▪ A. Brutti, M. Omologo, P. Svaizer. Maximum a posteriori trajectory estimation for acoustic tracking. In: IWAENC 2012, Aachen, Germany.

▪ M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo. Impulse response estimation for robust speech recognition in a reverberant environment. In: EUSIPCO 2012, Bucharest, Romania, pp. 1668-1672.

▪ F. Nesta, M. Omologo. Approximated kernel density estimation for multiple TDOA detection. In: ICASSP 2011, Prague, Czech Republic.

▪ P. Svaizer, A. Brutti, M. Omologo. Use of reflected wavefronts for acoustic source localization with a line array. In: HSCMA 2011, Edinburgh, Scotland.

▪ M. Matassoni, H. K. Maganti, M. Omologo. Non-linear Spectro-temporal Modulations for Reverberant Speech Recognition. In: HSCMA 2011, Edinburgh, Scotland, pp. 115-120.

▪ A. Brutti, M. Omologo, P. Svaizer. Inference of acoustic source directivity using environment awareness. In: EUSIPCO 2011, Barcelona, Spain, pp. 151 – 155.

▪ Khadkevich M, Omologo M. Time frequency reassigned features for automatic chord recognition. In: ICASSP, 2011. p. 181-184.

Other selected conference papers and documents

▪ A. Temko, C. Nadeu, D. Macho, R. Malkin, C. Zieger, M. Omologo. Acoustic event detection and classification. In: Computers in the Human Interaction Loop, Springer London, pp. 61-73.

▪ A. Brutti, M. Omologo, P. Svaizer. Comparison between different sound source localization techniques based on a real data collection. In: HSCMA 2008, pp. 69-72.

▪ Temko A, Malkin R, Zieger C, Macho D, Omologo M. CLEAR evaluation of acoustic event detection and classification systems. In: Multimodal Technologies for Perception of Humans. LECTURE NOTES IN COMPUTER SCIENCE, vol. LNCS 4122, p. 311-322, ISSN: 0302-9743.

▪ A. Brutti, M. Omologo, P. Svaizer. Oriented Global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays. In: Interspeech 2005, Lisbon, Portugal, p. 2337-2340.

▪ Giuliani D, Matassoni M, Omologo M, Svaizer P. Training of HMM with filtered speech material for hands-free recognition. In: ICASSP 1999, vol. 1, p. 449-45.

▪ Omologo M, Svaizer P, De Mori R. Chapter 2 - Acoustic transduction. In: Spoken Dialogues with Computers, p. 23-67, London: Academic Press, 1998.

▪ Svaizer P, Matassoni M, Omologo M. Acoustic source location in a three-dimensional space using crosspower spectrum phase. In: ICASSP 1997, vol. 1, p. 231-234.

▪ Omologo M, Matassoni M, Svaizer P, Giuliani D. Microphone array based speech recognition with different talker-array positions. In: ICASSP 1997, vol. 1, p. 227-230.

▪ Omologo M, Svaizer P. Acoustic source location in noisy and reverberant environment using CSP analysis. In: ICASSP 1996, vol. 2, p. 921-924.

▪ Omologo M, Svaizer P. Acoustic event localization using a crosspower-spectrum phase based technique. In: ICASSP 1994, vol. 2, p. II-273-II-276.

▪ Angelini B, Brugnara F, Falavigna D, Giuliani D, Gretter R, Omologo M. Speaker independent continuous speech recognition using an acoustic-phonetic Italian corpus. In: ICSLP 1994, p. 1391-1394.

▪ Giuliani D, Omologo M, Svaizer P. Talker Localization and Speech Recognition using a Microphone Array and a Cross-PowerSpectrum Phase Analysis. In: Proceedings of International Conference on Spoken Language Processing, ICSLP-94, p. 1243-1246.

▪ Cosi P., Falavigna D., Omologo M. A preliminary statistical evaluation of manual and automatic segmentation discrepancies. Eurospeech 1991.

▪ M. Omologo, P. Svaizer. Use of the cross-power-spectrum phase in acoustic event localization, ITC-irst Technical Report 9303-13, submitted to IEEE Transactions on Speech and Audio Processing in 1993. Rejected in 1996, with an invitation to remove the portion concerning the Global Coherence Field and to convert it to a Correspondence.